The Mongo storage plugin works best if you allow it to push some of the 
filtering down to Mongo instead of using it to pump all the data to Drill.

Try:

select count(collection) from mongo.`connect`.events group by collection;

Check the query plan to see whether field filtering is pushed down to Mongo, 
or whether Mongo is just pumping all the data in the collection to Drill.
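
One way to check is to ask Drill for the plan rather than running the query. 
This is only a sketch using the same table and field names as your query; in 
the Mongo scan entry of the plan, look at which columns (and filters, if any) 
are listed. If only `collection` shows up, the field filtering is being 
pushed down.

```sql
-- Show the plan without executing the query.
-- Table and field names are taken from the query in this thread.
explain plan for
select count(collection) from mongo.`connect`.events group by collection;
```

Running the same explain on your original count(*) query should make it easy 
to compare the two plans.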

Using the Mongo plugin gives you the benefit of predicate filtering, and 
also of fetching only the required fields out of the documents in a collection.
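
To see the predicate side of this, a query with a where clause can be 
explained the same way. This is a rough sketch: the value 'pageview' is made 
up for illustration, so substitute a value that actually exists in your 
collection field.

```sql
-- 'pageview' is a hypothetical value, not taken from your data.
-- If the predicate is pushed down, it should appear in the Mongo scan
-- entry of the plan rather than as a separate filter step in Drill.
explain plan for
select count(collection) from mongo.`connect`.events
where collection = 'pageview';
```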

It is also best to check that the drillbits are matched up with the Mongo 
shards for locality; I don’t know how your environment is configured.


On Jan 7, 2015, at 12:45 AM, Adam Gilmore <[email protected]> wrote:

> Yes - sorry - I added the group by to make it do something a bit more than
> just a count - a count returns very quickly.
> 
> It does look like it's trying to stream all events into Drill, but either
> way, I only have 1M documents - it should be a bit faster than minutes,
> right?  Whereas I can export all documents using mongoexport, for example,
> in a matter of seconds.
> 
> On Wed, Jan 7, 2015 at 10:20 AM, Andries Engelbrecht <
> [email protected]> wrote:
> 
>> The query seems a bit strange as the count(*) will only return the number
>> of records, yet there is a group by clause which may confuse the optimizer.
>> 
>> Look at the query plan and see if the Mongo storage plugin is potentially
>> sending all the records to Drill.
>> 
>> Mongo will likely do a simple record count that will return quickly.
>> 
>> Try the query without the group by clause in Drill; it should return very
>> quickly. It will also be interesting to compare query plans in Drill.
>> 
>> —Andries
>> 
>> 
>> On Jan 6, 2015, at 4:08 PM, Adam Gilmore <[email protected]> wrote:
>> 
>>> Hi all,
>>> 
>>> I'm trying to test out Mongo with Drill but seem to be running into very
>>> slow performance.
>>> 
>>> I have about 1M documents loaded into Mongo, and I'm doing something as
>>> simple as:
>>> 
>>> select count(*) from mongo.`connect`.events group by collection;
>>> 
>>> where "collection" is a string field in the document.
>>> 
>>> This takes minutes to complete, which to me seems very strange.
>>> 
>>> Any ideas why this would be that slow?  I can run an identical query
>>> directly on Mongo and it returns in sub-second time.
>> 
>> 
