Hi Adam,

We found an issue in MongoRecordReader (DRILL-1971 <https://issues.apache.org/jira/browse/DRILL-1971>) that causes the slowness, and the patch for it has been updated.

On Wed, Jan 7, 2015 at 2:15 PM, Adam Gilmore <[email protected]> wrote:

> Yes - sorry - I added the group by to make it do something a bit more than
> just a count - a count returns very quickly.
>
> It does look like it's trying to stream all events into Drill, but either
> way, I only have 1M documents - it should be a bit faster than minutes,
> right? Whereas I can export all documents using mongoexport, for example,
> in a matter of seconds.
>
> On Wed, Jan 7, 2015 at 10:20 AM, Andries Engelbrecht <[email protected]> wrote:
>
> > The query seems a bit strange as the count(*) will only return the number
> > of records, yet there is a group by clause which may confuse the
> > optimizer.
> >
> > Look at the query plan and see if the Mongo storage plug-in is
> > potentially sending all the records to Drill.
> >
> > Mongo will likely do a simple record count that will return quickly.
> >
> > Try the query without the group by clause in Drill; it should return very
> > quickly. It will also be interesting to compare query plans in Drill.
> >
> > -- Andries
> >
> >
> > On Jan 6, 2015, at 4:08 PM, Adam Gilmore <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > > I'm trying to test out Mongo with Drill but seem to be running into
> > > very slow performance.
> > >
> > > I have about 1M documents loaded into Mongo, and I'm doing something as
> > > simple as:
> > >
> > > select count(*) from mongo.`connect`.events group by collection;
> > >
> > > where "collection" is a string field in the document.
> > >
> > > This takes minutes to complete, which to me seems very strange.
> > >
> > > Any ideas why this would be that slow? I can run an identical query
> > > directly on Mongo and it returns in sub-second time.

-- 
Kamesh.
