I know there was recently a patch around Mongo slowness with regards to a
bug in the reader; however, the querying is still fairly slow when compared
to Mongo's aggregation framework itself (in our tests 5-10 times slower).
- What kind of queries are you running? I would not be surprised if Drill
was slower on aggregation queries due to limited operator pushdown support.

Do you guys think this could be valid, can you think of anything else that
might be slowing Mongo down (apart from the obvious network
communication/transfer etc.)
- Likely. The Mongo plugin relies on Drill's native JSON reader, at the
expense of additional processing overhead, I think. We should be able to
vectorize records directly from BSON. I am not sure how much of the existing
JSON reader code we can reuse in this case, though. It would be nice if we
had common abstractions in place for processing JSON-like records.
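
To make the suspected overhead concrete, here is a rough Python sketch (not
Drill code) comparing the two paths. Everything in it is an assumption for
illustration: the sample documents stand in for records already decoded from
BSON, and plain lists stand in for Drill's value vectors.

```python
import json
import timeit

# Hypothetical stand-ins for documents already decoded from BSON.
docs = [{"_id": i, "name": f"user{i}", "score": i * 0.5} for i in range(10_000)]
FIELDS = ("_id", "name", "score")


def to_columns(records):
    """Append each record's fields to per-column lists (a toy 'vector')."""
    columns = {field: [] for field in FIELDS}
    for rec in records:
        for field in FIELDS:
            columns[field].append(rec.get(field))
    return columns


def via_json_string(records):
    """Serialize each record to JSON text, parse it back, then fill columns."""
    return to_columns(json.loads(json.dumps(rec)) for rec in records)


def direct(records):
    """Fill columns straight from the decoded records, with no JSON text step."""
    return to_columns(records)


if __name__ == "__main__":
    # Both paths must yield the same columns; only the cost differs.
    assert via_json_string(docs) == direct(docs)
    for fn in (via_json_string, direct):
        seconds = timeit.timeit(lambda: fn(docs), number=5)
        print(f"{fn.__name__}: {seconds:.3f}s for 5 passes")
```

The absolute numbers mean nothing for Drill itself; the point is only that
the extra serialize/parse round trip is pure overhead on top of the column
fill, which is the cost a direct BSON reader would avoid.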

and could you suggest a way we could validate what part of it is slow?
- You can check query profiles [at http://drill-host:8047] to compare how
long each operator/query takes. For the BSON -> JSON string -> vector
transformation, you should look specifically at the scan operator timings.
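
If you would rather script the comparison than read it off the web UI,
something along these lines may help. It is a sketch under assumptions: that
the profile is available as JSON at /profiles/<query id>.json on the same
port, and that it is laid out as fragmentProfile -> minorFragmentProfile ->
operatorProfile with setupNanos/processNanos/waitNanos fields. Please verify
both against your Drill version. The helper names are made up.

```python
import json
from collections import defaultdict
from urllib.request import urlopen


def fetch_profile(base_url, query_id):
    """Download one query profile as a dict (endpoint path is an assumption)."""
    with urlopen(f"{base_url}/profiles/{query_id}.json") as resp:
        return json.load(resp)


def summarize_operators(profile):
    """Sum setup/process/wait nanoseconds per operator type across fragments."""
    totals = defaultdict(lambda: {"setup": 0, "process": 0, "wait": 0})
    for major in profile.get("fragmentProfile", []):
        for minor in major.get("minorFragmentProfile", []):
            for op in minor.get("operatorProfile", []):
                entry = totals[op.get("operatorType")]
                entry["setup"] += op.get("setupNanos", 0)
                entry["process"] += op.get("processNanos", 0)
                entry["wait"] += op.get("waitNanos", 0)
    return dict(totals)


def print_summary(totals):
    """Print operator types ordered by total processing time, slowest first."""
    ranked = sorted(totals.items(), key=lambda kv: kv[1]["process"], reverse=True)
    for op_type, t in ranked:
        print(f"operatorType={op_type} process={t['process'] / 1e6:.1f}ms "
              f"setup={t['setup'] / 1e6:.1f}ms wait={t['wait'] / 1e6:.1f}ms")
```

Usage would be roughly
print_summary(summarize_operators(fetch_profile("http://drill-host:8047", query_id))),
with the query id taken from the profiles page. If the scan operator
dominates the processing time, that points at the reader rather than at the
downstream operators.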


Regards.
-Hanifi

On Tue, May 5, 2015 at 7:50 PM, Adam Gilmore <[email protected]> wrote:

> Hi guys,
>
> I know there was recently a patch around Mongo slowness with regards to a
> bug in the reader; however, the querying is still fairly slow when compared
> to Mongo's aggregation framework itself (in our tests 5-10 times slower).
>
> My guess is this is due to the fact we serialize BSON to JSON and then
> parse JSON to Drill's vectors.  I haven't confirmed my hunch, but it seems
> almost certain that this would be a cause of potential performance loss.
>
> Ideally, I think the BSON should be parsed directly into Drill's vectors,
> rather than using the JSON reader.
>
> Do you guys think this could be valid, can you think of anything else that
> might be slowing Mongo down (apart from the obvious network
> communication/transfer etc.) and could you suggest a way we could validate
> what part of it is slow?
>
