Hi Kunal,

I have the Web UI enabled, but the queries are not run from there; they
are run through JDBC. I am attaching the Drill query and the Drill profile.
I have also enabled profiling on MongoDB and captured all the queries that
Drill triggered on MongoDB, grouped them by shape, and counted each shape.
If I understand correctly, Drill doesn't store the MongoDB schema, so one
count query and one simple find on the collection are expected, since it
has to capture schema information. However, I see count and find each
being run 43 times for a single Drill query.
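
To illustrate what grouping by shape means here, a minimal Python sketch
follows. It assumes the profiler entries have already been exported as
dictionaries with "op", "ns" and "command" fields, as in MongoDB's
system.profile collection; the helper names and the sample entries in the
usage note are hypothetical, and this is not necessarily how the attached
counts were produced.

```python
import json
from collections import Counter


def shape_of(value):
    """Replace every literal with "?" so that queries which differ only
    in their constants end up with the same shape."""
    if isinstance(value, dict):
        return {key: shape_of(val) for key, val in value.items()}
    if isinstance(value, list):
        return [shape_of(item) for item in value]
    return "?"


def count_by_shape(profile_entries):
    """Count profiler entries per (op, namespace, command shape)."""
    counts = Counter()
    for entry in profile_entries:
        # "command" is assumed to hold the find/count document.
        command = entry.get("command", {})
        shape = json.dumps(shape_of(command), sort_keys=True)
        counts[(entry.get("op"), entry.get("ns"), shape)] += 1
    return counts
```

Calling count_by_shape() on a list of such entries returns a Counter whose
keys are the distinct shapes; most_common() then lists the most frequent
shapes first, which is where repeated find and count calls would show up.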

On Sat, 20 Oct 2018 at 00:06, Kunal Khatua <ku...@apache.org> wrote:

> Hi Bala
>
> Can you share details of the profiles itself? It might be that the
> MongoDB storage plugin is translating the query into 100 mongo queries
> because of some (100?) specific filter criteria in the Drill query?
>
> JVM Heap usage fluctuation would indicate frequent object creation and
> garbage collection, probably by the Mongo storage plugin itself. Are you
> running the query through the WebUI or via JDBC? As long as you are not
> seeing any GC logs indicating a leak in the heap memory, the heap usage
> fluctuating is normal for your 6GB heap allocation.
>
> If you are using the WebUI or REST API, it is possible that there is
> overhead in Drill rendering the resultset that can cause higher heap and
> CPU usage.
>
>
> On 10/18/2018 1:10:25 PM, Balasubramanian Naganathan <balsu1...@gmail.com>
> wrote:
> Hello,
> We have the Tableau BI tool, which gets its data from MongoDB using Apache
> Drill.
> We are running drill on 5 nodes each having 8 core and 16 GB RAM, but they
> are not running as a cluster. Each node is an individual instance. We have
> a Load Balancer to load balance across these 5 nodes.
> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/bin/java
> -Xms6G -Xmx6G -XX:MaxDirectMemorySize=13G -XX:ReservedCodeCacheSize=1024m
> -Ddrill.exec.enable-epoll=false -Dproperty=value -Duser.timezone=UTC
> -XX:+UseStringDeduplication -Dproperty=value -Duser.timezone=UTC
> -Duser.timezone=UTC -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.port=27017
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.ssl=false -XX:+UseG1GC
> -Dlog.path=/opt/apache-drill-1.14.0/log/drillbit.log
> -Dlog.query.path=/opt/apache-drill-1.14.0/log/drillbit_queries.json -cp
> /opt/apache-drill-1.14.0/conf:/opt/apache-drill-1.14.0/jars/*:/opt/apache-drill-1.14.0/jars/ext/*:/opt/apache-drill-1.14.0/jars/3rdparty/*:/opt/apache-drill-1.14.0/jars/classb/*:/opt/apache-drill-1.14.0/jars/3rdparty/linux/*
>
> org.apache.drill.exec.server.Drillbit
>
> We see that Apache Drill reaches 100% CPU when we run 30 queries per
> second. All the queries are very simple, without any aggregation.
> Also, each query in Apache Drill is converted into 100 queries in
> Mongo. 97% of these are find(1) and count() MongoDB queries. We are not
> sure why they are triggered.
>
> We also tried adding the JVM parameter "-XX:+UseStringDeduplication". We
> saw the load become uneven after adding this parameter. Are there any
> other JVM parameters that we can add to improve Drill performance?
> Even when there are no calls to Drill, the heap memory usage
> fluctuates. It goes up to around 70% (~4GB) and comes back to around 1GB.
>
> Thanks,
> Bala
>
>

Attachment: mongo_query_counts.json
Description: application/json

Attachment: drill_profile.json
Description: application/json
