Re: Apache drill issue - cpu spiking to 100%

2018-10-22 Thread Kunal Khatua
Hi Bala/Srihari,

In the Drill query profile, I don't see the query running for more than a
second; the example you gave took 350 ms. I'm not sure what the backend
Mongo storage plugin is doing. You'll probably need to profile the Drillbit
with something like YourKit or VisualVM to identify which threads are burning
CPU and where the hotspots are. I don't see anything hinting at a problem.
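
A lighter-weight first step than a full profiler (a different technique from the YourKit/VisualVM route suggested above) is to match per-thread CPU from `top -H -p <pid>` against a `jstack <pid>` dump: `top` reports thread IDs in decimal, and `jstack` reports the same IDs as hexadecimal `nid=0x...` values. The sketch below assumes the default `top` column layout (%CPU in the ninth column); the sample text and thread names are invented for illustration and are not taken from this thread.

```python
import re

# A jstack thread header looks like:  "name" ... nid=0x1a2b runnable
JSTACK_HEADER = re.compile(r'^"(?P<name>[^"]+)".*\bnid=0x(?P<nid>[0-9a-fA-F]+)')


def thread_names_by_id(jstack_output):
    """Map native thread id (decimal) -> Java thread name."""
    names = {}
    for line in jstack_output.splitlines():
        match = JSTACK_HEADER.match(line)
        if match:
            names[int(match.group("nid"), 16)] = match.group("name")
    return names


def hottest_threads(top_output, jstack_output, limit=3):
    """Return [(cpu_percent, thread_name)] sorted by CPU, highest first."""
    names = thread_names_by_id(jstack_output)
    rows = []
    for line in top_output.splitlines():
        fields = line.split()
        # Data rows start with a numeric thread id; skip the header row.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        thread_id, cpu = int(fields[0]), float(fields[8])
        rows.append((cpu, names.get(thread_id, "<unknown>")))
    rows.sort(reverse=True)
    return rows[:limit]


# Invented sample data (hypothetical thread names and numbers).
SAMPLE_TOP = """\
  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 6699 drill     20   0   20.1g   5.2g  24000 R 93.3 33.0   1:02.11 java
 6701 drill     20   0   20.1g   5.2g  24000 S  4.0 33.0   0:03.50 java
"""
SAMPLE_JSTACK = '''\
"hypothetical-fragment-thread" #41 daemon prio=5 os_prio=0 tid=0x00007f0000000001 nid=0x1a2b runnable [0x00007f0000001000]
"hypothetical-gc-thread" os_prio=0 tid=0x00007f0000000002 nid=0x1a2d runnable
'''

if __name__ == "__main__":
    for cpu, name in hottest_threads(SAMPLE_TOP, SAMPLE_JSTACK):
        print(f"{cpu:5.1f}%  {name}")
```

Taking several dumps a few seconds apart and checking which names stay at the top helps separate a sustained hotspot from a momentary spike.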

Kunal



Re: Apache drill issue - cpu spiking to 100%

2018-10-20 Thread Srihari Prabhakar
Hi Kunal,

I have the Web UI enabled, but queries are not being run from there; they
are being run through JDBC. I am attaching the Drill query and the Drill
profile. I have also enabled profiling on MongoDB and captured all the
queries that Drill triggers on MongoDB. I have grouped the MongoDB queries
by shape and taken the count of each.
If I understand correctly, Drill doesn't store the MongoDB schema, so one
count query and one simple find on the collection are expected, since it has
to capture schema information. But I see count and find each being run 43
times for a single Drill query.
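
For readers who want to repeat the grouping step, here is a minimal sketch of counting profiler entries by query shape. It is not the script used for the attached counts. It assumes each entry has been reduced to a dict with `op`, `ns`, and `filter` keys, which is a simplification of the real `system.profile` document layout, and the sample entries are invented.

```python
from collections import Counter


def shape(value):
    """Replace every concrete value with its type name, keeping keys and operators."""
    if isinstance(value, dict):
        return {key: shape(val) for key, val in sorted(value.items())}
    if isinstance(value, list):
        return [shape(item) for item in value]
    return type(value).__name__


def count_by_shape(profile_entries):
    """Count entries that share the same operation, namespace, and filter shape."""
    counts = Counter()
    for entry in profile_entries:
        key = (entry["op"], entry["ns"], repr(shape(entry.get("filter", {}))))
        counts[key] += 1
    return counts


# Invented sample entries (hypothetical namespace and fields).
SAMPLE_ENTRIES = [
    {"op": "find", "ns": "shop.orders", "filter": {"status": "open"}},
    {"op": "find", "ns": "shop.orders", "filter": {"status": "closed"}},
    {"op": "count", "ns": "shop.orders", "filter": {}},
]

if __name__ == "__main__":
    for (op, ns, query_shape), n in count_by_shape(SAMPLE_ENTRIES).most_common():
        print(n, op, ns, query_shape)
```

Two finds that differ only in the literal they filter on collapse into one shape, so repeated schema-probing queries show up as a single line with a high count.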



mongo_query_counts.json
Description: application/json


drill_profile.json
Description: application/json


Re: Apache drill issue - cpu spiking to 100%

2018-10-19 Thread Kunal Khatua
Hi Bala,

Can you share the details of the profiles themselves? It might be that the
MongoDB storage plugin is translating the query into 100 Mongo queries because
of some (100?) specific filter criteria in the Drill query?

JVM heap usage fluctuation would indicate frequent object creation and garbage
collection, probably by the Mongo storage plugin itself. Are you running the
query through the Web UI or via JDBC? As long as you are not seeing any GC logs
indicating a leak in the heap memory, fluctuating heap usage is normal for
your 6GB heap allocation.

If you are using the Web UI or REST API, it is possible that there is overhead
in Drill rendering the result set that can cause higher heap and CPU usage.


On 10/18/2018 1:10:25 PM, Balasubramanian Naganathan wrote:
Hello,
We have the Tableau BI tool, which gets its data from MongoDB through Apache
Drill.
We are running Drill on 5 nodes, each with 8 cores and 16 GB RAM, but they
are not running as a cluster; each node is an individual instance. We have
a load balancer that distributes requests across these 5 nodes.
The Drillbit command line on each node is:
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/bin/java
-Xms6G -Xmx6G -XX:MaxDirectMemorySize=13G -XX:ReservedCodeCacheSize=1024m
-Ddrill.exec.enable-epoll=false -Dproperty=value -Duser.timezone=UTC
-XX:+UseStringDeduplication -Dproperty=value -Duser.timezone=UTC
-Duser.timezone=UTC -Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=27017
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false -XX:+UseG1GC
-Dlog.path=/opt/apache-drill-1.14.0/log/drillbit.log
-Dlog.query.path=/opt/apache-drill-1.14.0/log/drillbit_queries.json -cp
/opt/apache-drill-1.14.0/conf:/opt/apache-drill-1.14.0/jars/*:/opt/apache-drill-1.14.0/jars/ext/*:/opt/apache-drill-1.14.0/jars/3rdparty/*:/opt/apache-drill-1.14.0/jars/classb/*:/opt/apache-drill-1.14.0/jars/3rdparty/linux/*
org.apache.drill.exec.server.Drillbit
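
As a quick sanity check on a command line like the one above, the sketch below lists flags that are passed more than once and adds up the two explicit memory ceilings (`-Xmx` and `-XX:MaxDirectMemorySize`). Both values are upper limits rather than up-front reservations, and the JVM also uses memory outside them (code cache, metaspace, thread stacks), so the sum is only a rough indicator. The helper names are made up for this example; the flag list is copied from the command line, with the logging and classpath arguments left out.

```python
import re
from collections import Counter

# JVM flags copied from the command line above (log paths and classpath omitted).
JVM_ARGS = """
-Xms6G -Xmx6G -XX:MaxDirectMemorySize=13G -XX:ReservedCodeCacheSize=1024m
-Ddrill.exec.enable-epoll=false -Dproperty=value -Duser.timezone=UTC
-XX:+UseStringDeduplication -Dproperty=value -Duser.timezone=UTC
-Duser.timezone=UTC -Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=27017
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false -XX:+UseG1GC
""".split()

UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def size_in_bytes(text):
    """Parse a JVM size such as '6G' or '1024m' into bytes."""
    match = re.fullmatch(r"(\d+)([kmg])?", text.lower())
    if match is None:
        raise ValueError(f"not a JVM size: {text!r}")
    return int(match.group(1)) * UNITS.get(match.group(2), 1)


def repeated_flags(args):
    """Return {flag: occurrences} for flags that appear more than once."""
    return {flag: n for flag, n in Counter(args).items() if n > 1}


def memory_ceiling_bytes(args):
    """Sum the heap ceiling (-Xmx) and the direct-memory ceiling."""
    total = 0
    for arg in args:
        if arg.startswith("-Xmx"):
            total += size_in_bytes(arg[len("-Xmx"):])
        elif arg.startswith("-XX:MaxDirectMemorySize="):
            total += size_in_bytes(arg.split("=", 1)[1])
    return total


if __name__ == "__main__":
    print("repeated:", repeated_flags(JVM_ARGS))
    ceiling_gib = memory_ceiling_bytes(JVM_ARGS) / UNITS["g"]
    print(f"heap + direct ceiling: {ceiling_gib:.0f} GiB on a 16 GB node")
```

On this command line the two ceilings add up to 19 GiB, which is more than the 16 GB of RAM described for each node. That only matters if direct memory usage actually approaches its limit.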

We see that Apache Drill reaches 100% CPU when we run 30 queries per
second. All the queries are very simple, without any aggregation.
Also, each Drill query is converted into 100 queries in MongoDB, and 97% of
those are find(1) and count() queries. We are not sure why they are
triggered.
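
To put the fan-out in perspective, the arithmetic can be sketched as below. The message does not say whether 30 queries per second is the total across the load balancer or the rate per node, so the result is only an order-of-magnitude estimate. The second figure uses the 43 count and 43 find operations per Drill query reported in the follow-up message of this thread.

```python
def mongo_ops_per_second(drill_queries_per_second, mongo_ops_per_drill_query):
    """MongoDB operations generated per second by the Drill query load."""
    return drill_queries_per_second * mongo_ops_per_drill_query


# Figures from the thread: 30 Drill queries/s, 100 Mongo queries each.
reported = mongo_ops_per_second(30, 100)

# Figures from the MongoDB profiler follow-up: 43 count + 43 find per query.
profiled = mongo_ops_per_second(30, 43 + 43)

if __name__ == "__main__":
    print(reported)  # 3000
    print(profiled)  # 2580
```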

We also tried adding the "-XX:+UseStringDeduplication" JVM parameter, and we
saw the load become uneven after adding it. Are there any other JVM
parameters that we can add to improve Drill performance?
Even when there are no calls to Drill, the heap memory usage fluctuates:
it goes up to around 70% (~4GB) and comes back down to around 1GB.

Thanks,
Bala