Hi Kunal,

I have the Web UI enabled, but the queries are not being run from there; they are being run through JDBC. I am attaching the Drill query and the Drill profile. I have also enabled profiling on MongoDB and captured all the queries that Drill triggered on MongoDB, grouped them by shape, and taken the count for each shape.

If I understand correctly, Drill doesn't store the MongoDB schema, so one count query and one simple find on the collection are expected, since it has to capture the schema information. However, I see count and find each being run 43 times for a single Drill query.
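For reference, the grouping by shape can be sketched roughly as below. This is only an illustration and not the exact script that produced the attached counts: the field names (`ns`, `command`, `filter`, `query`) are assumed from the MongoDB profiler output, the helper names are hypothetical, and the sample entries are made up.

```python
import json
from collections import Counter


def shape_of(value):
    """Replace every literal with "?" so queries differing only in constants share a shape."""
    if isinstance(value, dict):
        return {key: shape_of(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        return [shape_of(item) for item in value]
    return "?"


def count_by_shape(entries):
    """Count profiler entries per (namespace, operation, predicate shape)."""
    counts = Counter()
    for entry in entries:
        command = entry.get("command", {})
        # Assumption: the first key of the command document names the operation.
        name = next(iter(command), "unknown")
        # Assumption: find carries its predicate in "filter", count in "query".
        predicate = command.get("filter", command.get("query", {}))
        shape = json.dumps(shape_of(predicate), sort_keys=True)
        counts[(entry.get("ns", ""), name, shape)] += 1
    return counts


# Made-up entries standing in for documents read from system.profile.
sample = [
    {"ns": "db.orders", "command": {"find": "orders", "filter": {}, "limit": 1}},
    {"ns": "db.orders", "command": {"count": "orders", "query": {}}},
    {"ns": "db.orders", "command": {"find": "orders", "filter": {"status": "A"}}},
    {"ns": "db.orders", "command": {"find": "orders", "filter": {"status": "B"}}},
]

for (ns, name, shape), n in count_by_shape(sample).most_common():
    print(n, ns, name, shape)
```

Entries that differ only in constant values (the two `status` filters in the sample) collapse into one shape, which makes repeated schema-probing queries easy to spot.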
On Sat, 20 Oct 2018 at 00:06, Kunal Khatua <ku...@apache.org> wrote:

> Hi Bala
>
> Can you share details of the profiles itself? It might be that the
> MongoDB storage plugin is translating the query into 100 mongo queries
> because of some (100?) specific filter criteria in the Drill query?
>
> JVM Heap usage fluctuation would indicate frequent object creation and
> garbage collection, probably by the Mongo storage plugin itself. Are you
> running the query through the WebUI or via JDBC? As long as you are not
> seeing any GC logs indicating a leak in the heap memory, the heap usage
> fluctuating is normal for your 6GB heap allocation.
>
> If you are using the WebUI or REST API, it is possible that there is
> overhead in Drill rendering the resultset that can cause higher heap and
> CPU usage.
>
>
> On 10/18/2018 1:10:25 PM, Balasubramanian Naganathan <balsu1...@gmail.com>
> wrote:
> Hello,
> We have tableau BI tool which is getting data from MongoDB using Apache
> drill.
> We are running drill on 5 nodes each having 8 core and 16 GB RAM, but they
> are not running as a cluster. Each node is an individual instance. We have
> a Load Balancer to load balance across these 5 nodes.
>
> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/bin/java
> -Xms6G -Xmx6G -XX:MaxDirectMemorySize=13G -XX:ReservedCodeCacheSize=1024m
> -Ddrill.exec.enable-epoll=false -Dproperty=value -Duser.timezone=UTC
> -XX:+UseStringDeduplication -Dproperty=value -Duser.timezone=UTC
> -Duser.timezone=UTC -Dcom.sun.management.jmxremote
> -Dcom.sun.management.jmxremote.port=27017
> -Dcom.sun.management.jmxremote.authenticate=false
> -Dcom.sun.management.jmxremote.ssl=false -XX:+UseG1GC
> -Dlog.path=/opt/apache-drill-1.14.0/log/drillbit.log
> -Dlog.query.path=/opt/apache-drill-1.14.0/log/drillbit_queries.json -cp
> /opt/apache-drill-1.14.0/conf:/opt/apache-drill-1.14.0/jars/*:/opt/apache-drill-1.14.0/jars/ext/*:/opt/apache-drill-1.14.0/jars/3rdparty/*:/opt/apache-drill-1.14.0/jars/classb/*:/opt/apache-drill-1.14.0/jars/3rdparty/linux/*
> org.apache.drill.exec.server.Drillbit
>
> We see that apache drill is reaching 100% CPU when we run 30 queries per
> second. All the queries are very simple queries without any aggregation.
> Also each query in Apache drill is getting converted to 100 queries in
> Mongo. 97% of the queries are find(1) and count() mongoDB Queries. Not sure
> why they are triggered.
>
> Also we tried adding "-XX:+UseStringDeduplication", JVM parameter, We saw
> the load become uneven after adding this parameter. Are there any other JVM
> parameters that we can add to improve drill performance?
> Even though there are no calls to drill, the heap memory usage is
> fluctuating. It goes up to around 70% (~4GB) and comes back to around 1GB.
>
> Thanks,
> Bala

Attachments:
mongo_query_counts.json (application/json)
drill_profile.json (application/json)