Drill does not release DB connections when a storage plugin is deleted

2019-02-13 Thread Rahul Raj
Drill does not release the database connection when we disable or delete a storage plugin. It can be seen with "lsof -i -p <> | grep <>" that the number of socket connections to the db host keeps increasing if we keep updating an existing storage plugin. JdbcStoragePlugin does not override

RE: Performace issue

2019-02-13 Thread Lee, David
These days, I use Python to read a JSONL file line by line into a Python dictionary, convert the data (flatten, etc.) into a tabular record set, write the tabular data to Parquet, and then read the Parquet using tools like Drill. JSON support is lacking in Drill and other tools because these SQL
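The workflow described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the poster's actual script: the helper names (flatten, read_jsonl, normalize, write_parquet) are hypothetical, nested objects are flattened into dotted column names, and the Parquet step assumes the third-party pyarrow package is installed.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def flatten(record: Dict[str, Any], parent: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with `sep` (an assumed convention)."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            # Scalars and lists are kept unchanged.
            flat[name] = value
    return flat


def read_jsonl(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Parse JSONL input line by line, skipping blank lines, and flatten each record."""
    for line in lines:
        line = line.strip()
        if line:
            yield flatten(json.loads(line))


def normalize(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Give every row the same columns, filling missing values with None."""
    columns: List[str] = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    return [{col: row.get(col) for col in columns} for row in rows]


def write_parquet(rows: List[Dict[str, Any]], path: str) -> None:
    """Write the tabular rows to Parquet. Assumes pyarrow is available."""
    import pyarrow as pa
    import pyarrow.parquet as pq

    pq.write_table(pa.Table.from_pylist(rows), path)
```

A typical use would open the JSONL file, pass the file object to read_jsonl, run the collected rows through normalize, and hand the result to write_parquet; the resulting file can then be queried from Drill like any other Parquet file.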

Re: Performace issue

2019-02-13 Thread PRAVEEN DEVERACHETTY
Hi Sorabh, Here is the sample query passed using the REST API. The query is sent as the request body when submitting the job (REST). The convert_from function converts a JSON string to a JSON object. Then flatten is applied to the result set defined in your query. Please let me know if anything else is required. Our main gao SELECT

S3 plugin configuration problem cause wrong constructed hostname

2019-02-13 Thread Updike, Clark
I'm trying to run Drill 1.15 against an S3 compatible Minio instance following the steps described here: https://blog.minio.io/query-minio-datastore-with-apache-drill-dcaf71d0cee5 So on my Minio server, minio1, I have a bucket, drillbucket1. I'm putting in core-site.xml that my fs.s3a.endpoint

Re: Multiple fragments in apache drill

2019-02-13 Thread Kunal Khatua
Hi Hugues The number of fragments is determined by the number of sources (i.e. whether the data can be read in parallel) and the number of estimated rows. CSV and Parquet files are easy to read in parallel, but JSON files are not, because Drill does not know how many JSON documents exist in the

Re: Performace issue

2019-02-13 Thread Sorabh Hamirwasia
Hi Praveen, I am probably missing something here because I don't understand how you are feeding data to Drill in memory using the REST API. As you mentioned, data has to be stored on disk or in some DB for Drill to fetch it. Can you please share the query profile for your query? P.S. Attachments are

Re: Performace issue

2019-02-13 Thread PRAVEEN DEVERACHETTY
As per my understanding, Apache Drill is based on the file store only. Please help me if I can create any plugins for the following use case 1. Create a JSON object and push it to Apache Drill in memory (cache). I can create the JSON object in Java, and if any API is available from Drill to push this

Multiple fragments in apache drill

2019-02-13 Thread Kwizera hugues Teddy
Hello Team Drill, I'm executing a query in an Apache Drill cluster; however, it is making only 1 minor segment. I have tried various queries, like a union of 2 queries, aggregation, etc., and executed them on millions of records; however, it is still making only 1 fragment. Is there any configuration change

Re: Compatibility Matrix for the ODBC/JDBC clients

2019-02-13 Thread Bob Rudis
Hey Joel… My experience has been the JDBC-side was more tolerant than the free ODBC driver. It took a bit for the ODBC driver to be released after Drill 1.13.0 and the macOS and Linux ODBC versions (I don't use legacy Windows operating systems) kept tossing messages in logs to the point where

Compatibility Matrix for the ODBC/JDBC clients

2019-02-13 Thread Joel Pfaff
Hello, Does Drill mandate the perfect alignment of the Drillbits versions and the ODBC/JDBC client versions, or is it possible to use an older client with a new server or vice versa? Regards, Joel

Re: HDFS storage prefix returning Error: VALIDATION ERROR: null

2019-02-13 Thread Arjun kr
Just wanted to confirm the name node URI. Can you verify that 8020 is your namenode IPC port? Maybe you can run 'hadoop fs -ls hdfs://host18-namenode:8020/tmp' to verify it? From: Abhishek Girish Sent: Tuesday, February 12, 11:37 PM Subject: