Re: Performace issue

2019-02-12 Thread Kunal Khatua
You'll need to edit the memory settings in DRILL_HOME/conf/drill-env.sh. I suspect that your 5MB JSON data might have a lot of objects, which need to be serialized in memory. FLATTEN has the problem that it replicates the parent data for each child node that is being flattened into a
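The replication described above can be illustrated with a small sketch. This is plain Python, not Drill's implementation; the `flatten` helper and the `id`, `payload`, and `items` field names are hypothetical and chosen only for illustration. It shows how every child element produces an output row that carries its own copy of the parent fields, which is why memory use can grow well beyond the size of the input.

```python
def flatten(records, child_key):
    """Emit one row per child element, copying the parent fields into each row.

    Illustration only: this mimics the row-multiplying effect of a flatten
    operation, not how Drill implements FLATTEN internally.
    """
    rows = []
    for record in records:
        # Everything except the repeated field counts as "parent data".
        parent = {k: v for k, v in record.items() if k != child_key}
        for child in record.get(child_key, []):
            row = dict(parent)      # the parent data is copied once per child
            row[child_key] = child
            rows.append(row)
    return rows


# Hypothetical record: one large parent value and three child elements.
records = [{"id": 1, "payload": "x" * 1000, "items": [10, 20, 30]}]
rows = flatten(records, "items")
```

Here the 1000-character `payload` value appears in all three output rows, so the parent data is held three times after flattening.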

Re: Performace issue

2019-02-12 Thread Sorabh Hamirwasia
Hi Praveen, Can you also share the schema of your entire dataset and the format it's stored in? Thanks, Sorabh On Tue, Feb 12, 2019 at 10:02 AM Kunal Khatua wrote: > You'll need to edit the memory settings in DRILL_HOME/conf/drill-env.sh > I suspect that your 5MB JSON data might be

Re: HDFS storage prefix returning Error: VALIDATION ERROR: null

2019-02-12 Thread Abhishek Girish
I meant for you to run show files in hdfs.tmp. But it looks like the plugin might not be initialized correctly (check whether the hostname provided in the connection string can be resolved). Or you may not have used the right user when launching sqlline (user may not have permissions on the hdfs root
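One way to run the hostname check suggested above is a short resolution test from the machine running Drill. This is a minimal sketch using Python's standard `socket` module; the `can_resolve` helper is a hypothetical name, and a successful lookup only shows that the name resolves, not that the NameNode is reachable or that the plugin is configured correctly.

```python
import socket


def can_resolve(hostname):
    """Return True if the local resolver can map hostname to an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except OSError:
        # socket.gaierror (a subclass of OSError) is raised when lookup fails.
        return False
```

Calling `can_resolve` with the host part of the plugin's connection string (for example, the `host18-namenode` name used elsewhere in this thread) should return `True` on a correctly configured node.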

Re: Performace issue

2019-02-12 Thread PRAVEEN DEVERACHETTY
Hi Sorabh, The data is a JSON string sent over the REST API. We use the convert_from function to convert the JSON string to a JSON array and flatten the resulting array into multiple rows. The data is not stored on disk; all of it is in memory. Thanks, Praveen On Tue, Feb 12, 2019 at 11:49 PM Sorabh

Re: HDFS storage prefix returning Error: VALIDATION ERROR: null

2019-02-12 Thread Krishnanand Khambadkone
The command show files in dfs.tmp does return the right output. However, when I try to run a simple hdfs query, select s.application_id from hdfs.`/user/hive/spark_data/dt=2019-01-25/part-4-ae91cbe2-5410-4bec-ad68-10a053fb2b68.json` it returns Error: VALIDATION ERROR: Schema [[hdfs]]

Re: Performace issue

2019-02-12 Thread PRAVEEN DEVERACHETTY
Our JSON data has 5000 objects, and each object has around 40 attributes. Our data does not have any child rows; we are using FLATTEN because we are sending the data using the REST API POST method. We use the CONVERT_FROM function to format it into JSON in memory (no storage plugin), as it is an

Re: HDFS storage prefix returning Error: VALIDATION ERROR: null

2019-02-12 Thread Abhishek Girish
Hey Krishnanand, As mentioned by other folks in earlier threads, can you make sure to include ALL RELEVANT details in your emails? That includes the query, storage plugin configuration, data format, sample data / description of the data, and the full log for the query failure. It's necessary if one

HDFS storage prefix returning Error: VALIDATION ERROR: null

2019-02-12 Thread Krishnanand Khambadkone
I have defined an hdfs storage type with all the required properties. However, when I try to use it in a query, it returns Error: VALIDATION ERROR: null

Re: HDFS storage prefix returning Error: VALIDATION ERROR: null

2019-02-12 Thread Abhishek Girish
Can you please share the full error message (please see [1])? Also, can you please see if this works: show files in dfs.tmp; This is to check whether the DFS plugin is successfully initialized and Drill can see the files on HDFS. If that works, check whether simpler queries on the data work: select *

Re: HDFS storage prefix returning Error: VALIDATION ERROR: null

2019-02-12 Thread Krishnanand Khambadkone
Here is the hdfs storage definition and the query I am using. The same query runs fine against the local filesystem with the dfs storage prefix. All I am doing is swapping dfs for hdfs. { "type": "file", "connection": "hdfs://host18-namenode:8020/", "config": null, "workspaces": { "tmp":