You'll need to edit the memory settings in DRILL_HOME/conf/drill-env.sh
I suspect that your 5MB JSON data might be having a lot of objects, which need
to be serialized in memory.
FLATTEN has the problem that it replicates the data parent data for each child
node that is being flattened into a
Hi Praveen,
Can you also share what is the schema of your entire dataset and in what
format it's stored?
Thanks,
Sorabh
On Tue, Feb 12, 2019 at 10:02 AM Kunal Khatua wrote:
> You'll need to edit the memory settings in DRILL_HOME/conf/drill-env.sh
> I suspect that your 5MB JSON data might be
I meant for you to run
show files in hdfs.tmp
But it looks like the plugin might not be initialized correctly (check if
the hostname provided in the connection string can be resolved)
Or you may not have used the right user when launching sqlline (user may
not have permissions on the hdfs root
Hi Sorabh, Data is in json string format, sent over rest api. Using
convert_from function to convert json string to json array and flatten the
result array into multiple rows. Data is not stored in the disk. All data
is in the memory.
Thanks,
Praveen
On Tue, Feb 12, 2019 at 11:49 PM Sorabh
The command show files in dfs.tmp does return the right output.
However when I try to run a simple hdfs query
select s.application_id from
hdfs.`/user/hive/spark_data/dt=2019-01-25/part-4-ae91cbe2-5410-4bec-ad68-10a053fb2b68.json`
it returns,
Error: VALIDATION ERROR: Schema [[hdfs]]
Our json data has 5000 objects, each object has around 40 attributes. Our
data does not have any child rows, the reason we are using FLATTEN because
we are sending the data using rest api post method. Using CONVERT_FROM
function to format it into json in the memory(no storage plugin), as it is
an
Hey Krishnanand,
As mentioned by other folks in earlier threads, can you make sure to
include ALL RELEVANT details in your emails? That includes the query,
storage plugin configuration, data format, sample data / description of the
data, the full log for the query failure? It's necessary if one
I have defined a hdfs storage type with all the required properties. However,
when I try to use that in the query it returns
Error: VALIDATION ERROR: null
Can you please share the full error message (please see [1])
Also, can you please see if this works: show files in dfs.tmp; This is to
check if the DFS plugin is successfully initialized and Drill can see the
files on HDFS. And if that works, check if simpler queries on the data
works: select *
Here is the hdfs storage definition and query I am using. Same query runs
fine if run off local filesystem with dfs storage prefix. All I am doing is
swapping dfs for hdfs.
{
"type": "file",
"connection": "hdfs://host18-namenode:8020/",
"config": null,
"workspaces": {
"tmp":
10 matches
Mail list logo