Here is the hdfs storage definition and the query I am using. The same query
runs fine against the local filesystem with the dfs storage prefix; all I am
doing is swapping dfs for hdfs.
{
  "type": "file",
  "connection": "hdfs://host18-namenode:8020/",
  "config": null,
  "workspaces": {
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    },
    "root": {
      "location": "/",
      "writable": false,
      "defaultInputFormat": null,
      "allowAccessOutsideWorkspace": false
    }
  },
  "formats": null,
  "enabled": true
}
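For what it's worth, with the root workspace mapped to "/", the path in the
query below goes through the plugin's implicit default workspace; it should
also resolve when qualified explicitly. A minimal variant, assuming the plugin
is saved under the name hdfs:

select count(*)
from hdfs.root.`/user/hive/spark_data/dt=2019-01-25/part-00004-ae91cbe2-5410-4bec-ad68-10a053fb2b68.json`;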
select s.application_id,
       get_spark_attrs(s.spark_event, 'spark.executor.memory') as spark_attributes
from hdfs.`/user/hive/spark_data/dt=2019-01-25/part-00004-ae91cbe2-5410-4bec-ad68-10a053fb2b68.json` s
where (REGEXP_REPLACE(REGEXP_REPLACE(substr(s.spark_event, 11), '[^0-9A-Za-z]"', ''), '(".*)', '') = 'SparkListenerEnvironmentUpdate'
    or REGEXP_REPLACE(REGEXP_REPLACE(substr(s.spark_event, 11), '[^0-9A-Za-z]"', ''), '(".*)', '') = 'SparkListenerApplicationStart'
    or REGEXP_REPLACE(REGEXP_REPLACE(substr(s.spark_event, 11), '[^0-9A-Za-z]"', ''), '(".*)', '') = 'SparkListenerApplicationEnd')
group by application_id, spark_attributes
order by application_id;
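To help isolate the failure, here is a stripped-down probe against the same
file with no UDFs or regex functions (a minimal sketch; get_spark_attrs above
is a custom function, so the probe deliberately avoids it). If this also fails
with VALIDATION ERROR: null, the hdfs plugin registration is likely at fault
rather than the query itself:

-- same file, simplest possible read
select *
from hdfs.`/user/hive/spark_data/dt=2019-01-25/part-00004-ae91cbe2-5410-4bec-ad68-10a053fb2b68.json`
limit 1;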
On Tuesday, February 12, 2019, 3:04:40 PM PST, Abhishek Girish
<[email protected]> wrote:
Hey Krishnanand,
As mentioned by other folks in earlier threads, can you make sure to
include ALL RELEVANT details in your emails? That includes the query, the
storage plugin configuration, the data format, sample data or a description
of the data, and the full log for the query failure. These are necessary for
anyone to understand the issue or offer help.
Regards,
Abhishek
On Tue, Feb 12, 2019 at 2:37 PM Krishnanand Khambadkone
<[email protected]> wrote:
> I have defined an hdfs storage plugin with all the required properties.
> However, when I try to use it in a query, it returns:
> Error: VALIDATION ERROR: null
>