First off, this is my first attempt at Drill (BTW: congratulations on the
release ;-), so perhaps I have misunderstood something.
I want to query my parquet files on HDFS.
I set up the 1.0 release on a machine (node1)
that already had CDH5 and a working ZooKeeper.
With the hdfs storage plugin config below, I can query a parquet file
on the local machine just fine.
E.g.:
0: jdbc:drill:drillbit=localhost> select a,b,c FROM hdfs.`/hdfs/path/test.par` limit 5;
## drill-override.conf
drill.exec: {
  cluster-id: "mydrillcluster",
  zk.connect: "node1:2181"
}
## storage plugin config
{
  "type": "file",
  "enabled": true,
  "connection": "hdfs://127.0.0.1:8020/",
  "workspaces": null,
  "formats": {
    "parquet": {
      "type": "parquet"
    }
  }
}
Can I query a remote HDFS simply by pointing the storage plugin config at it?
The only change I made was the IP address in the connection parameter above.
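For reference, here is a sketch of the plugin config after that change. I'm
assuming the remote namenode address is the one I use in the hdfs dfs command
at the end of this mail; apart from the connection line, it is identical to
the working config above.
## storage plugin config (remote HDFS, sketch)
{
  "type": "file",
  "enabled": true,
  "connection": "hdfs://10.10.10.10:8020/",
  "workspaces": null,
  "formats": {
    "parquet": {
      "type": "parquet"
    }
  }
}
With that connection, the same kind of query fails: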
0: jdbc:drill:drillbit=localhost> select a,b,c FROM hdfs.`/tmp/test.par` limit 5;
Error: PARSE ERROR: From line 1, column 38 to line 1, column 41: Table
'hdfs./tmp/test.par' not found
[Error Id: 4156f66c-3dac-4e87-b7f8-f0bdc19d57d7 on node1.company.com:31010]
(state=,code=0)
.....
Caused by: org.apache.drill.common.exceptions.UserRemoteException: PARSE ERROR:
From line 1, column 38 to line 1, column 41: Table 'hdfs./tmp/test.par' not
found
But the namenode:port/path is correct, because this works from node1:
[alan@node1 drill]$ hdfs dfs -fs hdfs://10.10.10.10:8020/ -ls /tmp/test.par
-rw-r--r-- 1 alan supergroup 4947359 2015-05-21 13:55 /tmp/test.par
Alan