RE: HDFS file is listable but not queryable (object not found)

2020-07-27 Thread Updike, Clark
Thanks Paul. I just realized I had missed this email earlier. Given the uncertainty around the issue, and the possibility that it is more than one issue, I think I will hold off for now on digging deeper into it. I was trying to run Drill against HDFS to collect a performance baseline as we

Re: RE: HDFS file is listable but not queryable (object not found)

2020-07-27 Thread Updike, Clark
If my setup is not valid, can someone elaborate on what is required? User impersonation? Running Kerberos between the client and the drillbits? Again, HDFS requires Kerberos, but I don't need it for Drill itself. Thanks, Clark On 7/24/20, 11:42 AM, "Updike, Clark" wrote: Yes, I've

Re: Re: HDFS file is listable but not queryable (object not found)

2020-07-25 Thread Paul Rogers
Hi Clark, This is a hard one. On the one hand, the "SASL" part of the data node log messages suggests that Drill tried to do a data node operation, and it failed for security reasons. But, we can't be sure if the two are connected. On the other hand, the stack trace does not show the entries

Re: Re: HDFS file is listable but not queryable (object not found)

2020-07-24 Thread Updike, Clark
Yes, I've read that page, but it wasn't clear to me how much of it applied. I don't need Kerberos auth from the client to the drillbits. But the drillbits must use Kerberos auth when interacting with HDFS. By putting the principal and keytab info into the drillbit config
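Clark reports that the HDFS CLI works, and the thread also mentions SASL errors in the DataNode logs. One way to narrow that down is to repeat the CLI check with the drillbit's own principal and keytab on a drillbit host: `-ls` needs only NameNode metadata, while `-tail` has to read a block from a DataNode. The Python below is a hypothetical diagnostic sketch, not something proposed in the thread. The principal, keytab path, and file path are placeholders, and it assumes `kinit` and the `hdfs` client are on the PATH.

```python
import shlex
import subprocess

def kerberos_hdfs_check_commands(principal, keytab, path):
    """Build the diagnostic commands; principal, keytab and path are placeholders."""
    return [
        ["kinit", "-kt", keytab, principal],  # obtain a ticket from the drillbit's keytab
        ["hdfs", "dfs", "-ls", path],         # needs NameNode metadata only
        ["hdfs", "dfs", "-tail", path],       # must read a block from a DataNode
    ]

def run_checks(commands):
    """Run the commands in order and stop at the first failure."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print(result.stderr)
            return False
    return True
```

For example, call run_checks(kerberos_hdfs_check_commands("drill/host@EXAMPLE.COM", "/path/to/drill.keytab", "/tmp/employee.json")) as the drillbit's service account, because `kinit` replaces that account's default ticket cache. If `-ls` succeeds but `-tail` fails, that points at DataNode access for that identity. If both succeed, the drillbit's credentials are probably not the problem.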

Re: HDFS file is listable but not queryable (object not found)

2020-07-24 Thread Charles Givre
Hey Clark, Have you gone through this: https://drill.apache.org/docs/configuring-kerberos-security/ As Paul indicated, this does seem like the likely suspect as to why this isn't working or at least the next thing to verify. I'm

Re: RE: HDFS file is listable but not queryable (object not found)

2020-07-24 Thread Updike, Clark
Using the CDH version of 2.6.0. I was not able to find any errors on the Drill side besides what I already provided from sqlline. However, I did find an exception on some of the datanodes (below). Everything works fine using HDFS CLI commands (ls, get, cat). I have set up

Re: RE: HDFS file is listable but not queryable (object not found)

2020-07-23 Thread Paul Rogers
Hi Clark, Security was going to be my next question. The stack trace didn't look like one where the file open would fail: the planner doesn't actually open a JSON file. There is no indication of the HDFS call that might have failed. Another question is: what version of HDFS are you using? I

Re: RE: HDFS file is listable but not queryable (object not found)

2020-07-23 Thread Updike, Clark
I should mention that this is a Kerberized HDFS cluster. I'm still not sure why SHOW FILES would work when the query does not, but it could be behind the issue somehow. On 7/23/20, 2:18 PM, "Updike, Clark" wrote: No change unfortunately: apache drill> select * from

Re: RE: HDFS file is listable but not queryable (object not found)

2020-07-23 Thread Updike, Clark
No change unfortunately: apache drill> select * from hdfs.`root`.`/tmp/employee.json`; Error: VALIDATION ERROR: From line 1, column 15 to line 1, column 18: Object '/tmp/employee.json' not found within 'hdfs.root' On 7/23/20, 2:11 PM, "Paul Rogers" wrote: Hi Clark, Try using

Re: Re: HDFS file is listable but not queryable (object not found)

2020-07-23 Thread Paul Rogers
Hi Clark, Try using `hdfs`.`root` rather than `hdfs.root`. Calcite wants to walk down `hdfs` then `root`. There is no workspace called `hdfs.root`. Thanks, - Paul On Thu, Jul 23, 2020 at 8:58 AM Updike, Clark wrote: > Oops, sorry. No luck there either unfortunately: > > apache drill> SELECT
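Paul's point concerns how the planner splits a compound name into parts before walking the schema tree (plugin, then workspace, then table). The Python below is a simplified, assumed model of backtick-quoted identifiers for illustration, not Calcite's actual parser. A dot outside backticks separates parts, while anything inside backticks, slashes and dots included, stays one part. It ignores case folding and doubled-backtick escapes.

```python
def split_identifier(text):
    """Split a dotted SQL name into parts under a simplified backtick-quoting model."""
    parts, current, quoted = [], [], False
    for ch in text:
        if ch == "`":
            quoted = not quoted          # toggle quoted mode; the backtick itself is dropped
        elif ch == "." and not quoted:
            parts.append("".join(current))  # an unquoted dot ends the current part
            current = []
        else:
            current.append(ch)           # inside quotes, dots and slashes are literal
    if quoted:
        raise ValueError("unterminated backtick in identifier")
    parts.append("".join(current))
    return parts
```

Under this model, unquoted hdfs.root and `hdfs`.`root` yield the same two parts, ["hdfs", "root"]. That matches the later report that switching forms made no difference. Only `hdfs.root` written as a single quoted name would ask for a workspace literally named hdfs.root.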

Re: Re: HDFS file is listable but not queryable (object not found)

2020-07-23 Thread Updike, Clark
Oops, sorry. No luck there either unfortunately: apache drill> SELECT * FROM hdfs.`/tmp/employee.json`; Error: VALIDATION ERROR: From line 1, column 15 to line 1, column 18: Object '/tmp/employee.json' not found within 'hdfs' On 7/23/20, 11:52 AM, "Charles Givre" wrote: Oh.. I

Re: HDFS file is listable but not queryable (object not found)

2020-07-23 Thread Charles Givre
Oh.. I meant: SELECT * FROM hdfs.`/tmp/employee.json` > On Jul 23, 2020, at 11:41 AM, Updike, Clark wrote: > > No change unfortunately... > > $ hdfs dfs -ls hdfs://nn01:8020/tmp/employee.json > -rw-r--r-- 2 me supergroup 474630 2020-07-23 10:53 > hdfs://nn01:8020/tmp/employee.json >

Re: Re: HDFS file is listable but not queryable (object not found)

2020-07-23 Thread Updike, Clark
Sorry if my use of hdfs as the name caused any confusion. I simply copied the dfs plugin to hdfs to make it clear what it was, but otherwise, it is essentially the same as the dfs with just the tweaks for hdfs, viz: { "type": "file", "connection": "hdfs://nn01:8020", "config": null,

Re: HDFS file is listable but not queryable (object not found)

2020-07-23 Thread Rafael Jaimes III
Right, but do you need the rest of the config at the top of the dfs default config? Here's what I assume to be the full config taken from my 1.17 dfs config (with other formats deleted): { "type": "file", "connection": "file:///", "config": null, "workspaces": { "tmp": {
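Rafael's question is whether the copied plugin still carries the workspace and format sections of the stock dfs config. As a sketch of what a complete minimal definition might look like, the Python below builds one around the NameNode URI from the thread and prints it as JSON. The field set (a root workspace at "/" and a single JSON format entry) is an assumption modeled on the Drill 1.17 dfs default. Compare it with the dfs config shipped with your own Drill version before relying on it.

```python
import json

def hdfs_plugin_config(namenode_uri):
    """Return a minimal file-system plugin config; field names assumed from the stock dfs plugin."""
    return {
        "type": "file",
        "connection": namenode_uri,          # e.g. the NameNode URI used in the thread
        "config": None,                      # optional extra Hadoop properties
        "workspaces": {
            "root": {
                "location": "/",             # table paths resolve under HDFS root
                "writable": False,
                "defaultInputFormat": None,
                "allowAccessOutsideWorkspace": False,
            }
        },
        "formats": {
            "json": {"type": "json", "extensions": ["json"]}  # lets .json files be queried
        },
        "enabled": True,
    }

print(json.dumps(hdfs_plugin_config("hdfs://nn01:8020"), indent=2))
```

The printed JSON could then be pasted into the storage plugin editor in the Drill Web UI under the plugin name hdfs.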

Re: Re: HDFS file is listable but not queryable (object not found)

2020-07-23 Thread Updike, Clark
No change unfortunately... $ hdfs dfs -ls hdfs://nn01:8020/tmp/employee.json -rw-r--r-- 2 me supergroup 474630 2020-07-23 10:53 hdfs://nn01:8020/tmp/employee.json apache drill> select * from hdfs.root.`hdfs://nn01:8020/tmp/employee.json`; Error: VALIDATION ERROR: From line 1, column 15 to

Re: HDFS file is listable but not queryable (object not found)

2020-07-23 Thread Charles Givre
Rafael, Clark is using the filesystem plugin to query a Hadoop cluster. It seems weird that you can enumerate the files in a directory but when you try to query that file, it breaks... -- C > On Jul 23, 2020, at 11:35 AM, Rafael Jaimes III wrote: > > Hi all, > > It looks like the file

Re: HDFS file is listable but not queryable (object not found)

2020-07-23 Thread Charles Givre
Hi Clark, That's strange. My initial thought is that this could be a permission issue. However, it might also be that Drill isn't finding the file for some reason. Could you try: SELECT * FROM hdfs.`` Best, --- C > On Jul 23, 2020, at 11:23 AM, Updike, Clark wrote: > > This is in

HDFS file is listable but not queryable (object not found)

2020-07-23 Thread Updike, Clark
This is in 1.17. I can use SHOW FILES to list the file I'm targeting, but I cannot query it: apache drill> show files in hdfs.root.`/tmp/employee.json`;