Re: About integration of drill and arrow

Paul Rogers Wed, 15 Jan 2020 19:10:11 -0800

Hi Ted,

Thanks much for the feedback! Good test cases indeed. The good news is that 
we're close to finishing a "V2" JSON reader that smooths over a few more JSON 
quirks like the "first column null" issue that can cause problems:

{n: 1, v: null}{n: 2, v: "Gotcha! Wasn't Int, is actually VARCHAR!"}

Glad your queries work. You gave an example that had fooled me multiple times:

select nest.u from dfs.root.`/Users/tdunning/x.json`;

The trick here is that Drill has no schema. All the parser can tell is, "hey, 
I've got a two-part name, `nest.u`. For me a two part name means schema.table 
or table.column, so, since `nest` isn't a schema, it must be a table.Oh, look, 
no such table exists. FAIL!" Using a three-part name works (IIRC):

select t.nest.u from dfs.root.`/Users/tdunning/x.json` t;

Now Drill sees that `t` is a table name, and works its way down from there.

If Drill had a schema, then the planner could first check if `nest` is a 
schema, then if it is a table, then if it is a structured field in the query. 
Impala can do this because it has a schema; Drill can't. We can hope that, with 
the new schema work being added to Drill, that your query will "do the right 
thing" in the future.

Adding `columns` to your query won't help: the `columns` name is valid in only 
one place: when working with CSV (or, more generally, delimited) data with no 
headers.

This gets back to Jaing's point: we could really use better/more documentation. 
We're good at the bare basics, "such-and-so syntax exists", but we're not as 
good at explaining how to solve problems using Drill features. The Learning 
Apache Drill book tries to address some holes. Clearly, if you have a hard time 
with this, being part of the team that created Drill, we've got a bit of work 
to do! (To be honest, neither Impala nor Presto are much better in the "how to" 
department.)

Additional use cases/frustrations are very welcome as you find them.

Thanks,
- Paul

    On Wednesday, January 15, 2020, 3:44:09 PM PST, Ted Dunning 
<[email protected]> wrote:  

 On Wed, Jan 15, 2020 at 2:58 PM Paul Rogers <[email protected]>
wrote:

> ...
>
> For example, Ted, you mention lack of nullability on structure members.
> But, Drill represents structures as MAPs, and MAPs can have nullable
> members. So, there is likely more to your request than the short summary
> suggests. Perhaps you can help us understand this a bit more.
>

This was quite a while ago.

I was reading JSON data with substructures of variable form.

I think, however, that this impression is old news. I just tried it and it
works the way I wanted.

Here is my data:

{"top":"a","nest":{"u":1, "v":"other"}}
{"top":"b","nest":{"v":"this", "w":"that"}}

And here are some queries that behave just the way that I wanted:

apache drill> *select* * *from* dfs.root.`/Users/tdunning/x.json`;

+-----+-------------------------+

| *top* | *        nest          * |

+-----+-------------------------+

| a  | {"u":1,"v":"other"}    |

| b  | {"v":"this","w":"that"} |

+-----+-------------------------+

2 rows selected (0.079 seconds)

apache drill> *select* nest *from* dfs.root.`/Users/tdunning/x.json`;

+-------------------------+

| *        nest          * |

+-------------------------+

| {"u":1,"v":"other"}    |

| {"v":"this","w":"that"} |

+-------------------------+

2 rows selected (0.114 seconds)

apache drill> *select* nest.u *from* dfs.root.`/Users/tdunning/x.json`;

Error: VALIDATION ERROR: From line 1, column 8 to line 1, column 11: Table
'nest' not found

[Error Id: b2100faf-adf7-453e-957f-56726b96e06f ] (state=,code=0)

apache drill> *select* columns.nest.u *from* dfs.root.
`/Users/tdunning/x.json`;

Error: VALIDATION ERROR: From line 1, column 8 to line 1, column 14: Table
'columns' not found

[Error Id: a793e6bd-c2ed-477a-9f23-70d67b2b85df ] (state=,code=0)

apache drill> *select* x.nest.u *from* dfs.root.`/Users/tdunning/x.json` x;

+--------+

| *EXPR$0* |

+--------+

| 1      |

| null  |

+--------+

2 rows selected (0.126 seconds)
apache drill>

Re: About integration of drill and arrow

Reply via email to