Re: msgpack format reader with schema learning feature

2018-10-11 Thread Jean-Claude Cote
I'm pretty sure I can have the type of the list. Here's an example of the schema I use. The root map corresponds to the row. Notice how the arrayOfArray says it is a LIST:REPEATED. But if you drill down and ask for the first child of that MaterializedField you get the inner array named $data$ and

Re: msgpack format reader with schema learning feature

2018-10-11 Thread Arina Yelchiyeva
Paul, sounds good. I like the idea of the mock scanner being done first, since besides csv and json, other readers would have to be updated as well. Could you please share the Jira number(s), if any, so I can follow them? Kind regards, Arina On Thu, Oct 11, 2018 at 8:52 AM Paul Rogers wrote: > Hi JC,

Re: msgpack format reader with schema learning feature

2018-10-10 Thread Paul Rogers
Hi JC, Drill's complex types can be a bit confusing. Note that, in your example, for the REPEATED BIGINT, we know that this is an array (REPEATED) and we know the type of each element (BIGINT). But, that REPEATED LIST, it is a list of ... what? The element type is missing. This is not the

Re: msgpack format reader with schema learning feature

2018-10-10 Thread Paul Rogers
Hi Arina, Very glad to hear the schema work is moving ahead. IMHO, the Drill schema effort might want to start simple and grow. I think JC has indicated the simplest possible solution: provide a schema file to accompany a table. JC has also shown the next step up the chain: create that file

Re: msgpack format reader with schema learning feature

2018-10-10 Thread Jean-Claude Cote
Hey Paul, You mentioned that "But, for a LIST, the Materialized field does not include the child types". However, MaterializedField does have type information for child types. You can see it in this example. I think it has all the relevant information. Anyway, all test cases I've tried so far are

Re: msgpack format reader with schema learning feature

2018-10-10 Thread Arina Yelchiyeva
This correlates with two projects that are currently being actively investigated and prototyped: 1. Drill metastore (DRILL-6552) 2. Providing schema from the query (casts, hints). The second one will make it possible to provide a schema using hints, as well as from the file. Regarding how to use

Re: msgpack format reader with schema learning feature

2018-10-09 Thread Paul Rogers
Hi JC, Very cool indeed. You are the man! Ted's been advocating for this approach for as long as I can remember (2+ years). You're well on your way to solving the JSON problems that I documented a while back in DRILL-4710 and summarize as "Drill can't predict the future." Basically, without a