Re: Drill TupleMetadata created from DFDL Schema - how do I inform Drill about it

2023-10-18 Thread Paul Rogers
Hi Mike, Earlier on, there were two approaches discussed: 1. Using a Daffodil schema to map to a Drill schema, and use Drill's existing schema mechanisms for all of Drill's existing input formats. 2. Using a Daffodil-specific reader so that Daffodil does the data parsing. Some of my earlier

Re: Drill TupleMetadata created from DFDL Schema - how do I inform Drill about it

2023-10-18 Thread Charles Givre
Got it. I’ll review today and tomorrow and hopefully we can get you unblocked. > On Oct 18, 2023, at 18:01, Mike Beckerle wrote: > > I am very much hoping someone will look at my open PR soon. > https://github.com/apache/drill/pull/2836 > > I am basically blocked on

Re: Drill TupleMetadata created from DFDL Schema - how do I inform Drill about it

2023-10-18 Thread Mike Beckerle
I am very much hoping someone will look at my open PR soon. https://github.com/apache/drill/pull/2836 I am basically blocked on this effort until you help me with one key area of that. I expect the part I am puzzling over is routine to you, so it will save me much effort. This is the key area

Re: Drill TupleMetadata created from DFDL Schema - how do I inform Drill about it

2023-10-18 Thread Paul Rogers
Hi Charles, The persistent store is just ZooKeeper, and ZK is known to work poorly as a distributed DB. ZK works great for things like tokens, node registrations and the like. But ZK scales very poorly for things like schemas (or query profiles or a list of active queries). A more scalable

Re: Drill TupleMetadata created from DFDL Schema - how do I inform Drill about it

2023-10-18 Thread Charles Givre
Hi Mike, I hope all is well. I remembered one other piece which might be useful for you. Drill has an interface called a PersistentStore which is used for storing artifacts such as tokens, etc. I've used it on two occasions: in the GoogleSheets plugin and the Http plugin. In both cases, I

Re: Drill TupleMetadata created from DFDL Schema - how do I inform Drill about it

2023-10-13 Thread Mike Beckerle
Very helpful. Answers to your questions, and comments, are below: On Thu, Oct 12, 2023 at 5:14 PM Charles Givre wrote: > Hi Mike, > I hope all is well. I'll take a stab at answering your questions. But I > have a few questions as well: > > 1. Are you writing a storage or format plugin for

Re: Drill TupleMetadata created from DFDL Schema - how do I inform Drill about it

2023-10-12 Thread Charles Givre
One more thought... As a suggestion, I'd recommend getting the batch reader to first work with the DFDL schema file. Once that's done, Paul and I can assist with caching, metastores, etc. -- C > On Oct 12, 2023, at 5:13 PM, Charles Givre wrote: > > Hi Mike, > I hope all is well. I'll

Re: Drill TupleMetadata created from DFDL Schema - how do I inform Drill about it

2023-10-12 Thread Charles Givre
Hi Mike, I hope all is well. I'll take a stab at answering your questions. But I have a few questions as well: 1. Are you writing a storage or format plugin for DFDL? My thinking was that this would be a format plugin, but let me know if you were thinking differently. 2. In traditional

Re: Drill TupleMetadata created from DFDL Schema - how do I inform Drill about it

2023-10-12 Thread Paul Rogers
Mike, Excellent progress! Very impressive. So now you are talking about the planning side of things. There are multiple ways this could be done. Let's start with some basics. Recall that Drill is distributed: a file can be in S3 or old-school HDFS (along with other variations). When Drill is run

Drill TupleMetadata created from DFDL Schema - how do I inform Drill about it

2023-10-12 Thread Mike Beckerle
So when a data format is described by a DFDL schema, I can generate an equivalent Drill schema (TupleMetadata). This schema is always complete. I have unit tests working with this. To do this for a real SQL query, I need the DFDL schema to be identified in the SQL query by a file path or URI. Q: