Hi Mike,
Earlier on, two approaches were discussed:
1. Mapping a Daffodil schema to a Drill schema, and using Drill's
existing schema mechanisms, as with all of Drill's existing input formats.
2. Using a Daffodil-specific reader so that Daffodil does the data parsing.
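To make approach 1 concrete, here is a minimal sketch of the kind of type mapping it implies: translating DFDL/XSD simple type names into Drill-style column types. The class name, the type table, and the string type names are all illustrative stand-ins; real Drill code would use TypeProtos.MinorType and TupleMetadata rather than strings.

```java
import java.util.Map;

// Hypothetical, simplified stand-in for a DFDL -> Drill type mapping.
// Real Drill uses TypeProtos.MinorType and TupleMetadata; the string
// names here are just for illustration.
public class DfdlTypeMapping {

    // A few DFDL/XSD simple types mapped to Drill-like column types.
    static final Map<String, String> TYPE_MAP = Map.of(
        "xs:int", "INT",
        "xs:long", "BIGINT",
        "xs:string", "VARCHAR",
        "xs:double", "FLOAT8",
        "xs:dateTime", "TIMESTAMP"
    );

    static String toDrillType(String dfdlSimpleType) {
        String t = TYPE_MAP.get(dfdlSimpleType);
        if (t == null) {
            throw new IllegalArgumentException("Unmapped DFDL type: " + dfdlSimpleType);
        }
        return t;
    }

    public static void main(String[] args) {
        System.out.println(toDrillType("xs:int"));     // INT
        System.out.println(toDrillType("xs:string"));  // VARCHAR
    }
}
```

Under approach 1 this translation happens once, up front; under approach 2 the Daffodil reader owns the parse and the mapping happens as data is read.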
Some of my earlier
Got it. I’ll review today and tomorrow and hopefully we can get you unblocked.
> On Oct 18, 2023, at 18:01, Mike Beckerle wrote:
>
> I am very much hoping someone will look at my open PR soon.
> https://github.com/apache/drill/pull/2836
>
> I am basically blocked on
I am very much hoping someone will look at my open PR soon.
https://github.com/apache/drill/pull/2836
I am basically blocked on this effort until you help me with one key area
of that.
I expect the part I am puzzling over is routine to you, so it will save me
much effort.
This is the key area
Hi Charles,
The persistent store is just ZooKeeper, and ZK is known to work poorly as a
distributed DB. ZK works great for things like tokens, node registrations,
and the like. But ZK scales very poorly for things like schemas (or query
profiles, or a list of active queries).
A more scalable
Hi Mike,
I hope all is well. I remembered one other piece which might be useful for
you. Drill has an interface called a PersistentStore, which is used for storing
artifacts such as tokens, etc. I've used it on two occasions: in the
GoogleSheets plugin and the HTTP plugin. In both cases, I
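A minimal sketch of the get/put pattern described above, using an in-memory stand-in rather than the real interface (which lives in Drill's org.apache.drill.exec.store.sys package and is typically backed by ZooKeeper). The interface and class names here are hypothetical simplifications.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in for Drill's PersistentStore pattern. The real
// interface is richer and ZooKeeper-backed; this only illustrates the
// get/put usage a plugin might rely on for artifacts such as OAuth tokens.
public class TokenStoreSketch {

    interface PersistentStoreLike<V> {
        V get(String key);
        void put(String key, V value);
        boolean contains(String key);
    }

    static class InMemoryStore<V> implements PersistentStoreLike<V> {
        private final Map<String, V> map = new ConcurrentHashMap<>();
        public V get(String key) { return map.get(key); }
        public void put(String key, V value) { map.put(key, value); }
        public boolean contains(String key) { return map.containsKey(key); }
    }

    public static void main(String[] args) {
        PersistentStoreLike<String> store = new InMemoryStore<>();
        // E.g. a plugin stashing an access token it will need across queries.
        store.put("googlesheets.access_token", "token-value");
        System.out.println(store.contains("googlesheets.access_token")); // true
    }
}
```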
Very helpful.
Answers to your questions, and comments are below:
On Thu, Oct 12, 2023 at 5:14 PM Charles Givre wrote:
> Hi Mike,
> I hope all is well. I'll take a stab at answering your questions. But I
> have a few questions as well:
>
>
1. Are you writing a storage or format plugin for
One more thought... As a suggestion, I'd recommend first getting the batch
reader working with the DFDL schema file. Once that's done, Paul and I can
assist with caching, metastores, etc.
-- C
> On Oct 12, 2023, at 5:13 PM, Charles Givre wrote:
>
> Hi Mike,
> I hope all is well. I'll
Hi Mike,
I hope all is well. I'll take a stab at answering your questions. But I have
a few questions as well:
1. Are you writing a storage or format plugin for DFDL? My thinking was that
this would be a format plugin, but let me know if you were thinking differently.
2. In traditional
Mike,
Excellent progress! Very impressive.
So now you are talking about the planning side of things. There are
multiple ways this could be done. Let's start with some basics. Recall that
Drill is distributed: a file can be in S3 or old-school HDFS (along with
other variations). When Drill is run
So when a data format is described by a DFDL schema, I can generate an
equivalent Drill schema (TupleMetadata). This schema is always complete. I
have unit tests working with this.
To do this for a real SQL query, I need the DFDL schema to be identified in
the SQL query by a file path or URI.
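A hedged sketch of that flow: the query supplies a URI for the DFDL schema, the equivalent Drill-style schema is generated once, and the result is cached for reuse. The class name, the String return type (standing in for TupleMetadata), and generateDrillSchema() are all placeholders, not the actual PR's code.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: resolve a DFDL schema identified by URI on the query,
// generate the equivalent Drill-style schema once, and cache it.
// String stands in for Drill's TupleMetadata, and generateDrillSchema()
// stands in for the real Daffodil-driven conversion.
public class DfdlSchemaResolver {

    private final Map<URI, String> cache = new ConcurrentHashMap<>();

    // Placeholder for the actual DFDL -> TupleMetadata generation step.
    private String generateDrillSchema(URI dfdlSchemaUri) {
        return "schema-for:" + dfdlSchemaUri;
    }

    public String resolve(URI dfdlSchemaUri) {
        // Generate on first use; later queries against the same schema
        // URI reuse the cached result.
        return cache.computeIfAbsent(dfdlSchemaUri, this::generateDrillSchema);
    }

    public static void main(String[] args) {
        DfdlSchemaResolver resolver = new DfdlSchemaResolver();
        URI uri = URI.create("file:///schemas/example.dfdl.xsd");
        System.out.println(resolver.resolve(uri));
    }
}
```

Where the cache should live (per-Drillbit, or in a shared store) is exactly the planning-side question raised above, given the earlier caveats about ZooKeeper.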
Q: