> Begin forwarded message:
>
> From: Charles Givre <[email protected]>
> Subject: Re: Use cases for DFDL
> Date: November 3, 2019 at 9:31:17 AM EST
> To: Julian Feinauer <[email protected]>
> Cc: "[email protected]" <[email protected]>, "Costello, Roger L." <[email protected]>, [email protected], [email protected]
>
> Hi Julian,
> It seems like there is a beginning of convergence of the minds here. I went
> to the Apache Roadshow in DC and that was where I learned about DFDL and
> immediately thought this was a really interesting possibility.
>
> I'd love to see if we could foster some collaboration between the various
> projects on this. From the Drill side of things, it would make it SO much
> easier to get Drill to read (and by extension query) various data types. I'd
> be willing to contribute time from the Drill side, but I definitely will need
> help understanding how DFDL works.
>
> --C
>
>
>> On Nov 3, 2019, at 8:01 AM, Julian Feinauer <[email protected]> wrote:
>>
>> Hi Charles,
>>
>> This is an interesting idea, and in fact we also discussed the same matter
>> for Calcite at ApacheCon NA.
>> But I agree that it would be really powerful together with a complete
>> runtime like Drill.
>>
>> Julian
>>
>>
>> From: Charles Givre <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Wednesday, October 30, 2019 at 19:38
>> To: "Costello, Roger L." <[email protected]>
>> Cc: "[email protected]" <[email protected]>
>> Subject: Re: Use cases for DFDL
>>
>> +1
>>
>>
>>> On Oct 30, 2019, at 2:36 PM, Costello, Roger L. <[email protected]> wrote:
>>>
>>> Excellent!
>>> Okay, here’s the use case:
>>>
>>> A Daffodil extension could be created for Apache Drill so that you could
>>> parse any kind of data with Daffodil using a DFDL schema, and then you
>>> could use ANSI SQL to query the data, join it with other data, do analysis,
>>> etc., just as if it came from a database. So, instead of parsing data to
>>> XML and then using XPath to pull out data, you could instead parse data to
>>> Apache Drill's data representation and then use ANSI SQL to pull out data,
>>> and even combine it with other non-Daffodil data types. The advantage of
>>> this would be that it would make it very easy to enable Drill to query new
>>> data types (i.e., simply by using a DFDL schema) and it would enable users to
>>> easily query this data without having to load it into another system.
>>>
>>> How’s that, Charles?
>>>
>>> /Roger
>>>
>>> From: Charles Givre <[email protected]>
>>> Sent: Wednesday, October 30, 2019 2:28 PM
>>> To: Costello, Roger L. <[email protected]>
>>> Cc: [email protected]
>>> Subject: [EXT] Re: Use cases for DFDL
>>>
>>> Close... One minor nit is that Drill doesn't use a "query-like" syntax. It
>>> is regular ANSI SQL. IMHO, this would be a really great
>>> collaboration of the two communities.
>>> --C
>>>
>>>
>>>> On Oct 30, 2019, at 1:10 PM, Costello, Roger L. <[email protected]> wrote:
>>>>
>>>> Thanks again, Charles. Is the following use case description correct?
>>>>
>>>> A Daffodil extension could be created for Apache Drill so that you could
>>>> parse any kind of data with Daffodil using a DFDL schema, and then you
>>>> could use Apache Drill's query-like syntax and rich capabilities to query
>>>> parts of that data, join it with other data, do analysis, etc., just as if
>>>> it came from a database.
>>>> So, instead of parsing data to XML and then using
>>>> XPath to pull out data, you could instead parse data to Apache Drill's
>>>> data representation and then use Drill's rich data-query capabilities to
>>>> pull out data, and even combine it with other non-Daffodil data types. The
>>>> advantage of this would be that it would make it very easy to enable
>>>> Drill to query new data types (i.e., simply by using a DFDL schema) and it
>>>> would enable users to easily query this data without having to load it
>>>> into another system.
>>>>
>>>> Is that correct?
>>>>
>>>> /Roger
>>>>
>>>> From: Charles Givre <[email protected]>
>>>> Sent: Wednesday, October 30, 2019 12:19 PM
>>>> To: Costello, Roger L. <[email protected]>
>>>> Cc: [email protected]
>>>> Subject: [EXT] Re: Use cases for DFDL
>>>>
>>>> Not exactly...
>>>> I was thinking of using DFDL to enable Drill to create a schema for data
>>>> that Drill cannot read. If DFDL can be used to describe the schema, a
>>>> plugin could be written for Drill that mirrors this schema and ultimately
>>>> reads the data files. Drill wouldn't be populating any database, but
>>>> rather directly querying the data.
>>>>
>>>> The advantage of this would be that it would make it very easy to enable
>>>> Drill to query new data types (i.e., simply by using a DFDL schema) and it
>>>> would enable users to easily query this data w/o having to load it into
>>>> another system. Does that make sense?
>>>> -- C
>>>>
>>>>
>>>>> On Oct 30, 2019, at 12:13 PM, Costello, Roger L. <[email protected]> wrote:
>>>>>
>>>>> Thanks, Charles. Let me see if I understand the use case correctly.
>>>>>
>>>>> Use DFDL to parse data to populate a database and then use Apache Drill
>>>>> to query the database.
>>>>>
>>>>> Is that correct?
>>>>>
>>>>> /Roger
>>>>>
>>>>> From: Charles Givre <[email protected]>
>>>>> Sent: Wednesday, October 30, 2019 12:01 PM
>>>>> To: [email protected]
>>>>> Subject: [EXT] Re: Use cases for DFDL
>>>>>
>>>>> To add to this discussion, I'm the PMC chair for Apache Drill. I think a
>>>>> compelling use case for DFDL would be enabling Drill to query data
>>>>> based on a DFDL schema. This same concept could be applied to other
>>>>> SQL query engines such as Presto and/or Impala.
>>>>>
>>>>> IMHO, this would facilitate the analysis of data sets supported by DFDL.
>>>>> -- C
>>>>>
>>>>>
>>>>>> On Oct 30, 2019, at 11:53 AM, Costello, Roger L. <[email protected]> wrote:
>>>>>>
>>>>>> Thanks Mike! I updated the slide:
>>>>>>
>>>>>> <image002.png>
>>>>>>
>>>>>> From: Beckerle, Mike <[email protected]>
>>>>>> Sent: Wednesday, October 30, 2019 11:45 AM
>>>>>> To: [email protected]
>>>>>> Subject: [EXT] Re: Use cases for DFDL
>>>>>>
>>>>>> I would not pick on RDF data stores as the target.
>>>>>>
>>>>>> Parsing data to populate a database (any variety) is the actual case.
>>>>>> The fact that we did do one project involving RDF is why I cited that
>>>>>> example in particular, but pulling data into any data store/database
>>>>>> begins with the ability to parse the data and then process it into a
>>>>>> suitable form.
>>>>>>
>>>>>> This is an incomplete list, so perhaps this slide title should be
>>>>>> "Example Use Cases for DFDL"?
>>>>>>
>>>>>> ...mikeb
>>>>>>
>>>>>> From: Costello, Roger L. <[email protected]>
>>>>>> Sent: Monday, October 28, 2019 10:41 AM
>>>>>> To: [email protected] <[email protected]>
>>>>>> Subject: Use cases for DFDL
>>>>>>
>>>>>> Hi Folks,
>>>>>>
>>>>>> I created a slide of use cases.
>>>>>> See below. Do you agree with the slide?
>>>>>> Anything you would add, delete, or change?
>>>>>> /Roger
>>>>>>
>>>>>> <image003.png>
>
