> Begin forwarded message:
> 
> From: Charles Givre <[email protected]>
> Subject: Re: Use cases for DFDL
> Date: November 3, 2019 at 9:31:17 AM EST
> To: Julian Feinauer <[email protected]>
> Cc: "[email protected]" <[email protected]>, "Costello, Roger 
> L." <[email protected]>, [email protected], [email protected]
> 
> Hi Julian, 
> It seems like there is a beginning of convergence of the minds here.  I went 
> to the Apache Roadshow in DC and that was where I learned about DFDL and 
> immediately thought this was a really interesting possibility.
> 
> I'd love to see if we could foster some collaboration between the various 
> projects on this.  From the Drill side of things, it would make it SO much 
> easier to get Drill to read (and by extension query) various data types.  I'd 
> be willing to contribute time from the Drill side, but I definitely will need 
> help understanding how DFDL works.   
> 
> --C
> 
> 
> 
>> On Nov 3, 2019, at 8:01 AM, Julian Feinauer <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> Hi Charles,
>>  
>> this is an interesting idea and in fact we also discussed the same matter 
>> for Calcite at ApacheCon NA.
>> But, I agree that it would be really powerful together with a complete 
>> Runtime like Drill.
>>  
>> Julian
>>  
>>  
>> From: Charles Givre <[email protected] <mailto:[email protected]>>
>> Reply-To: "[email protected] <mailto:[email protected]>" 
>> <[email protected] <mailto:[email protected]>>
>> Date: Wednesday, October 30, 2019 at 19:38
>> To: "Costello, Roger L." <[email protected] <mailto:[email protected]>>
>> Cc: "[email protected] <mailto:[email protected]>" 
>> <[email protected] <mailto:[email protected]>>
>> Subject: Re: Use cases for DFDL
>>  
>> +1
>> 
>> 
>>> On Oct 30, 2019, at 2:36 PM, Costello, Roger L. <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>>  
>>> Excellent! Okay, here’s the use case:
>>>  
>>> A Daffodil extension could be created for Apache Drill so that you could 
>>> parse any kind of data with Daffodil using a DFDL schema, and then you 
>>> could use ANSI SQL to query the data, join it with other data, do analysis, 
>>> etc., just as if it came from a database. So, instead of parsing data to 
>>> XML and then using XPath to pull out data, you could instead parse data to 
>>> Apache Drill's data representation and then use ANSI SQL to pull out data, 
>>> and even combine it with other non-Daffodil data types. The advantage for 
>>> this would be that it would make it very easy to enable Drill to query new 
>>> data types (i.e., simply by using a DFDL schema) and it would enable users to 
>>> easily query this data without having to load it into another system.
>>>  
>>> How’s that Charles?
>>>  
>>> /Roger
>>> From: Charles Givre <[email protected] <mailto:[email protected]>> 
>>> Sent: Wednesday, October 30, 2019 2:28 PM
>>> To: Costello, Roger L. <[email protected] <mailto:[email protected]>>
>>> Cc: [email protected] <mailto:[email protected]>
>>> Subject: [EXT] Re: Use cases for DFDL
>>>  
>>> Close... One minor nit is that Drill doesn't use a "query-like" syntax. It 
>>> is regular ANSI SQL.  IMHO, I think this would be a really great 
>>> collaboration of the two communities.
>>> --C
>>>  
>>> 
>>> 
>>> 
>>>> On Oct 30, 2019, at 1:10 PM, Costello, Roger L. <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>>  
>>>> Thanks again Charles. Is the following use case description correct?
>>>>  
>>>> A Daffodil extension could be created for Apache Drill so that you could 
>>>> parse any kind of data with Daffodil using a DFDL schema, and then you 
>>>> could use Apache Drill's query-like syntax and rich capabilities to query 
>>>> parts of that data, join it with other data, do analysis, etc., just as if 
>>>> it came from a database. So, instead of parsing data to XML and then using 
>>>> XPath to pull out data, you could instead parse data to Apache Drill's 
>>>> data representation and then use Drill's rich data-query capabilities to 
>>>> pull out data, and even combine it with other non-Daffodil data types. The 
>>>> advantage for this would be that it would make it very easy to enable 
>>>> Drill to query new data types (i.e., simply by using a DFDL schema) and it 
>>>> would enable users to easily query this data without having to load it 
>>>> into another system.
>>>>  
>>>> Is that correct?
>>>>  
>>>> /Roger
>>>> From: Charles Givre <[email protected] <mailto:[email protected]>> 
>>>> Sent: Wednesday, October 30, 2019 12:19 PM
>>>> To: Costello, Roger L. <[email protected] <mailto:[email protected]>>
>>>> Cc: [email protected] <mailto:[email protected]>
>>>> Subject: [EXT] Re: Use cases for DFDL
>>>>  
>>>> Not exactly...
>>>> I was thinking of using DFDL to enable Drill to create a schema for data 
>>>> that Drill cannot read.  If DFDL can be used to describe the schema, a 
>>>> plugin could be written for Drill that mirrors this schema and ultimately 
>>>> reads the data files.  Drill wouldn't be populating any database, but 
>>>> rather directly querying the data.
>>>>  
>>>> The advantage for this would be that it would make it very easy to enable 
>>>> Drill to query new data types (i.e., simply by using a DFDL schema) and it 
>>>> would enable users to easily query this data w/o having to load it into 
>>>> another system.  Does that make sense?
>>>> -- C
>>>>  
>>>>  
>>>>> On Oct 30, 2019, at 12:13 PM, Costello, Roger L. <[email protected] 
>>>>> <mailto:[email protected]>> wrote:
>>>>>  
>>>>> Thanks Charles. Let me see if I understand the use case correctly.
>>>>>  
>>>>> Use DFDL to parse data to populate a database and then use Apache Drill 
>>>>> to query the database.
>>>>>  
>>>>> Is that correct?
>>>>>  
>>>>> /Roger 
>>>>>  
>>>>> From: Charles Givre <[email protected] <mailto:[email protected]>> 
>>>>> Sent: Wednesday, October 30, 2019 12:01 PM
>>>>> To: [email protected] <mailto:[email protected]>
>>>>> Subject: [EXT] Re: Use cases for DFDL
>>>>>  
>>>>> To add to this discussion, I'm the PMC chair for Apache Drill.  I think a 
>>>>> compelling use case for DFDL would be enabling Drill to query data 
>>>>> based on a DFDL schema.  This same concept 
>>>>> could be applied to other SQL query engines such as Presto and/or Impala. 
>>>>>  
>>>>> IMHO, this would facilitate the analysis of data sets supported by DFDL. 
>>>>> -- C
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Oct 30, 2019, at 11:53 AM, Costello, Roger L. <[email protected] 
>>>>>> <mailto:[email protected]>> wrote:
>>>>>>  
>>>>>> Thanks Mike! I updated the slide:
>>>>>>  
>>>>>> <image002.png>
>>>>>>  
>>>>>> From: Beckerle, Mike <[email protected] 
>>>>>> <mailto:[email protected]>> 
>>>>>> Sent: Wednesday, October 30, 2019 11:45 AM
>>>>>> To: [email protected] <mailto:[email protected]>
>>>>>> Subject: [EXT] Re: Use cases for DFDL
>>>>>>  
>>>>>> I would not pick on RDF data stores as the target.
>>>>>>  
>>>>>> Parsing data to populate a database (any variety) is the actual case. 
>>>>>> The fact that we did do one project involving RDF is why I cited that 
>>>>>> example in particular, but pulling data into any data store/database 
>>>>>> begins with the ability to parse the data, and then process it into 
>>>>>> suitable form.
>>>>>>  
>>>>>> This is an incomplete list so perhaps this slide title should be 
>>>>>> "Example Use Cases for DFDL" ?
>>>>>>  
>>>>>> ...mikeb
>>>>>> From: Costello, Roger L. <[email protected] <mailto:[email protected]>>
>>>>>> Sent: Monday, October 28, 2019 10:41 AM
>>>>>> To: [email protected] <mailto:[email protected]> 
>>>>>> <[email protected] <mailto:[email protected]>>
>>>>>> Subject: Use cases for DFDL
>>>>>>  
>>>>>> Hi Folks,
>>>>>>  
>>>>>> I created a slide of use cases. See below. Do you agree with the slide? 
>>>>>> Anything you would add, delete, or change?  /Roger
>>>>>>  
>>>>>> <image003.png>
> 
