Re: Question about Drill internal data representation for Daffodil tree infosets

2023-10-11 Thread Paul Rogers
Mike,

Just to echo Charles, thanks for the work; sounds like you are making good
progress.

The question you asked is tricky. Charles is right, the type of the data
structure is a map. The output you showed appears to be from the sqlline
tool. If so, then it helps to understand that sqlline "cheats" by
converting maps to strings for display, making it look like you have a
string column.

Also, remember that Drill uses the standard JSON structure internally, just
as you described. However, referencing any column projects it to the top
level. Clients don't understand complex JSON types (maps, arrays, etc.).
Sqlline compensates by converting the data to strings for display.
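
To make the distinction concrete, here is a small Python sketch. It is not
Drill code, and the variable names are made up for illustration. It only shows
how a map value and a plain string can look identical once a display tool
renders the map as text:

```python
import json

# A nested map value, loosely analogous to what Drill holds for parent.sub1.
sub1_map = {"a1": 1, "a2": 2}

# A display tool can render the map as text for the user.
rendered = json.dumps(sub1_map)

# A genuine string column that happens to hold the same text.
sub1_string = '{"a1": 1, "a2": 2}'

print(rendered == sub1_string)      # the displayed text is identical
print(type(sub1_map).__name__)      # the map is still a structured value
print(type(sub1_string).__name__)   # the string is just text

# Only the map gives direct access to its children.
print(sub1_map["a1"])
```

The point is that the rendered text alone cannot tell you the underlying type;
you have to ask for the type explicitly.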

- Paul

On Tue, Oct 10, 2023 at 12:55 PM Charles Givre wrote:

> Hi Mike,
> Thanks for all the work you are doing on Drill.
>
> To answer your question, sub1 should be treated as a map in Drill.  You
> can verify this with the following query:
>
> SELECT drillTypeOf(sub1) FROM...
>
> In general, I'm pretty sure that Drill doesn't output strings that look
> like JSON objects unless they actually are complex objects.
>
> Take a look here for data type functions:
> https://drill.apache.org/docs/data-type-functions/
> Best,
> -- C
>
>
> > On Oct 10, 2023, at 7:56 AM, Mike Beckerle wrote:
> >
> > I am trying to understand the options for populating Drill data from a
> > Daffodil data parse.
> >
> > Suppose you have this JSON:
> >
> > {"parent": { "sub1": { "a1":1, "a2":2}, "sub2":{"b1":3, "b2":4, "b3":5}}}
> >
> > or this equivalent XML:
> >
> > <parent>
> >   <sub1><a1>1</a1><a2>2</a2></sub1>
> >   <sub2><b1>3</b1><b2>4</b2><b3>5</b3></sub2>
> > </parent>
> >
> > Unlike those texts, Daffodil is going to have a tree data structure where a
> > parent node contains two child nodes sub1 and sub2, and each of those has
> > children a1, a2, and b1, b2, b3 respectively.
> > It's analogous roughly to the DOM tree of the XML, or the tree of nested
> > JSON map nodes you'd get back from a JSON parse of that text.
> >
> > In Drill, a query on the JSON like:
> >
> > select parent.sub1 from myStructure
> >
> > gives you back a single column containing what seems to be a string like
> >
> > |sub1|
> > --
> > | { "a1":1, "a2":2}  |
> >
> > So, my question is this. Is this actually a string in Drill (what is the
> > type of sub1?), or is sub1 actually a Drill data row/map node value with two
> > node children that just happens to print out looking like a JSON string?
> >
> > Thanks for any insight here.
> >
> > Mike Beckerle
> > Apache Daffodil PMC | daffodil.apache.org
> > OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
> > Owl Cyber Defense | www.owlcyberdefense.com
>
>


Re: Question about Drill internal data representation for Daffodil tree infosets

2023-10-10 Thread Charles Givre
Hi Mike, 
Thanks for all the work you are doing on Drill.

To answer your question, sub1 should be treated as a map in Drill.  You can 
verify this with the following query:

SELECT drillTypeOf(sub1) FROM...

In general, I'm pretty sure that Drill doesn't output strings that look like 
JSON objects unless they actually are complex objects.

Take a look here for data type functions:  
https://drill.apache.org/docs/data-type-functions/
Best,
-- C


> On Oct 10, 2023, at 7:56 AM, Mike Beckerle wrote:
> 
> I am trying to understand the options for populating Drill data from a
> Daffodil data parse.
> 
> Suppose you have this JSON:
> 
> {"parent": { "sub1": { "a1":1, "a2":2}, "sub2":{"b1":3, "b2":4, "b3":5}}}
> 
> or this equivalent XML:
> 
> <parent>
>   <sub1><a1>1</a1><a2>2</a2></sub1>
>   <sub2><b1>3</b1><b2>4</b2><b3>5</b3></sub2>
> </parent>
> 
> Unlike those texts, Daffodil is going to have a tree data structure where a
> parent node contains two child nodes sub1 and sub2, and each of those has
> children a1, a2, and b1, b2, b3 respectively.
> It's analogous roughly to the DOM tree of the XML, or the tree of nested
> JSON map nodes you'd get back from a JSON parse of that text.
> 
> In Drill, a query on the JSON like:
> 
> select parent.sub1 from myStructure
> 
> gives you back a single column containing what seems to be a string like
> 
> |sub1|
> --
> | { "a1":1, "a2":2}  |
> 
> So, my question is this. Is this actually a string in Drill (what is the
> type of sub1?), or is sub1 actually a Drill data row/map node value with two
> node children that just happens to print out looking like a JSON string?
> 
> Thanks for any insight here.
> 
> Mike Beckerle
> Apache Daffodil PMC | daffodil.apache.org
> OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
> Owl Cyber Defense | www.owlcyberdefense.com



Question about Drill internal data representation for Daffodil tree infosets

2023-10-10 Thread Mike Beckerle
I am trying to understand the options for populating Drill data from a
Daffodil data parse.

Suppose you have this JSON:

{"parent": { "sub1": { "a1":1, "a2":2}, "sub2":{"b1":3, "b2":4, "b3":5}}}

or this equivalent XML:

<parent>
  <sub1><a1>1</a1><a2>2</a2></sub1>
  <sub2><b1>3</b1><b2>4</b2><b3>5</b3></sub2>
</parent>

Unlike those texts, Daffodil is going to have a tree data structure where a
parent node contains two child nodes sub1 and sub2, and each of those has
children a1, a2, and b1, b2, b3 respectively.
It's analogous roughly to the DOM tree of the XML, or the tree of nested
JSON map nodes you'd get back from a JSON parse of that text.
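
As a rough illustration of that tree shape, here is a minimal Python sketch
that parses the JSON text into nested map nodes and walks them. It says nothing
about Daffodil's or Drill's actual APIs; the helper name leaf_paths is
hypothetical and exists only for this example:

```python
import json

# The example document, written as valid JSON text.
text = '{"parent": {"sub1": {"a1": 1, "a2": 2}, "sub2": {"b1": 3, "b2": 4, "b3": 5}}}'

# Parsing yields a tree of nested map nodes (Python dicts).
tree = json.loads(text)

def leaf_paths(node, prefix=""):
    """Yield (dotted_path, value) for every leaf under a nested dict."""
    for name, child in node.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(child, dict):
            # Interior node: descend into its children.
            yield from leaf_paths(child, path)
        else:
            # Leaf node: report its path and value.
            yield path, child

paths = list(leaf_paths(tree))
for path, value in paths:
    print(path, value)
```

Here parent has two children, sub1 and sub2, and the five leaves hang off
those two nodes.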

In Drill, a query on the JSON like:

select parent.sub1 from myStructure

gives you back a single column containing what seems to be a string like

|sub1|
--
| { "a1":1, "a2":2}  |

So, my question is this. Is this actually a string in Drill (what is the
type of sub1?), or is sub1 actually a Drill data row/map node value with two
node children that just happens to print out looking like a JSON string?

Thanks for any insight here.

Mike Beckerle
Apache Daffodil PMC | daffodil.apache.org
OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
Owl Cyber Defense | www.owlcyberdefense.com