Re: Question about Drill internal data representation for Daffodil tree infosets
Mike,

Just to echo Charles, thanks for the work; sounds like you are making good progress.

The question you asked is tricky. Charles is right: the type of the data structure is a map. The output you showed appears to be from the sqlline tool. If so, it helps to understand that sqlline "cheats" by converting maps to strings for display, making it look like you have a string column.

Also, remember that Drill uses the standard JSON structure internally, just as you described. However, referencing any column projects it to the top level. Clients don't understand complex JSON types (maps, arrays, etc.). Sqlline compensates by converting the data to strings for display.

- Paul

On Tue, Oct 10, 2023 at 12:55 PM Charles Givre wrote:

> Hi Mike,
> Thanks for all the work you are doing on Drill.
>
> To answer your question, sub1 should be treated as a map in Drill. You
> can verify this with the following query:
>
> SELECT drillTypeOf(sub1) FROM...
>
> In general, I'm pretty sure that Drill doesn't output strings that look
> like JSON objects unless they actually are complex objects.
>
> Take a look here for data type functions:
> https://drill.apache.org/docs/data-type-functions/
> Best,
> -- C
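The distinction Paul describes, a map value that only looks like a string once a display tool renders it, can be sketched outside Drill. The following is a minimal Python illustration, not Drill or sqlline code: the `project` and `render_cell` helpers are hypothetical names invented for this sketch, and the nested dict stands in for the parsed JSON from the original question.

```python
import json

# Hypothetical row mirroring the JSON from the original question.
row = {"parent": {"sub1": {"a1": 1, "a2": 2},
                  "sub2": {"b1": 3, "b2": 4, "b3": 5}}}


def project(record, path):
    """Follow a dotted path such as "parent.sub1" and return the nested value."""
    value = record
    for key in path.split("."):
        value = value[key]
    return value


def render_cell(value):
    """Display-only step: turn maps and lists into JSON text for printing.

    The underlying value is not modified; only the returned text is a string.
    """
    if isinstance(value, (dict, list)):
        return json.dumps(value)
    return str(value)


sub1 = project(row, "parent.sub1")
cell = render_cell(sub1)

print(type(sub1).__name__)  # the projected value is still a map (dict)
print(cell)                 # only the rendered cell is text
```

In this sketch the projected value keeps its two children and can still be navigated (`sub1["a1"]`), while the printed cell is just text produced at display time.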
Re: Question about Drill internal data representation for Daffodil tree infosets
Hi Mike,

Thanks for all the work you are doing on Drill.

To answer your question, sub1 should be treated as a map in Drill. You can verify this with the following query:

SELECT drillTypeOf(sub1) FROM...

In general, I'm pretty sure that Drill doesn't output strings that look like JSON objects unless they actually are complex objects.

Take a look here for data type functions:
https://drill.apache.org/docs/data-type-functions/

Best,
-- C

> On Oct 10, 2023, at 7:56 AM, Mike Beckerle wrote:
>
> I am trying to understand the options for populating Drill data from a
> Daffodil data parse.
>
> Suppose you have this JSON
>
> {"parent": { "sub1": { "a1":1, "a2":2}, "sub2":{"b1":3, "b2":4, "b3":5}}}
>
> or this equivalent XML:
>
> <parent>
>   <sub1><a1>1</a1><a2>2</a2></sub1>
>   <sub2><b1>3</b1><b2>4</b2><b3>5</b3></sub2>
> </parent>
>
> Unlike those texts, Daffodil is going to have a tree data structure where a
> parent node contains two child nodes sub1 and sub2, and each of those has
> children a1, a2, and b1, b2, b3 respectively.
> It's roughly analogous to the DOM tree of the XML, or the tree of nested
> JSON map nodes you'd get back from a JSON parse of that text.
>
> In Drill, a query against the JSON like:
>
> select parent.sub1 from myStructure
>
> gives you back a single column containing what seems to be a string like
>
> |sub1|
> --
> | { "a1":1, "a2":2} |
>
> So, my question is this. Is this actually a string in Drill (what is the
> type of sub1?), or is sub1 actually a Drill data row/map node value with two
> node children, that just happens to print out looking like a JSON string?
>
> Thanks for any insight here.
>
> Mike Beckerle
> Apache Daffodil PMC | daffodil.apache.org
> OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
> Owl Cyber Defense | www.owlcyberdefense.com
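The idea behind the type check Charles suggests, asking for the type of a value instead of trusting how it prints, can be mimicked on a plain parsed tree. The sketch below is Python, not Drill: `type_label` and `walk` are hypothetical helpers, and the labels (`MAP`, `LIST`, `BIGINT`, `VARCHAR`) are assumptions chosen to resemble Drill's type names, not output captured from `drillTypeOf`.

```python
def type_label(value):
    """Return an illustrative type label for a parsed value.

    The label names are assumptions for this sketch, not real Drill output.
    """
    if isinstance(value, dict):
        return "MAP"
    if isinstance(value, list):
        return "LIST"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, str):
        return "VARCHAR"
    return "OTHER"


def walk(value, path):
    """Yield (path, label) for a node and, recursively, for its map children."""
    yield path, type_label(value)
    if isinstance(value, dict):
        for key, child in value.items():
            yield from walk(child, f"{path}.{key}")


# Hypothetical tree mirroring the example in the quoted question.
infoset = {"parent": {"sub1": {"a1": 1, "a2": 2},
                      "sub2": {"b1": 3, "b2": 4, "b3": 5}}}

labels = dict(walk(infoset["parent"], "parent"))
for node_path, label in labels.items():
    print(node_path, label)

# A string that merely looks like a JSON object gets a different label.
lookalike = type_label('{ "a1":1, "a2":2}')
print(lookalike)
```

Here `parent.sub1` is labelled as a map with its own typed children, whereas the look-alike text is labelled as a string, which is the same distinction the query above is meant to reveal.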
Question about Drill internal data representation for Daffodil tree infosets
I am trying to understand the options for populating Drill data from a Daffodil data parse.

Suppose you have this JSON

{"parent": { "sub1": { "a1":1, "a2":2}, "sub2":{"b1":3, "b2":4, "b3":5}}}

or this equivalent XML:

<parent>
  <sub1><a1>1</a1><a2>2</a2></sub1>
  <sub2><b1>3</b1><b2>4</b2><b3>5</b3></sub2>
</parent>

Unlike those texts, Daffodil is going to have a tree data structure where a parent node contains two child nodes sub1 and sub2, and each of those has children a1, a2, and b1, b2, b3 respectively. It's roughly analogous to the DOM tree of the XML, or the tree of nested JSON map nodes you'd get back from a JSON parse of that text.

In Drill, a query against the JSON like:

select parent.sub1 from myStructure

gives you back a single column containing what seems to be a string like

|sub1|
--
| { "a1":1, "a2":2} |

So, my question is this. Is this actually a string in Drill (what is the type of sub1?), or is sub1 actually a Drill data row/map node value with two node children, that just happens to print out looking like a JSON string?

Thanks for any insight here.

Mike Beckerle
Apache Daffodil PMC | daffodil.apache.org
OGF DFDL Workgroup Co-Chair | www.ogf.org/ogf/doku.php/standards/dfdl/dfdl
Owl Cyber Defense | www.owlcyberdefense.com