Some issues with nested (JSON) queries

Ashwin Jayaprakash Thu, 15 Jan 2015 08:29:09 -0800

Hello, I was trying to run some queries on a JSON document. I think I may
have discovered some bugs. I was using Drill 0.7.0.


This is the JSON document (test1.json):

{
    "id": "0001",
    "type": "donut",
    "name": "Cake",
    "ppu": 0.55,
    "batters":
        {
            "batter":
                [
                    { "id": "1001", "type": "Regular" },
                    { "id": "1002", "type": "Chocolate" },
                    { "id": "1003", "type": "Blueberry" },
                    { "id": "1004", "type": "Devil's Food" }
                ]
        },
    "topping":
        [
            { "id": "5001", "type": "None" },
            { "id": "5002", "type": "Glazed" },
            { "id": "5005", "type": "Sugar" },
            { "id": "5007", "type": "Powdered Sugar" },
            { "id": "5006", "type": "Chocolate with Sprinkles" },
            { "id": "5003", "type": "Chocolate" },
            { "id": "5004", "type": "Maple" }
        ]
}


1) I think the parser got confused by the various "type" fields. I believe
this query is valid, since "j.type" is "donut" for the one and only row.
Although there are other, nested "type" fields, the query should still
have worked.

select j.id id, j.name name, flatten(j.topping) tt,
flatten(j.batters.batter) bb from
dfs.root.`/Users/ashwin.jayaprakash/Downloads/apache-drill-0.7.0/sample/test1.json`
j where j.type = 'donut';
Query failed: Query failed: Failure while running fragment., Trying to
flatten a non-repeated filed.
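
To make the expected result concrete, here is a rough sketch in plain Python
(not Drill code) of the semantics this query seems to rely on. It assumes
that the filter applies to the top-level "type" only and that two flatten()
calls yield one row per (topping, batter) pair. The helper name
expected_rows is made up for illustration.

```python
from itertools import product

# Trimmed copy of test1.json: only the fields the query touches.
doc = {
    "id": "0001",
    "type": "donut",
    "name": "Cake",
    "batters": {"batter": [
        {"id": "1001", "type": "Regular"},
        {"id": "1002", "type": "Chocolate"},
        {"id": "1003", "type": "Blueberry"},
        {"id": "1004", "type": "Devil's Food"},
    ]},
    "topping": [
        {"id": "5001", "type": "None"},
        {"id": "5002", "type": "Glazed"},
        {"id": "5005", "type": "Sugar"},
        {"id": "5007", "type": "Powdered Sugar"},
        {"id": "5006", "type": "Chocolate with Sprinkles"},
        {"id": "5003", "type": "Chocolate"},
        {"id": "5004", "type": "Maple"},
    ],
}


def expected_rows(record):
    """Assumed semantics: test the top-level type, then pair every topping with every batter."""
    if record["type"] != "donut":
        return []
    return [
        {"id": record["id"], "name": record["name"], "tt": topping, "bb": batter}
        for topping, batter in product(record["topping"], record["batters"]["batter"])
    ]


rows = expected_rows(doc)
print(len(rows))  # 28 (7 toppings x 4 batters)
```

Under those assumptions the single input record expands to 28 rows, and the
nested "type" fields play no part in the where clause.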


2) The parser appears to be automatically converting "id" to a tinyint. I
suppose this is correct, but I wanted your opinion on it.

select j.id id, j.name name, flatten(j.topping) tt,
flatten(j.batters.batter) bb from
dfs.root.`/Users/ashwin.jayaprakash/Downloads/apache-drill-0.7.0/sample/test1.json`
j where id = 'donut';
Query failed: Query failed: Failure while running fragment., index: -4,
length: 4 (expected: range(0, 16384))


3) Is there a way to filter the records before the flattening happens, by
specifying that the path "j.topping.type" should only be "Sugar"?

select j.id id, j.name name, flatten(j.topping) tt,
flatten(j.batters.batter) bb from
dfs.root.`/Users/ashwin.jayaprakash/Downloads/apache-drill-0.7.0/sample/test1.json`
j where j.topping.type = 'Sugar';
Query failed: Query failed: Failure while running fragment.,
org.apache.drill.exec.vector.complex.RepeatedMapVector cannot be cast to
org.apache.drill.exec.vector.complex.MapVector
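
One possible reading of "filter before flattening", sketched in plain Python
(not Drill code): keep only the topping entries whose "type" matches, then
pair the survivors with every batter. Both this interpretation and the
helper name rows_with_topping are assumptions made for illustration.

```python
from itertools import product

# Trimmed copy of test1.json: only the fields the query touches.
doc = {
    "id": "0001",
    "name": "Cake",
    "batters": {"batter": [
        {"id": "1001", "type": "Regular"},
        {"id": "1002", "type": "Chocolate"},
        {"id": "1003", "type": "Blueberry"},
        {"id": "1004", "type": "Devil's Food"},
    ]},
    "topping": [
        {"id": "5001", "type": "None"},
        {"id": "5002", "type": "Glazed"},
        {"id": "5005", "type": "Sugar"},
        {"id": "5007", "type": "Powdered Sugar"},
        {"id": "5006", "type": "Chocolate with Sprinkles"},
        {"id": "5003", "type": "Chocolate"},
        {"id": "5004", "type": "Maple"},
    ],
}


def rows_with_topping(record, wanted_type):
    """Filter the nested topping array first, then expand against the batters."""
    toppings = [t for t in record["topping"] if t["type"] == wanted_type]
    return [
        {"id": record["id"], "name": record["name"], "tt": topping, "bb": batter}
        for topping, batter in product(toppings, record["batters"]["batter"])
    ]


rows = rows_with_topping(doc, "Sugar")
print(len(rows))  # 4 (1 matching topping x 4 batters)
```

Note the exact match: "Powdered Sugar" is not selected, so only the topping
with id "5005" survives the filter.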

4) Is there a length function supported on the nested arrays?
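
For clarity, this is the kind of result meant by "length", shown in plain
Python rather than Drill: the number of elements in each nested array of
the document above.

```python
# Same array shapes as test1.json; only the ids are kept for brevity.
doc = {
    "batters": {"batter": [{"id": "1001"}, {"id": "1002"}, {"id": "1003"}, {"id": "1004"}]},
    "topping": [{"id": i} for i in ("5001", "5002", "5005", "5007", "5006", "5003", "5004")],
}

topping_count = len(doc["topping"])
batter_count = len(doc["batters"]["batter"])
print(topping_count, batter_count)  # 7 4
```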

5) There is a spelling mistake in the error message :) "Trying to flatten a
non-repeated filed." - "filed"

Thanks.
