Hello, I was trying to run some queries on a JSON document with Drill 0.7.0,
and I think I may have discovered some bugs.
This is the JSON document (test1.json):
{
"id": "0001",
"type": "donut",
"name": "Cake",
"ppu": 0.55,
"batters":
{
"batter":
[
{ "id": "1001", "type": "Regular" },
{ "id": "1002", "type": "Chocolate" },
{ "id": "1003", "type": "Blueberry" },
{ "id": "1004", "type": "Devil's Food" }
]
},
"topping":
[
{ "id": "5001", "type": "None" },
{ "id": "5002", "type": "Glazed" },
{ "id": "5005", "type": "Sugar" },
{ "id": "5007", "type": "Powdered Sugar" },
{ "id": "5006", "type": "Chocolate with Sprinkles" },
{ "id": "5003", "type": "Chocolate" },
{ "id": "5004", "type": "Maple" }
]
}
1) I think the parser got confused by the multiple "type" fields. This query
should be valid because "j.type" is "donut" for the one and only row.
Although there are other "type" fields nested inside "batters" and "topping",
I believe my query should have worked.
select j.id id, j.name name, flatten(j.topping) tt,
flatten(j.batters.batter) bb from
dfs.root.`/Users/ashwin.jayaprakash/Downloads/apache-drill-0.7.0/sample/test1.json`
j where j.type = 'donut';
Query failed: Query failed: Failure while running fragment., Trying to
flatten a non-repeated filed.
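To make the expected result concrete, here is a small sketch in plain Python (not Drill) of what I expected query 1 to return for test1.json. It assumes that two FLATTEN calls in one SELECT list produce the cross product of the two arrays. That is my assumption about Drill's semantics; I have not confirmed it in the documentation.

```python
# Plain-Python model of the expected output of query 1 (not Drill code).
# ASSUMPTION: two flatten() calls in one SELECT produce a cross product.

# Copy of test1.json as a Python dict.
DOC = {
    "id": "0001", "type": "donut", "name": "Cake", "ppu": 0.55,
    "batters": {"batter": [
        {"id": "1001", "type": "Regular"},
        {"id": "1002", "type": "Chocolate"},
        {"id": "1003", "type": "Blueberry"},
        {"id": "1004", "type": "Devil's Food"},
    ]},
    "topping": [
        {"id": "5001", "type": "None"},
        {"id": "5002", "type": "Glazed"},
        {"id": "5005", "type": "Sugar"},
        {"id": "5007", "type": "Powdered Sugar"},
        {"id": "5006", "type": "Chocolate with Sprinkles"},
        {"id": "5003", "type": "Chocolate"},
        {"id": "5004", "type": "Maple"},
    ],
}


def expected_query1(records):
    rows = []
    for j in records:
        # WHERE j.type = 'donut' refers only to the top-level "type" field;
        # the nested "type" fields in batter/topping are not involved.
        if j["type"] != "donut":
            continue
        for tt in j["topping"]:                # flatten(j.topping)
            for bb in j["batters"]["batter"]:  # flatten(j.batters.batter)
                rows.append({"id": j["id"], "name": j["name"], "tt": tt, "bb": bb})
    return rows


result = expected_query1([DOC])
print(len(result))  # 28 rows: 7 toppings x 4 batters
```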
2) The parser appears to be automatically converting "id" to a tinyint, even
though "id" is a quoted string ("0001") in the document. I suppose this may be
correct, but I wanted your opinion on it.
select j.id id, j.name name, flatten(j.topping) tt,
flatten(j.batters.batter) bb from
dfs.root.`/Users/ashwin.jayaprakash/Downloads/apache-drill-0.7.0/sample/test1.json`
j where id = 'donut';
Query failed: Query failed: Failure while running fragment., index: -4,
length: 4 (expected: range(0, 16384))
3) Is there a way to filter the records before the flattening happens, by
specifying that the path "j.topping.type" should only be "Sugar"?
select j.id id, j.name name, flatten(j.topping) tt,
flatten(j.batters.batter) bb from
dfs.root.`/Users/ashwin.jayaprakash/Downloads/apache-drill-0.7.0/sample/test1.json`
j where j.topping.type = 'Sugar';
Query failed: Query failed: Failure while running fragment.,
org.apache.drill.exec.vector.complex.RepeatedMapVector cannot be cast to
org.apache.drill.exec.vector.complex.MapVector
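The same plain-Python approach can sketch the semantics I was after in item 3. Keep only the topping elements whose "type" is exactly "Sugar", so "Powdered Sugar" does not match. Then pair each remaining topping with every batter. As before, the cross-product pairing is my assumption about how the two FLATTEN calls combine.

```python
# Plain-Python model of "keep topping elements with type = 'Sugar', then flatten".
# ASSUMPTION: the two flatten() calls combine as a cross product.

# Trimmed copy of test1.json (only the fields used here).
DOC = {
    "id": "0001", "name": "Cake",
    "batters": {"batter": [
        {"id": "1001", "type": "Regular"},
        {"id": "1002", "type": "Chocolate"},
        {"id": "1003", "type": "Blueberry"},
        {"id": "1004", "type": "Devil's Food"},
    ]},
    "topping": [
        {"id": "5001", "type": "None"},
        {"id": "5002", "type": "Glazed"},
        {"id": "5005", "type": "Sugar"},
        {"id": "5007", "type": "Powdered Sugar"},
        {"id": "5006", "type": "Chocolate with Sprinkles"},
        {"id": "5003", "type": "Chocolate"},
        {"id": "5004", "type": "Maple"},
    ],
}


def sugar_rows(records):
    rows = []
    for j in records:
        for tt in j["topping"]:
            # Exact match on the nested path j.topping.type, applied per element.
            if tt["type"] != "Sugar":
                continue
            for bb in j["batters"]["batter"]:
                rows.append({"id": j["id"], "name": j["name"], "tt": tt, "bb": bb})
    return rows


result = sugar_rows([DOC])
print(len(result))  # 4 rows: 1 matching topping x 4 batters
```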
4) Is there a supported function that returns the length of a nested array?
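By "length" I mean a function that returns the number of elements in a nested array. The plain-Python sketch below shows the intended result. `array_length` is a hypothetical name used only for illustration; it is not a Drill function.

```python
# Hypothetical helper illustrating the requested behaviour (not a Drill function).

# Trimmed copy of test1.json: only the two arrays are kept.
DOC = {
    "batters": {"batter": [
        {"id": "1001", "type": "Regular"},
        {"id": "1002", "type": "Chocolate"},
        {"id": "1003", "type": "Blueberry"},
        {"id": "1004", "type": "Devil's Food"},
    ]},
    "topping": [
        {"id": "5001", "type": "None"},
        {"id": "5002", "type": "Glazed"},
        {"id": "5005", "type": "Sugar"},
        {"id": "5007", "type": "Powdered Sugar"},
        {"id": "5006", "type": "Chocolate with Sprinkles"},
        {"id": "5003", "type": "Chocolate"},
        {"id": "5004", "type": "Maple"},
    ],
}


def array_length(doc, path):
    """Follow a dotted path such as "batters.batter" and return the array's length."""
    node = doc
    for key in path.split("."):
        node = node[key]
    if not isinstance(node, list):
        raise TypeError(f"{path!r} is not an array")
    return len(node)


print(array_length(DOC, "topping"))         # 7
print(array_length(DOC, "batters.batter"))  # 4
```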
5) There is a spelling mistake in the error message :) In "Trying to flatten a
non-repeated filed.", "filed" should be "field".
Thanks.