[jira] [Commented] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array

2017-04-05 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957331#comment-15957331
 ] 

Rahul Challapalli commented on DRILL-3562:
--

Thanks for the analysis [~arina]

> Query fails when using flatten on JSON data where some documents have an 
> empty array
> 
>
> Key: DRILL-3562
> URL: https://issues.apache.org/jira/browse/DRILL-3562
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.1.0
>Reporter: Philip Deegan
>Assignee: Serhii Harnyk
> Fix For: 1.10.0
>
>
> Drill query fails when using flatten when some records contain an empty array 
> {noformat}
> SELECT COUNT(*) FROM (SELECT FLATTEN(t.a.b.c) AS c FROM dfs.`flat.json` t) 
> flat WHERE flat.c.d.e = 'f' limit 1;
> {noformat}
> Succeeds on 
> { "a": { "b": { "c": [  { "d": {  "e": "f" } } ] } } }
> Fails on
> { "a": { "b": { "c": [] } } }
> Error
> {noformat}
> Error: SYSTEM ERROR: ClassCastException: Cannot cast 
> org.apache.drill.exec.vector.NullableIntVector to 
> org.apache.drill.exec.vector.complex.RepeatedValueVector
> {noformat}
> Is it possible to ignore the empty arrays, or do they need to be populated 
> with dummy data?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array

2017-04-05 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15956645#comment-15956645
 ] 

Arina Ielchiieva commented on DRILL-3562:
-

I see, one more point then. The DRILL-3562 fix changed the JsonReader and 
FlattenRecordBatch classes. If the JSON files contain no empty arrays, only the 
changes in FlattenRecordBatch could have had any influence.

The error message from DRILL-5399, "Flatten does not support inputs of non-list 
values.", can be thrown in two places:
1. In 
https://github.com/apache/drill/blob/ddcf89548bd33c0cd3e062f1f6d5027eed822372/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenRecordBatch.java#L282
 but that code runs before the code changed in DRILL-3562.
2. In 
https://github.com/apache/drill/blob/ddcf89548bd33c0cd3e062f1f6d5027eed822372/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenRecordBatch.java#L139
 but this one concerns the value vector taken from the incoming batch, not 
from the FlattenRecordBatch where the changes were made.

Also there is [a 
check|https://github.com/apache/drill/blob/ddcf89548bd33c0cd3e062f1f6d5027eed822372/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenRecordBatch.java#L321]
 in FlattenRecordBatch which prevents data from the queries in DRILL-5399 from 
reaching the changes made in DRILL-3562. So far I don't see any relation 
between DRILL-3562 and DRILL-5399.



[jira] [Commented] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array

2017-04-04 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15955358#comment-15955358
 ] 

Rahul Challapalli commented on DRILL-3562:
--

[~arina] The output of the kvgen function can be an empty array in the examples 
presented in DRILL-5399. 



[jira] [Commented] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array

2017-04-04 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954884#comment-15954884
 ] 

Arina Ielchiieva commented on DRILL-3562:
-

I am not sure that this Jira is related to the issue described in 
DRILL-5399. As far as I understand, the changes in DRILL-3562 only apply when 
the JSON has empty fields, whereas none of the data sets used in DRILL-5399 
contain empty values.



[jira] [Commented] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array

2017-04-03 Thread Rahul Challapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15953920#comment-15953920
 ] 

Rahul Challapalli commented on DRILL-3562:
--

[~arina] & [~amansinha100] I tested the case reported in this Jira and a few 
other related cases. No issues found. However, I observed that DRILL-5399 
occurs more frequently in 1.10.0 than in 1.9.0. Could this fix be related 
to that issue in some way? If you think there is no relation, we 
can close this issue. 



[jira] [Commented] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array

2017-03-31 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950596#comment-15950596
 ] 

Khurram Faraaz commented on DRILL-3562:
---

[~arina] thanks for confirming. Verified that the SQL reported in this JIRA 
returns correct results on Drill 1.10.0.
Test added here: framework/resources/Functional/json/json_storage/drill_3562.q

{noformat}
0: jdbc:drill:schema=dfs.tmp> select count(*) from (select FLATTEN(t.a.b.c) AS 
c from `empty_array.json` t) flat WHERE flat.c.d.e = 'f' ;
+---------+
| EXPR$0  |
+---------+
| 1       |
+---------+
1 row selected (0.241 seconds)
{noformat}



[jira] [Commented] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array

2017-03-31 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950582#comment-15950582
 ] 

Arina Ielchiieva commented on DRILL-3562:
-

Yes, it does. This behavior is expected in unit test 
TestJsonReader.testFlattenEmptyArrayWithAllTextMode.



[jira] [Commented] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array

2017-03-31 Thread Khurram Faraaz (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15950452#comment-15950452
 ] 

Khurram Faraaz commented on DRILL-3562:
---

[~arina] Is this the expected result for the second SQL below ?

{noformat}
0: jdbc:drill:schema=dfs.tmp> select * from `drill_3562.json`;
+-----------------+
|        a        |
+-----------------+
| {"b":{"c":[]}}  |
+-----------------+
1 row selected (0.138 seconds)
0: jdbc:drill:schema=dfs.tmp> select FLATTEN(t.a.b.c) AS c from 
`drill_3562.json` t;
+----+
| c  |
+----+
+----+
No rows selected (0.181 seconds)
{noformat}



[jira] [Commented] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array

2017-01-11 Thread Serhii Harnyk (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15818118#comment-15818118
 ] 

Serhii Harnyk commented on DRILL-3562:
--

Besides the initialization of empty arrays, there is also a problem with the 
ordering of columns that contain arrays. 
The query 
{code}
select * from example 
{code}
on the JSON
{noformat}
{ "a": [], "c": [], "c1": 1 }
{ "a": [1], "c": [1], "c1": 1 }
{noformat}
returns the result
{noformat}
-------------------
| c1  | a   | c   |
-------------------
| 1   | []  | []  |
| 1   | [1] | [1] |
-------------------
{noformat}
with the wrong column order: c1 comes first, although it is the last field in 
each JSON record.

> Query fails when using flatten on JSON data where some documents have an 
> empty array
> 
>
> Key: DRILL-3562
> URL: https://issues.apache.org/jira/browse/DRILL-3562
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.1.0
>Reporter: Philip Deegan
>Assignee: Serhii Harnyk
> Fix For: Future
>
>
> Drill query fails when using flatten when some records contain an empty array 
> {noformat}
> SELECT COUNT(*) FROM (SELECT FLATTEN(t.a.b.c) AS c FROM dfs.`flat.json` t) 
> flat WHERE flat.c.d.e = 'f' limit 1;
> {noformat}
> Succeeds on 
> { "a": { "b": { "c": [  { "d": {  "e": "f" } } ] } } }
> Fails on
> { "a": { "b": { "c": [] } } }
> Error
> {noformat}
> Error: SYSTEM ERROR: ClassCastException: Cannot cast 
> org.apache.drill.exec.vector.NullableIntVector to 
> org.apache.drill.exec.vector.complex.RepeatedValueVector
> {noformat}
> Is it possible to ignore the empty arrays, or do they need to be populated 
> with dummy data?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array

2017-01-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15810598#comment-15810598
 ] 

ASF GitHub Bot commented on DRILL-3562:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/713#discussion_r95100719
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/vector/complex/fn/JsonReader.java
 ---
@@ -59,6 +59,12 @@
   private final boolean readNumbersAsDouble;
 
   /**
+   * Collection for tracking empty array writers during reading
+   * and storing them for initializing empty arrays
+   */
+  private final Set emptyArrayWritersSet = Sets.newHashSet();
--- End diff --

Any reason why this needs to be a HashSet rather than just a List? A 
HashSet does not preserve insertion order, so if you had 2 or more empty arrays 
in the same JSON doc within one list, the output of Flatten could end up in a 
different order.  
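
To illustrate the ordering concern raised here, a minimal Java sketch follows. It is not Drill code: `Writer` is a hypothetical stand-in for the list writers tracked in JsonReader, and `trackInOrder` is an illustrative helper. It assumes the goal is to remember each writer only once while still iterating in the order the writers were first seen, which a LinkedHashSet provides and a plain HashSet does not guarantee.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class EmptyArrayTracking {

  // Hypothetical stand-in for a list writer; it uses identity-based
  // equals/hashCode, so its position in a plain HashSet is not predictable.
  static final class Writer {
    final String path;

    Writer(String path) {
      this.path = path;
    }
  }

  // Tracks every writer once and returns the paths in first-seen order.
  static List<String> trackInOrder(List<Writer> seen) {
    // LinkedHashSet removes duplicates like HashSet but iterates in insertion order.
    Set<Writer> tracked = new LinkedHashSet<>();
    for (Writer writer : seen) {
      tracked.add(writer);
    }
    List<String> paths = new ArrayList<>();
    for (Writer writer : tracked) {
      paths.add(writer.path);
    }
    return paths;
  }
}
```

A List would preserve the order as well, but it would keep a writer twice if it were registered twice, so whether duplicates can occur decides between the two.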


> Query fails when using flatten on JSON data where some documents have an 
> empty array
> 
>
> Key: DRILL-3562
> URL: https://issues.apache.org/jira/browse/DRILL-3562
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.1.0
>Reporter: Philip Deegan
>Assignee: Serhii Harnyk
>  Labels: ready-to-commit
> Fix For: Future
>
>
> Drill query fails when using flatten when some records contain an empty array 
> {noformat}
> SELECT COUNT(*) FROM (SELECT FLATTEN(t.a.b.c) AS c FROM dfs.`flat.json` t) 
> flat WHERE flat.c.d.e = 'f' limit 1;
> {noformat}
> Succeeds on 
> { "a": { "b": { "c": [  { "d": {  "e": "f" } } ] } } }
> Fails on
> { "a": { "b": { "c": [] } } }
> Error
> {noformat}
> Error: SYSTEM ERROR: ClassCastException: Cannot cast 
> org.apache.drill.exec.vector.NullableIntVector to 
> org.apache.drill.exec.vector.complex.RepeatedValueVector
> {noformat}
> Is it possible to ignore the empty arrays, or do they need to be populated 
> with dummy data?





[jira] [Commented] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array

2017-01-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15801992#comment-15801992
 ] 

ASF GitHub Bot commented on DRILL-3562:
---

Github user paul-rogers commented on a diff in the pull request:

https://github.com/apache/drill/pull/713#discussion_r94814315
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/flatten/FlattenRecordBatch.java
 ---
@@ -305,12 +306,23 @@ protected boolean setupNewSchema() throws SchemaChangeException {
 
 final NamedExpression flattenExpr = new NamedExpression(popConfig.getColumn(), new FieldReference(popConfig.getColumn()));
 final ValueVectorReadExpression vectorRead = (ValueVectorReadExpression)ExpressionTreeMaterializer.materialize(flattenExpr.getExpr(), incoming, collector, context.getFunctionRegistry(), true);
-final TransferPair tp = getFlattenFieldTransferPair(flattenExpr.getRef());
-
-if (tp != null) {
-  transfers.add(tp);
-  container.add(tp.getTo());
-  transferFieldIds.add(vectorRead.getFieldId().getFieldIds()[0]);
+final FieldReference fieldReference = flattenExpr.getRef();
+final TransferPair transferPair = getFlattenFieldTransferPair(fieldReference);
+
+if (transferPair != null) {
+  final ValueVector flattenVector = transferPair.getTo();
+
+  // checks that list has only default ValueVector and replaces resulting ValueVector to INT typed ValueVector
+  if (exprs.size() == 0 && flattenVector.getField().getType().equals(Types.LATE_BIND_TYPE)) {
+    final MaterializedField outputField = MaterializedField.create(fieldReference.getAsNamePart().getName(), Types.OPTIONAL_INT);
+    final ValueVector vector = TypeHelper.getNewVector(outputField, oContext.getAllocator());
--- End diff --

The fix appears to be to transform an empty list into an empty list of 
integers. That is, Drill does not have the concept of "empty list", only "empty 
list of type X" and we are guessing the type to be integer.

We've had issues elsewhere in the product where such guesses turn out to be 
wrong. Perhaps the next row/batch has a non-empty list, but of strings. Or 
worse, of objects (maps.) Downstream operators cannot handle this.

The result is that a query fails for no better reason than we caused it to 
fail by guessing the wrong type.

Clearly, fixing the broader problem is beyond the scope of this fix. I am 
pointing out, however, that a consequence of the assumption made here is that 
some queries, somewhere later, will fail due to an artificial schema change.

The correct solution is to introduce an "Unknown" type and mark this a 
vector of type "Unknown". All we know is that it is a list; the member types 
are unknown. Then, in downstream operators, when we encounter a schema change, 
we know that an empty list of "Unknown" type is compatible with a list of any 
other type (say maps.)
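
The compatibility rule proposed above can be sketched in Java as follows. This is only an illustration under stated assumptions: `ListTypeMerge`, `ElementType` and `merge` are hypothetical names and are not part of Drill's type system, and a real implementation would have to live in the vector and schema-change handling of the downstream operators.

```java
final class ListTypeMerge {

  // Hypothetical element types of a list column; UNKNOWN marks a list
  // that has only been seen empty so far.
  enum ElementType { UNKNOWN, INT, VARCHAR, MAP }

  // Merges the element type seen so far with the one in the next batch.
  // UNKNOWN is compatible with everything; two different known types are
  // a genuine schema change.
  static ElementType merge(ElementType seen, ElementType incoming) {
    if (seen == ElementType.UNKNOWN) {
      return incoming;
    }
    if (incoming == ElementType.UNKNOWN || incoming == seen) {
      return seen;
    }
    throw new IllegalStateException("Schema change: " + seen + " -> " + incoming);
  }
}
```

With such a rule, an empty list followed by a list of maps would resolve to MAP instead of failing, whereas guessing INT for the empty list forces a conflict as soon as a non-integer list arrives.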




[jira] [Commented] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array

2016-12-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785593#comment-15785593
 ] 

ASF GitHub Bot commented on DRILL-3562:
---

GitHub user Serhii-Harnyk opened a pull request:

https://github.com/apache/drill/pull/713

DRILL-3562: Query fails when using flatten on JSON data where some do…

…cuments have an empty array
1. Added a set for tracking ListWriters, to keep empty arrays for later 
initialization in the ensureAtLeastOneField method. 
2. Added a check to avoid generating a schema with field type "Late" and mode 
"Optional"; it is replaced with the "Int" type in the FlattenRecordBatch class.
3. Added unit tests covering queries that use flatten on JSON with empty 
arrays.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Serhii-Harnyk/drill DRILL-3562

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/713.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #713


commit 815de0cc6f16b247b3a655007241a074e38394c7
Author: Serhii-Harnyk 
Date:   2016-12-20T16:55:41Z

DRILL-3562: Query fails when using flatten on JSON data where some 
documents have an empty array






[jira] [Commented] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array

2016-12-20 Thread Serhii Harnyk (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15763911#comment-15763911
 ] 

Serhii Harnyk commented on DRILL-3562:
--

The issue still reproduces with the JSON file:
{noformat}
{ "a": { "b": { "c": [] } } }
{noformat}
with query:
{code}
select FLATTEN(t.a.b.c) AS c from dfs.`/home/files/flat.json` t;
{code}



[jira] [Commented] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array

2016-07-28 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15397236#comment-15397236
 ] 

Arina Ielchiieva commented on DRILL-3562:
-

[~dekken], [~fmethot]
Can you please confirm whether the issue still reproduces?

> Query fails when using flatten on JSON data where some documents have an 
> empty array
> 
>
> Key: DRILL-3562
> URL: https://issues.apache.org/jira/browse/DRILL-3562
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - JSON
>Affects Versions: 1.1.0
>Reporter: Philip Deegan
> Fix For: Future
>
>
> Drill query fails when using flatten when some records contain an empty array 
> {noformat}
> SELECT COUNT(*) FROM (SELECT FLATTEN(t.a.b.c) AS c FROM dfs.`flat.json` t) 
> flat WHERE flat.c.d.e = 'f' limit 1;
> {noformat}
> Succeeds on 
> { "a": { "b": { "c": [  { "d": {  "e": "f" } } ] } } }
> Fails on
> { "a": { "b": { "c": [] } } }
> Error
> {noformat}
> Error: SYSTEM ERROR: ClassCastException: Cannot cast 
> org.apache.drill.exec.vector.NullableIntVector to 
> org.apache.drill.exec.vector.complex.RepeatedValueVector
> {noformat}
> Is it possible to ignore the empty arrays, or do they need to be populated 
> with dummy data?





[jira] [Commented] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array

2016-07-26 Thread Arina Ielchiieva (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15394058#comment-15394058
 ] 

Arina Ielchiieva commented on DRILL-3562:
-

Can't reproduce the issue.
Json file:
{noformat}
{ "a": { "b": { "c": [ { "d": { "e": "f" } } ] } } }
{ "a": { "b": { "c": [] } } }
{noformat}

Query:
{code:sql}
select count(*) from (select FLATTEN(t.a.b.c) AS c from 
dfs.`/home/files/flat.json` t) flat WHERE flat.c.d.e = 'f' limit 1;
{code}

Result:
{noformat}
1
{noformat}




[jira] [Commented] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array

2016-06-30 Thread F Méthot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15357603#comment-15357603
 ] 

F Méthot commented on DRILL-3562:
-

Really would like to see this one fixed!
Here is a workaround we are using to get to our data:

This will extract the data without the empty arrays:
   select t.a.b.c as c from dfs.`flat.json` t where  t.a.b.c[0]['d'] is not null
   (d is a value name expected to be found within the array)
but flatten still won't work on that result directly.
To get flatten working:
   create table TEMP_JSON_DATA as (select t.a.b.c as c from dfs.`flat.json` t where  t.a.b.c[0]['d'] is not null);
then 
   select flatten(c) from TEMP_JSON_DATA;

(using parquet format for the temp table)

For interactive analysis of data this is a pretty lame workaround, but for a 
scripting environment it worked out fine, provided you automate dropping the 
temp table.





[jira] [Commented] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array

2016-03-19 Thread Ian Hellstrom (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201352#comment-15201352
 ] 

Ian Hellstrom commented on DRILL-3562:
--

Is this a duplicate of 
[DRILL-2217|http://issues.apache.org/jira/browse/DRILL-2217]?



[jira] [Commented] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array

2016-02-03 Thread Dan Osipov (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130550#comment-15130550
 ] 

Dan Osipov commented on DRILL-3562:
---

Facing the same issue, any workaround?
