[jira] [Comment Edited] (PARQUET-1879) Apache Arrow can not read a Parquet File written with Parqet-Avro 1.11.0 with a Map field
[ https://issues.apache.org/jira/browse/PARQUET-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598759#comment-17598759 ]

Daniel Dai edited comment on PARQUET-1879 at 9/1/22 7:46 AM:
-------------------------------------------------------------

This seems to be a backward-incompatible change. We cannot read Parquet files created pre-1.11.1 using the new version. Here is a sample error message:

{code:java}
org.apache.parquet.io.InvalidRecordException: key_value not found in optional group canonicals (MAP) {
  repeated group map (MAP_KEY_VALUE) {
    required binary key (ENUM);
    optional group value {
      optional int32 index;
      optional int64 pinId;
      optional group indexableTextIndexes (LIST) {
        repeated int32 indexableTextIndexes_tuple;
      }
      optional int32 indexExpLq;
      optional int32 indexExp;
      optional boolean imageOnly;
      optional boolean link404;
      optional boolean unsafe;
      optional boolean imageNotOnPage;
      optional boolean linkStatusError;
    }
  }
}
	at org.apache.parquet.schema.GroupType.getFieldIndex(GroupType.java:176)
	at org.apache.parquet.schema.GroupType.getType(GroupType.java:208)
	at org.apache.parquet.schema.GroupType.checkGroupContains(GroupType.java:348)
	at org.apache.parquet.schema.GroupType.checkContains(GroupType.java:339)
	at org.apache.parquet.schema.GroupType.checkGroupContains(GroupType.java:349)
	at org.apache.parquet.schema.MessageType.checkContains(MessageType.java:124)
	at org.apache.parquet.hadoop.api.ReadSupport.getSchemaForRead(ReadSupport.java:56)
	at org.apache.parquet.hadoop.thrift.ThriftReadSupport.init(ThriftReadSupport.java:187)
	at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:200)
	at org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:182)
	at org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:216)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:213)
	at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:168)
	at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:71)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
{code}

I am not sure what the best way to fix it is. I am thinking about adding a walker in the constructor of FileMetaData to fix the schema; is that a good idea?

was (Author: daijy):
This seems to be a backward-incompatible change. We cannot read Parquet files created pre-1.11.1 using the new version.
[jira] [Commented] (PARQUET-1879) Apache Arrow can not read a Parquet File written with Parqet-Avro 1.11.0 with a Map field
[ https://issues.apache.org/jira/browse/PARQUET-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17598759#comment-17598759 ]

Daniel Dai commented on PARQUET-1879:
-------------------------------------

This seems to be a backward-incompatible change. We cannot read Parquet files created pre-1.11.1 using the new version. Here is a sample error message:

{code:java}
org.apache.parquet.io.InvalidRecordException: key_value not found in optional group canonicals (MAP) {
  repeated group map (MAP_KEY_VALUE) {
    required binary key (ENUM);
    optional group value {
      optional int32 index;
      optional int64 pinId;
      optional group indexableTextIndexes (LIST) {
        repeated int32 indexableTextIndexes_tuple;
      }
      optional int32 indexExpLq;
      optional int32 indexExp;
      optional boolean imageOnly;
      optional boolean link404;
      optional boolean unsafe;
      optional boolean imageNotOnPage;
      optional boolean linkStatusError;
    }
  }
}
	at org.apache.parquet.schema.GroupType.getFieldIndex(GroupType.java:176)
	at org.apache.parquet.schema.GroupType.getType(GroupType.java:208)
	at org.apache.parquet.schema.GroupType.checkGroupContains(GroupType.java:348)
	at org.apache.parquet.schema.GroupType.checkContains(GroupType.java:339)
	at org.apache.parquet.schema.GroupType.checkGroupContains(GroupType.java:349)
	at org.apache.parquet.schema.MessageType.checkContains(MessageType.java:124)
	at org.apache.parquet.hadoop.api.ReadSupport.getSchemaForRead(ReadSupport.java:56)
	at org.apache.parquet.hadoop.thrift.ThriftReadSupport.init(ThriftReadSupport.java:187)
	at org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:200)
	at org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:182)
	at org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:216)
	at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:213)
	at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:168)
	at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:71)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:131)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
{code}

I am not sure what the best way to fix it is. I am thinking about adding a walker in the constructor of FileMetaData to fix the schema; is that a good idea?

> Apache Arrow can not read a Parquet File written with Parqet-Avro 1.11.0 with a Map field
> -----------------------------------------------------------------------------------------
>
>                 Key: PARQUET-1879
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1879
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-avro, parquet-format
>    Affects Versions: 1.11.0
>            Reporter: Matthew McMahon
>            Assignee: Matthew McMahon
>            Priority: Critical
>             Fix For: 1.12.0, 1.11.1
>
> From my [StackOverflow|https://stackoverflow.com/questions/62504757/issue-with-loading-parquet-data-into-snowflake-cloud-database-when-written-with] in relation to an issue I'm having with getting Snowflake (Cloud DB) to load Parquet files written with version 1.11.0
>
> The problem only appears when using a map schema field in the Avro schema.
> For example:
> {code:java}
> {
>   "name": "FeatureAmounts",
>   "type": {
>     "type": "map",
>     "values": "records.MoneyDecimal"
>   }
> }
> {code}
> When using Parquet-Avro to write the file, a bad Parquet schema ends up with, for example
> {code:java}
> message record.ResponseRecord {
>   required binary GroupId (STRING);
>   required int64 EntryTime (TIMESTAMP(MILLIS,true));
>   required int64 HandlingDuration;
>   required binary Id (STRING);
>   optional binary ResponseId (STRING);
>   required binary RequestId (STRING);
>   optional fixed_len_byte_array(12) CostInUSD (DECIMAL(28,15));
>   required group FeatureAmounts (MAP) {
>     repeated group map (MAP_KEY_VALUE) {
>       required binary ke
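The "walker" idea mentioned in the comment above could work by renaming the legacy 3-level map group (historically written as `map` with the `MAP_KEY_VALUE` annotation) to the modern `key_value` name the new reader expects. Below is a minimal, hypothetical sketch of that normalization over a toy schema tree; `SchemaNode` and `normalize` are illustrative stand-ins, not the real `org.apache.parquet.schema` API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal model of a Parquet group schema node.
class SchemaNode {
    final String name;
    final String annotation; // e.g. "MAP", "MAP_KEY_VALUE", "ENUM", or null
    final List<SchemaNode> children = new ArrayList<>();

    SchemaNode(String name, String annotation) {
        this.name = name;
        this.annotation = annotation;
    }

    // Walk the tree; directly inside a MAP group, a child annotated
    // MAP_KEY_VALUE (historically named "map") is renamed to "key_value".
    static SchemaNode normalize(SchemaNode node, boolean parentIsMap) {
        String newName = node.name;
        if (parentIsMap && "MAP_KEY_VALUE".equals(node.annotation)) {
            newName = "key_value";
        }
        SchemaNode copy = new SchemaNode(newName, node.annotation);
        boolean isMap = "MAP".equals(node.annotation);
        for (SchemaNode child : node.children) {
            copy.children.add(normalize(child, isMap));
        }
        return copy;
    }
}
```

A real implementation would have to rebuild `GroupType`/`MessageType` instances (they are immutable) and preserve repetition and logical-type annotations; this sketch only shows the renaming rule itself.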
[jira] [Commented] (PARQUET-1963) DeprecatedParquetInputFormat in CombineFileInputFormat throw NPE when the first sub-split is empty
[ https://issues.apache.org/jira/browse/PARQUET-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17268907#comment-17268907 ]

Daniel Dai commented on PARQUET-1963:
-------------------------------------

Thanks [~gszadovszky]!

> DeprecatedParquetInputFormat in CombineFileInputFormat throw NPE when the first sub-split is empty
> --------------------------------------------------------------------------------------------------
>
>                 Key: PARQUET-1963
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1963
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>            Priority: Major
>
> A follow-up of PARQUET-1947: after that fix, when the first sub-split in CombineFileInputFormat is empty, there is an NPE:
> {code}
> Caused by: java.lang.NullPointerException
> 	at org.apache.parquet.hadoop.mapred.DeprecatedParquetInputFormat$RecordReaderWrapper.next(DeprecatedParquetInputFormat.java:154)
> 	at org.apache.parquet.hadoop.mapred.DeprecatedParquetInputFormat$RecordReaderWrapper.next(DeprecatedParquetInputFormat.java:73)
> 	at cascading.tap.hadoop.io.CombineFileRecordReaderWrapper.next(CombineFileRecordReaderWrapper.java:70)
> 	at org.apache.hadoop.mapred.lib.CombineFileRecordReader.next(CombineFileRecordReader.java:58)
> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
> 	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
> 	at cascading.tap.hadoop.util.MeasuredRecordReader.next(MeasuredRecordReader.java:61)
> 	at org.apache.parquet.cascading.ParquetTupleScheme.source(ParquetTupleScheme.java:160)
> 	at cascading.tuple.TupleEntrySchemeIterator.getNext(TupleEntrySchemeIterator.java:163)
> 	at cascading.tuple.TupleEntrySchemeIterator.hasNext(TupleEntrySchemeIterator.java:136)
> 	... 10 more
> {code}
> The reason is that CombineFileInputFormat uses the result of createValue() on the first sub-split as the value container. Since the first sub-split is empty, the value container is null.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (PARQUET-1963) DeprecatedParquetInputFormat in CombineFileInputFormat throw NPE when the first sub-split is empty
Daniel Dai created PARQUET-1963:
-----------------------------------

             Summary: DeprecatedParquetInputFormat in CombineFileInputFormat throw NPE when the first sub-split is empty
                 Key: PARQUET-1963
                 URL: https://issues.apache.org/jira/browse/PARQUET-1963
             Project: Parquet
          Issue Type: Bug
          Components: parquet-mr
            Reporter: Daniel Dai
            Assignee: Daniel Dai

A follow-up of PARQUET-1947: after that fix, when the first sub-split in CombineFileInputFormat is empty, there is an NPE:

{code}
Caused by: java.lang.NullPointerException
	at org.apache.parquet.hadoop.mapred.DeprecatedParquetInputFormat$RecordReaderWrapper.next(DeprecatedParquetInputFormat.java:154)
	at org.apache.parquet.hadoop.mapred.DeprecatedParquetInputFormat$RecordReaderWrapper.next(DeprecatedParquetInputFormat.java:73)
	at cascading.tap.hadoop.io.CombineFileRecordReaderWrapper.next(CombineFileRecordReaderWrapper.java:70)
	at org.apache.hadoop.mapred.lib.CombineFileRecordReader.next(CombineFileRecordReader.java:58)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
	at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
	at cascading.tap.hadoop.util.MeasuredRecordReader.next(MeasuredRecordReader.java:61)
	at org.apache.parquet.cascading.ParquetTupleScheme.source(ParquetTupleScheme.java:160)
	at cascading.tuple.TupleEntrySchemeIterator.getNext(TupleEntrySchemeIterator.java:163)
	at cascading.tuple.TupleEntrySchemeIterator.hasNext(TupleEntrySchemeIterator.java:136)
	... 10 more
{code}

The reason is that CombineFileInputFormat uses the result of createValue() on the first sub-split as the value container. Since the first sub-split is empty, the value container is null.
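The failure mode described above can be sketched with a small, hypothetical model (not the real parquet-mr or Hadoop classes): the shared value container comes only from createValue() on the first sub-split, so an empty first sub-split leaves it null and the next read dereferences null. The sketch includes one possible guard, allocating the container lazily:

```java
import java.util.List;

// Hypothetical minimal model of the PARQUET-1963 failure mode.
class CombineValueContainerModel {
    static String readAll(List<List<String>> subSplits) {
        // Buggy pattern: the container is seeded ONLY from the first
        // sub-split; an empty first sub-split yields null.
        String[] container = subSplits.get(0).isEmpty() ? null : new String[1];
        StringBuilder out = new StringBuilder();
        for (List<String> split : subSplits) {
            for (String record : split) {
                if (container == null) {
                    // Guard (sketch of a fix): allocate lazily instead of
                    // dereferencing null, which is the NPE in the trace above.
                    container = new String[1];
                }
                container[0] = record;
                out.append(container[0]).append(';');
            }
        }
        return out.toString();
    }
}
```

Without the null check, the first `container[0] = record` after an empty first sub-split throws the NullPointerException shown in the issue.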
[jira] [Commented] (PARQUET-1666) Remove Unused Modules
[ https://issues.apache.org/jira/browse/PARQUET-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17244209#comment-17244209 ]

Daniel Dai commented on PARQUET-1666:
-------------------------------------

This sounds good to me. I can also put it into the old branches, since we are not using 1.12 anyway. As for consuming the patch, we have an internal branch, so it should not be a big deal for us.

> Remove Unused Modules
> ---------------------
>
>                 Key: PARQUET-1666
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1666
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.12.0
>            Reporter: Xinli Shang
>            Priority: Major
>             Fix For: 1.12.0
>
> In the last two meetings, Ryan Blue proposed to remove some unused Parquet modules. This is to open a task to track it.
> Here are the related meeting notes for the discussion on this.
> Remove old Parquet modules:
> Hive modules - sounds good
> Scrooge - Julien will reach out to Twitter
> Tools - undecided - Cloudera may still use parquet-tools, according to Gabor.
> Cascading - undecided
> We can mark the modules as deprecated in their descriptions.
[jira] [Commented] (PARQUET-1666) Remove Unused Modules
[ https://issues.apache.org/jira/browse/PARQUET-1666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243531#comment-17243531 ]

Daniel Dai commented on PARQUET-1666:
-------------------------------------

[~gszadovszky] I am fine with removing Cascading from 1.12.0. We only use it for a legacy application and don't think we will upgrade Cascading to use 1.12.

> Remove Unused Modules
> ---------------------
>
>                 Key: PARQUET-1666
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1666
>             Project: Parquet
>          Issue Type: Improvement
>          Components: parquet-mr
>    Affects Versions: 1.12.0
>            Reporter: Xinli Shang
>            Priority: Major
>             Fix For: 1.12.0
>
> In the last two meetings, Ryan Blue proposed to remove some unused Parquet modules. This is to open a task to track it.
> Here are the related meeting notes for the discussion on this.
> Remove old Parquet modules:
> Hive modules - sounds good
> Scrooge - Julien will reach out to Twitter
> Tools - undecided - Cloudera may still use parquet-tools, according to Gabor.
> Cascading - undecided
> We can mark the modules as deprecated in their descriptions.
[jira] [Updated] (PARQUET-1947) DeprecatedParquetInputFormat in CombineFileInputFormat would produce wrong data
[ https://issues.apache.org/jira/browse/PARQUET-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PARQUET-1947:
--------------------------------
    Attachment: Part1.java

> DeprecatedParquetInputFormat in CombineFileInputFormat would produce wrong data
> -------------------------------------------------------------------------------
>
>                 Key: PARQUET-1947
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1947
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-cascading
>            Reporter: Daniel Dai
>            Priority: Major
>         Attachments: Part1.java
>
> When we read Parquet files using Cascading 2, we observe wrong data at the file boundary when we turn on input combining in Cascading (setUseCombinedInput set to true).
> This can be reproduced easily with two Parquet input files, each containing one record. A simple Cascading application (attached) reads the two inputs with setUseCombinedInput(true). What we get is a duplicated record from the first input file and a missing record from the second input file.
> Here is the call sequence, to understand what happens after the last record of the first input:
> 1. Cascading invokes DeprecatedParquetInputFormat.createValue(), which is the last record of the first input again
> 2. CombineFileRecordReader invokes RecordReader.next and reaches the EOF of the first input
> 3. CombineFileRecordReader creates a new DeprecatedParquetInputFormat.RecordReaderWrapper, which creates the new "value" variable containing the first record of the second input
> 4. CombineFileRecordReader invokes RecordReader.next on the new RecordReaderWrapper, but since the firstRecord flag is on, next does nothing
> 5. Thus the "value" variable containing the first record of the second input is lost, and Cascading reuses the last record of the first input
[jira] [Created] (PARQUET-1947) DeprecatedParquetInputFormat in CombineFileInputFormat would produce wrong data
Daniel Dai created PARQUET-1947:
-----------------------------------

             Summary: DeprecatedParquetInputFormat in CombineFileInputFormat would produce wrong data
                 Key: PARQUET-1947
                 URL: https://issues.apache.org/jira/browse/PARQUET-1947
             Project: Parquet
          Issue Type: Bug
          Components: parquet-cascading
            Reporter: Daniel Dai

When we read Parquet files using Cascading 2, we observe wrong data at the file boundary when we turn on input combining in Cascading (setUseCombinedInput set to true).

This can be reproduced easily with two Parquet input files, each containing one record. A simple Cascading application (attached) reads the two inputs with setUseCombinedInput(true). What we get is a duplicated record from the first input file and a missing record from the second input file.

Here is the call sequence, to understand what happens after the last record of the first input:
1. Cascading invokes DeprecatedParquetInputFormat.createValue(), which is the last record of the first input again
2. CombineFileRecordReader invokes RecordReader.next and reaches the EOF of the first input
3. CombineFileRecordReader creates a new DeprecatedParquetInputFormat.RecordReaderWrapper, which creates the new "value" variable containing the first record of the second input
4. CombineFileRecordReader invokes RecordReader.next on the new RecordReaderWrapper, but since the firstRecord flag is on, next does nothing
5. Thus the "value" variable containing the first record of the second input is lost, and Cascading reuses the last record of the first input
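The call sequence above can be sketched with a small, hypothetical model (not the real parquet-mr classes): each wrapper eagerly reads its first record into its own "value" at construction and sets a firstRecord flag, but the flagged next() never copies that record into the caller's shared container, so the container keeps whatever the previous sub-split left there.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical minimal model of the PARQUET-1947 record-loss bug.
class RecordReaderWrapperModel {
    private final Iterator<String> records;
    String value;                 // eagerly holds the first record; createValue() hands this out
    private boolean firstRecord;

    RecordReaderWrapperModel(List<String> split) {
        records = split.iterator();
        if (records.hasNext()) {
            value = records.next();
            firstRecord = true;
        }
    }

    // Buggy next(): while firstRecord is set, it reports success without
    // copying `value` into the shared container, so a container seeded by a
    // PREVIOUS sub-split silently keeps its stale record.
    boolean next(String[] container) {
        if (firstRecord) {
            firstRecord = false;
            return value != null;
        }
        if (!records.hasNext()) return false;
        container[0] = records.next();
        return true;
    }
}
```

For the first sub-split this happens to work, because the shared container IS that wrapper's own value object; for every later sub-split, the eagerly read first record is lost and the caller re-sees the previous split's last record.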
[jira] [Updated] (PARQUET-334) UT TestSummary failed with "java.lang.RuntimeException: Usage: B = FOREACH (GROUP A ALL) GENERATE Summary(A); Can not get schema from null" when Pig >=0.15
[ https://issues.apache.org/jira/browse/PARQUET-334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PARQUET-334:
-------------------------------
    Attachment: PARQUET-334-1.patch

The input schema is maintained by Pig inside EvalFunc, so there is no need to maintain it on the Parquet side. Patch attached.

> UT TestSummary failed with "java.lang.RuntimeException: Usage: B = FOREACH (GROUP A ALL) GENERATE Summary(A); Can not get schema from null" when Pig >=0.15
> -----------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PARQUET-334
>                 URL: https://issues.apache.org/jira/browse/PARQUET-334
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-mr
>    Affects Versions: 1.6.0
>            Reporter: li xiang
>            Priority: Critical
>         Attachments: PARQUET-334-1.patch
>
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias B
> 	at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1694)
> 	at org.apache.pig.PigServer.registerQuery(PigServer.java:623)
> 	at org.apache.pig.PigServer.registerQuery(PigServer.java:636)
> 	at parquet.pig.summary.TestSummary.testMaxIsZero(TestSummary.java:154)
> 	...
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 0: java.lang.RuntimeException: Usage: B = FOREACH (GROUP A ALL) GENERATE Summary(A); Can not get schema from null
> 	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:307)
> 	at org.apache.pig.PigServer.launchPlan(PigServer.java:1390)
> 	at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1375)
> 	at org.apache.pig.PigServer.execute(PigServer.java:1364)
> 	at org.apache.pig.PigServer.access$500(PigServer.java:113)
> 	at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1689)
> 	... 32 more
> Caused by: java.lang.RuntimeException: Usage: B = FOREACH (GROUP A ALL) GENERATE Summary(A); Can not get schema from null
> 	at parquet.pig.summary.Summary.setInputSchema(Summary.java:266)
> 	at org.apache.pig.newplan.logical.expression.ExpToPhyTranslationVisitor.visit(ExpToPhyTranslationVisitor.java:530)
> 	at org.apache.pig.newplan.logical.expression.UserFuncExpression.accept(UserFuncExpression.java:132)
> 	at org.apache.pig.newplan.ReverseDependencyOrderWalkerWOSeenChk.walk(ReverseDependencyOrderWalkerWOSeenChk.java:69)
> 	at org.apache.pig.newplan.logical.relational.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:808)
> 	at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:87)
> 	at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:75)
> 	at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
> 	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:258)
> 	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:293)
> 	... 37 more
> Caused by: java.lang.NullPointerException
> 	at parquet.pig.summary.Summary.setInputSchema(Summary.java:261)
> 	... 46 more
>
> It relates to a change on the Pig side, in pig/src/org/apache/pig/newplan/logical/expression/ExpToPhyTranslationVisitor.java, introduced by PIG-3294

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
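The fix described in the comment (let Pig's EvalFunc keep the input schema instead of caching and eagerly dereferencing it in Summary.setInputSchema) can be illustrated with a small, hypothetical model; EvalFuncModel and SummaryModel below are illustrative stand-ins, not the real Pig classes.

```java
// Hypothetical minimal model: Pig's EvalFunc already stores the input
// schema, so a UDF can simply read it back where it is needed.
class EvalFuncModel {
    private Object inputSchema;                         // maintained by "Pig"
    void setInputSchema(Object schema) { inputSchema = schema; }
    Object getInputSchema() { return inputSchema; }
}

// Before the patch, the UDF overrode setInputSchema() and dereferenced the
// incoming schema immediately, which NPEs when plan translation passes null.
// Without the override, null is tolerated until the schema is actually used.
class SummaryModel extends EvalFuncModel {
    String describeInput() {
        Object s = getInputSchema();
        return s == null ? "<no input schema>" : s.toString();
    }
}
```

The design point is simply to move the null-sensitive work out of the setter that the planner may call with null, deferring it to the point of use.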