[jira] [Commented] (SPARK-11620) parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException
[ https://issues.apache.org/jira/browse/SPARK-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15839306#comment-15839306 ]

Swaranga Sarma commented on SPARK-11620:

I encountered this issue in Spark 2.0.2

> parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException
>
> Key: SPARK-11620
> URL: https://issues.apache.org/jira/browse/SPARK-11620
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: swetha k

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11620) parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException
[ https://issues.apache.org/jira/browse/SPARK-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557931#comment-15557931 ]

Hyukjin Kwon commented on SPARK-11620:

[~swethakasireddy] Could you please check if this still happens in the current master or latest versions?
[jira] [Commented] (SPARK-11620) parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException
[ https://issues.apache.org/jira/browse/SPARK-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034650#comment-15034650 ]

swetha k commented on SPARK-11620:

[~hyukjin.kwon] I have the following code that saves the Parquet files from my hourly batch to HDFS. The code is based on the GitHub link at the end.

    val job = Job.getInstance()
    var filePath = "path"
    val metricsPath: Path = new Path(filePath)

    // Check if inputFile exists
    val fs: FileSystem = FileSystem.get(job.getConfiguration)
    if (fs.exists(metricsPath)) {
      fs.delete(metricsPath, true)
    }

    // Configure the ParquetOutputFormat to use Avro as the serialization format
    ParquetOutputFormat.setWriteSupportClass(job, classOf[AvroWriteSupport])

    // You need to pass the schema to AvroParquet when you are writing objects but not when you
    // are reading them. The schema is saved in Parquet file for future readers to use.
    AvroParquetOutputFormat.setSchema(job, Metrics.SCHEMA$)

    // Create a PairRDD with all keys set to null and wrap each Metrics in serializable objects
    val metricsToBeSaved = metrics.map(metricRecord => (null, new SerializableMetrics(new Metrics(metricRecord._1, metricRecord._2._1, metricRecord._2._2))))

    metricsToBeSaved.coalesce(1500)

    // Save the RDD to a Parquet file in our temporary output directory
    metricsToBeSaved.saveAsNewAPIHadoopFile(filePath, classOf[Void], classOf[Metrics], classOf[ParquetOutputFormat[Metrics]], job.getConfiguration)

https://github.com/massie/spark-parquet-example/blob/master/src/main/scala/com/zenfractal/SparkParquetExample.scala
[jira] [Commented] (SPARK-11620) parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException
[ https://issues.apache.org/jira/browse/SPARK-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15019288#comment-15019288 ]

swetha k commented on SPARK-11620:

[~hyukjin.kwon] If I use

    ParquetInputFormat.setReadSupportClass(job, classOf[AvroReadSupport[PreviousPVTracker]])

with Parquet 1.7.0, I see the following error. It looks like it is not part of Parquet 1.7.0. My code is based on http://zenfractal.com/2013/08/21/a-powerful-big-data-trio/.

    not found: type AvroReadSupport
    [ERROR] ParquetInputFormat.setReadSupportClass(job, classOf[AvroReadSupport[PreviousPVTracker]])
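One possible cause of the `not found: type AvroReadSupport` error, offered as an assumption rather than a diagnosis: starting with the 1.7.0 release, the Parquet classes moved from the `parquet.*` packages to `org.apache.parquet.*`, so imports written against the older `com.twitter` artifacts no longer resolve. A minimal sketch of the updated imports follows. It assumes that `org.apache.parquet:parquet-avro` 1.7.0 and the Hadoop MapReduce client are on the classpath, and it uses Avro's `GenericRecord` as a stand-in for the record type; `ReadSupportSketch` is a hypothetical name.

```scala
// Parquet 1.7.0 and later: classes live under org.apache.parquet
// (earlier com.twitter releases used the bare parquet.* packages).
import org.apache.avro.generic.GenericRecord
import org.apache.hadoop.mapreduce.Job
import org.apache.parquet.avro.AvroReadSupport
import org.apache.parquet.hadoop.ParquetInputFormat

object ReadSupportSketch {
  // Register Avro as the read support for a Hadoop job.
  // GenericRecord is a placeholder; a generated Avro class would go here.
  def configure(job: Job): Unit =
    ParquetInputFormat.setReadSupportClass(job, classOf[AvroReadSupport[GenericRecord]])
}
```

If the import still fails with these package names, the `parquet-avro` artifact itself is probably missing from the build.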
[jira] [Commented] (SPARK-11620) parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException
[ https://issues.apache.org/jira/browse/SPARK-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15019284#comment-15019284 ]

swetha k commented on SPARK-11620:

It is not an error. It is a WARNING and I see the following.

    Nov 8, 2015 11:35:39 PM WARNING: parquet.hadoop.ParquetOutputCommitter: could not write summary file for active_sessions_current
    parquet.io.ParquetEncodingException: maprfs:/user/testId/active_sessions_current/part-r-00142.parquet invalid: all the files must be contained in the root active_sessions_current
        at parquet.hadoop.ParquetFileWriter.mergeFooters(ParquetFileWriter.java:422)
        at parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:398)
        at parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:51)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1056)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:998)
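The message pairs a fully qualified file path (`maprfs:/user/testId/...`) with a relative root (`active_sessions_current`). The sketch below illustrates why such a pair would fail a containment check, assuming the check is a plain string-prefix comparison. That is an assumption based on the wording of the message, not a quote of the Parquet source, and `RootPrefixSketch` and `isUnderRoot` are hypothetical names.

```scala
object RootPrefixSketch {
  // Hypothetical stand-in for the containment check, NOT the Parquet source:
  // assumes the file path and the root are compared as plain strings.
  def isUnderRoot(root: String, file: String): Boolean =
    file.startsWith(root)

  def main(args: Array[String]): Unit = {
    // File path taken from the warning above.
    val file = "maprfs:/user/testId/active_sessions_current/part-r-00142.parquet"

    // Relative root, as printed at the end of the warning message.
    println(isUnderRoot("active_sessions_current", file)) // false

    // Fully qualified root (assumed form of the same directory).
    println(isUnderRoot("maprfs:/user/testId/active_sessions_current", file)) // true
  }
}
```

If the assumption holds, passing a fully qualified output path to the save call may let the summary file be written. This has not been verified against the versions discussed in this thread.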
[jira] [Commented] (SPARK-11620) parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException
[ https://issues.apache.org/jira/browse/SPARK-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15018051#comment-15018051 ]

swetha k commented on SPARK-11620:

[~hyukjin.kwon] We use Spark 1.5.2 now and it still shows the same error. Which version of Parquet-Avro should be used for that?

Thanks,
Swetha
[jira] [Commented] (SPARK-11620) parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException
[ https://issues.apache.org/jira/browse/SPARK-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15015468#comment-15015468 ]

Hyukjin Kwon commented on SPARK-11620:

[~swethakasireddy] It uses 1.6.0rc3.

Hm.. Would you please give me a more detailed description, such as the command you ran and the full exception message?
[jira] [Commented] (SPARK-11620) parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException
[ https://issues.apache.org/jira/browse/SPARK-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002384#comment-15002384 ]

swetha k commented on SPARK-11620:

[~hyukjin.kwon] We are using Spark 1.4.1 in one of our clusters. Which Parquet version should be used for 1.4.1?
[jira] [Commented] (SPARK-11620) parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException
[ https://issues.apache.org/jira/browse/SPARK-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001746#comment-15001746 ]

Hyukjin Kwon commented on SPARK-11620:

Can you tell me your Spark version? Spark 1.5.1 uses Parquet 1.7.0; you can get the library from here:

http://mvnrepository.com/artifact/org.apache.parquet/parquet-avro
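For an sbt build, the artifact behind that link could be declared roughly as follows. The group and artifact names come from the URL and the version from the comment; the use of sbt is an assumption, since the thread does not say which build tool is in use.

```scala
// build.sbt (sketch): Parquet-Avro at the Parquet version that Spark 1.5.1 uses.
libraryDependencies += "org.apache.parquet" % "parquet-avro" % "1.7.0"
```

Note that from 1.7.0 onward the classes are in `org.apache.parquet.*` packages rather than `parquet.*`, so existing imports would need to be updated along with the dependency.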
[jira] [Commented] (SPARK-11620) parquet.hadoop.ParquetOutputCommitter.commitJob() throws parquet.io.ParquetEncodingException
[ https://issues.apache.org/jira/browse/SPARK-11620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14998015#comment-14998015 ]

swetha k commented on SPARK-11620:

I see the following Warning message when I use parquet-avro. Following is the dependency that I use.

    <dependency>
      <groupId>com.twitter</groupId>
      <artifactId>parquet-avro</artifactId>
      <version>1.6.0</version>
    </dependency>

    Nov 8, 2015 11:35:39 PM WARNING: parquet.hadoop.ParquetOutputCommitter: could not write summary file for active_sessions_current
    parquet.io.ParquetEncodingException: maprfs:/user/testId/active_sessions_current/part-r-00142.parquet invalid: all the files must be contained in the root active_sessions_current
        at parquet.hadoop.ParquetFileWriter.mergeFooters(ParquetFileWriter.java:422)
        at parquet.hadoop.ParquetFileWriter.writeMetadataFile(ParquetFileWriter.java:398)
        at parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:51)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1056)
        at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:998)