[jira] [Commented] (SPARK-9858) Introduce an ExchangeCoordinator to estimate the number of post-shuffle partitions.
[ https://issues.apache.org/jira/browse/SPARK-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813158#comment-16813158 ]

ketan kunde commented on SPARK-9858:
------------------------------------

[~aroberts]: Did the ExchangeCoordinatorSuite test cases pass in your big-endian environment, specifically the tests named:

test(s"determining the number of reducers: complex query 1")
test(s"determining the number of reducers: complex query 2")

These test cases also fail in my big-endian environment, with the following logs:

- determining the number of reducers: complex query 1 *** FAILED ***
  Set(1, 2) did not equal Set(2, 3) (ExchangeCoordinatorSuite.scala:424)
- determining the number of reducers: complex query 2 *** FAILED ***
  Set(4, 2) did not equal Set(5, 3) (ExchangeCoordinatorSuite.scala:476)

Since this ticket is RESOLVED, I would like to know what change you made to get these test cases to pass. Could you also highlight which exact feature of Spark this test case exercises? I would be very grateful for your reply.

Regards,
Ketan

> Introduce an ExchangeCoordinator to estimate the number of post-shuffle
> partitions.
> -----------------------------------------------------------------------
>
>          Key: SPARK-9858
>          URL: https://issues.apache.org/jira/browse/SPARK-9858
>      Project: Spark
>   Issue Type: Sub-task
>   Components: SQL
>     Reporter: Yin Huai
>     Assignee: Yin Huai
>     Priority: Major
>      Fix For: 1.6.0
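For context on the question above: the feature these cases exercise is the ExchangeCoordinator's adaptive estimation of post-shuffle partition counts. After the map stages run, it looks at the actual byte size of each map-output partition and packs adjacent partitions together until a target size (spark.sql.adaptive.shuffle.targetPostShuffleInputSize) is reached; the suite asserts on the resulting reducer counts. A minimal sketch of that packing rule, as an illustration rather than Spark's actual implementation:

// Illustrative sketch of post-shuffle partition coalescing: walk the
// map-output partition sizes in order and start a new reducer whenever
// adding the next partition would overflow the target size.
object CoalesceSketch {
  def estimateReducers(partitionBytes: Seq[Long], targetSize: Long): Int = {
    var reducers = 0
    var currentBytes = 0L
    for (bytes <- partitionBytes) {
      if (reducers == 0 || currentBytes + bytes > targetSize) {
        reducers += 1         // open a new post-shuffle partition
        currentBytes = bytes
      } else {
        currentBytes += bytes // pack into the current one
      }
    }
    reducers
  }

  def main(args: Array[String]): Unit = {
    // With a 64-byte target, sizes 30, 30, 60, 10 coalesce into 3 reducers:
    // (30 + 30), (60), (10). Prints 3.
    println(estimateReducers(Seq(30L, 30L, 60L, 10L), targetSize = 64L))
  }
}

Because the packing decisions hinge on exact byte counts, small platform-dependent differences in the recorded partition sizes would shift the counts, which is consistent with the Set(1, 2) vs Set(2, 3) style mismatches reported above.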
[jira] [Commented] (SPARK-20984) Reading back from ORC format gives error on big endian systems.
[ https://issues.apache.org/jira/browse/SPARK-20984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16813126#comment-16813126 ]

ketan kunde commented on SPARK-20984:
-------------------------------------

Hi,

I understand that the ORC file format is not read back correctly on big-endian systems. I am looking to build Spark as a standalone build, and since the ORC-related test cases are exclusive to the Hive module, they will not be part of that build. Can I skip all ORC-related test cases for the standalone build and still be sure I am not compromising any standalone Spark features?

Regards,
Ketan Kunde

> Reading back from ORC format gives error on big endian systems.
> ----------------------------------------------------------------
>
>             Key: SPARK-20984
>             URL: https://issues.apache.org/jira/browse/SPARK-20984
>         Project: Spark
>      Issue Type: Bug
>      Components: Input/Output
> Affects Versions: 2.0.0
>     Environment: Redhat 7 on POWER7 big-endian platform.
> [testuser@soe10-vm12 spark]$ cat /etc/redhat-
> redhat-access-insights/ redhat-release
> [testuser@soe10-vm12 spark]$ cat /etc/redhat-release
> Red Hat Enterprise Linux Server release 7.2 (Maipo)
> [testuser@soe10-vm12 spark]$ lscpu
> Architecture:          ppc64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Big Endian
> CPU(s):                8
> On-line CPU(s) list:   0-7
> Thread(s) per core:    1
> Core(s) per socket:    1
> Socket(s):             8
> NUMA node(s):          1
> Model:                 IBM pSeries (emulated by qemu)
> L1d cache:             32K
> L1i cache:             32K
> NUMA node0 CPU(s):     0-7
> [testuser@soe10-vm12 spark]$
>        Reporter: Mahesh
>        Priority: Major
>          Labels: big-endian
>     Attachments: hive_test_failure_log.txt
>
> All ORC test cases seem to be failing here. It looks like Spark is not able
> to read back what is written. Following is a way to check it on the spark
> shell. I am also pasting the test case, which probably passes on x86.
> All test cases in OrcHadoopFsRelationSuite.scala are failing.
> test("SPARK-12218: 'Not' is included in ORC filter pushdown") { > import testImplicits._ > withSQLConf(SQLConf.ORC_FILTER_PUSHDOWN_ENABLED.key -> "true") { > withTempPath { dir => > val path = s"${dir.getCanonicalPath}/table1" > (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", > "b").write.orc(path) > checkAnswer( > spark.read.orc(path).where("not (a = 2) or not(b in ('1'))"), > (1 to 5).map(i => Row(i, (i % 2).toString))) > checkAnswer( > spark.read.orc(path).where("not (a = 2 and b in ('1'))"), > (1 to 5).map(i => Row(i, (i % 2).toString))) > } > } > } > Same can be reproduced on spark shell > **Create a DF and write it in orc > scala> (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", > "b").write.orc("test") > **Now try to read it back > scala> spark.read.orc("test").where("not (a = 2) or not(b in ('1'))").show > 17/06/05 04:20:48 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) > org.iq80.snappy.CorruptionException: Invalid copy offset for opcode starting > at 13 > at > org.iq80.snappy.SnappyDecompressor.decompressAllTags(SnappyDecompressor.java:165) > at > org.iq80.snappy.SnappyDecompressor.uncompress(SnappyDecompressor.java:76) > at org.iq80.snappy.Snappy.uncompress(Snappy.java:43) > at > org.apache.hadoop.hive.ql.io.orc.SnappyCodec.decompress(SnappyCodec.java:71) > at > org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.readHeader(InStream.java:214) > at > org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:238) > at java.io.InputStream.read(InputStream.java:101) > at > org.apache.hive.com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:737) > at > org.apache.hive.com.google.protobuf.CodedInputStream.isAtEnd(CodedInputStream.java:701) > at > org.apache.hive.com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:99) > at > org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter.(OrcProto.java:10661) > at > org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter.(OrcProto.java:10625) > at > org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter$1.parsePartialFrom(OrcProto.java:10730) > at > org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter$1.parsePartialFrom(OrcProto.java:10725) > at > org.apache.hive.com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200) > at > org.apache.hive.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217) > at > org.apache.hive.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223) > at >
[jira] [Commented] (SPARK-26942) spark v 2.3.2 test failure in hive module
[ https://issues.apache.org/jira/browse/SPARK-26942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776808#comment-16776808 ]

ketan kunde commented on SPARK-26942:
-------------------------------------

Logs attached.

test statistics of LogicalRelation converted from Hive serde tables *** FAILED ***
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 131.0 failed 1 times, most recent failure: Lost task 0.0 in stage 131.0 (TID 191, localhost, executor driver): org.iq80.snappy.CorruptionException: Invalid copy offset for opcode starting at 4841
  at org.iq80.snappy.SnappyDecompressor.decompressAllTags(SnappyDecompressor.java:165)
  at org.iq80.snappy.SnappyDecompressor.uncompress(SnappyDecompressor.java:76)
  at org.iq80.snappy.Snappy.uncompress(Snappy.java:43)
  at org.apache.hadoop.hive.ql.io.orc.SnappyCodec.decompress(SnappyCodec.java:71)
  at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.readHeader(InStream.java:214)
  at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:238)
  at java.io.InputStream.read(InputStream.java:113)
  at org.apache.hive.com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:737)
  at org.apache.hive.com.google.protobuf.CodedInputStream.isAtEnd(CodedInputStream.java:701)
  at org.apache.hive.com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:99)
  at org.apache.hadoop.hive.ql.io.orc.OrcProto$Footer.<init>(OrcProto.java:15780)
  at org.apache.hadoop.hive.ql.io.orc.OrcProto$Footer.<init>(OrcProto.java:15744)
  at org.apache.hadoop.hive.ql.io.orc.OrcProto$Footer$1.parsePartialFrom(OrcProto.java:15886)
  at org.apache.hadoop.hive.ql.io.orc.OrcProto$Footer$1.parsePartialFrom(OrcProto.java:15881)
  at org.apache.hive.com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
  at org.apache.hive.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
  at org.apache.hive.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
  at org.apache.hive.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
  at org.apache.hadoop.hive.ql.io.orc.OrcProto$Footer.parseFrom(OrcProto.java:16226)
  at org.apache.hadoop.hive.ql.io.orc.ReaderImpl$MetaInfoObjExtractor.<init>(ReaderImpl.java:479)
  at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:319)
  at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:187)
  at org.apache.spark.sql.hive.orc.OrcFileOperator$$anonfun$getFileReader$2.apply(OrcFileOperator.scala:75)
  at org.apache.spark.sql.hive.orc.OrcFileOperator$$anonfun$getFileReader$2.apply(OrcFileOperator.scala:73)
  at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
  at scala.collection.TraversableOnce$class.collectFirst(TraversableOnce.scala:145)
  at scala.collection.AbstractIterator.collectFirst(Iterator.scala:1336)
  at org.apache.spark.sql.hive.orc.OrcFileOperator$.getFileReader(OrcFileOperator.scala:86)
  at org.apache.spark.sql.hive.orc.OrcFileOperator$$anonfun$readSchema$1.apply(OrcFileOperator.scala:95)
  at org.apache.spark.sql.hive.orc.OrcFileOperator$$anonfun$readSchema$1.apply(OrcFileOperator.scala:95)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
  at scala.collection.immutable.List.flatMap(List.scala:344)
  at org.apache.spark.sql.hive.orc.OrcFileOperator$.readSchema(OrcFileOperator.scala:95)
  at org.apache.spark.sql.hive.orc.OrcFileFormat$$anonfun$buildReader$2.apply(OrcFileFormat.scala:145)
  at org.apache.spark.sql.hive.orc.OrcFileFormat$$anonfun$buildReader$2.apply(OrcFileFormat.scala:136)
  at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:148)
  at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:132)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:128)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:182)
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithoutKey_0$(generated.java:36)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(generated.java:64)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
  at
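The trace shows the failure occurring while OrcFileOperator.readSchema scans the Hive-serde ORC files back in, before the statistics logic the test actually asserts on is ever reached. A minimal spark-shell repro sketch of the same code path (the table name t_orc is hypothetical, and a Hive-enabled session is assumed):

// Force the snappy codec the trace goes through, then read the table back.
spark.sql("CREATE TABLE t_orc (a INT) STORED AS ORC TBLPROPERTIES ('orc.compress'='SNAPPY')")
spark.sql("INSERT INTO t_orc VALUES (1), (2)")
spark.table("t_orc").count()  // triggers OrcFileOperator.readSchema and the file scan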
[jira] [Updated] (SPARK-20984) Reading back from ORC format gives error on big endian systems.
[ https://issues.apache.org/jira/browse/SPARK-20984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ketan kunde updated SPARK-20984:
--------------------------------
Attachment: hive_test_failure_log.txt
[jira] [Created] (SPARK-26942) spark v 2.3.2 test failure in hive module
ketan kunde created SPARK-26942:
-----------------------------------

             Summary: spark v 2.3.2 test failure in hive module
                 Key: SPARK-26942
                 URL: https://issues.apache.org/jira/browse/SPARK-26942
             Project: Spark
          Issue Type: Test
          Components: Spark Core
    Affects Versions: 2.3.2
         Environment: Ubuntu 16.04, 8 GB RAM, 2-core machine, Docker container
            Reporter: ketan kunde

Hi,

I have built Spark 2.3.2 on a big-endian system. While executing the Hive test cases, I encounter an ORC-format issue on big endian when running the test "test statistics of LogicalRelation converted from Hive serde tables". I want to know what the support status of the ORC serde is on big-endian systems, and if it is supported, what the workaround is to get this test fixed.
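On the workaround question: every failing frame in the attached log goes through the Hive 1.2 ORC reader and its pure-Java iq80 Snappy decompressor. Two possible mitigations to try, sketched below as assumptions to verify rather than confirmed fixes: switch Spark 2.3's ORC implementation to the newer native reader via spark.sql.orc.impl, or write the ORC data with zlib instead of snappy compression so the iq80 codec is never invoked.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("orc-bigendian-workaround-sketch")
  // Use the Apache ORC-based reader/writer (Spark 2.3+) instead of Hive 1.2's.
  .config("spark.sql.orc.impl", "native")
  .enableHiveSupport()
  .getOrCreate()
import spark.implicits._

// Alternatively, side-step the Snappy codec entirely with zlib compression.
val df = (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", "b")
df.write.option("compression", "zlib").orc("/tmp/test_zlib_orc")
spark.read.orc("/tmp/test_zlib_orc").show()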