[jira] [Commented] (SPARK-9858) Introduce an ExchangeCoordinator to estimate the number of post-shuffle partitions.

2019-04-09 Thread ketan kunde (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-9858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813158#comment-16813158
 ] 

ketan kunde commented on SPARK-9858:


[~aroberts]: Did these ExchangeCoordinatorSuite test cases pass in your big-endian environment, specifically the test cases with the following names:

 

test(s"determining the number of reducers: complex query 1

test(s"determining the number of reducers: complex query 2 

The above test cases also fail in my big-endian environment, with the following respective logs:
 - determining the number of reducers: complex query 1 *** FAILED ***
   Set(1, 2) did not equal Set(2, 3) (ExchangeCoordinatorSuite.scala:424)
 - determining the number of reducers: complex query 2 *** FAILED ***
   Set(4, 2) did not equal Set(5, 3) (ExchangeCoordinatorSuite.scala:476)

Since this ticket is RESOLVED, I would like to know what change you made to get these test cases to pass.

Could you also highlight which exact Spark feature these test cases exercise?
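For context, SPARK-9858 introduces an ExchangeCoordinator that estimates how many post-shuffle partitions to use and coalesces small ones, and the assertions above compare sets of observed post-shuffle partition counts against expected ones. Below is a minimal sketch of the mechanism under test, assuming the Spark 2.x adaptive-execution configuration names; the queries in the actual suite are more complex:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[2]")
  .appName("post-shuffle-coalescing-sketch")
  .config("spark.sql.adaptive.enabled", "true")
  .config("spark.sql.shuffle.partitions", "5")
  // A tiny target size (in bytes) forces the coordinator to merge partitions.
  .config("spark.sql.adaptive.shuffle.targetPostShuffleInputSize", "100")
  .getOrCreate()
import spark.implicits._

val df = spark.sparkContext.parallelize(0 until 1000, 10)
  .map(i => (i % 20, i))
  .toDF("key", "value")
val agg = df.groupBy("key").count()
agg.collect()
// With coalescing active this is usually below spark.sql.shuffle.partitions.
println(agg.rdd.partitions.length)

If the observed partition counts differ between x86 and big endian for identical data, that suggests the size statistics feeding the estimate differ between the platforms, rather than the coalescing logic itself.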

I would be very grateful for your reply.

 

Regards

Ketan 

 

> Introduce an ExchangeCoordinator to estimate the number of post-shuffle 
> partitions.
> ---
>
> Key: SPARK-9858
> URL: https://issues.apache.org/jira/browse/SPARK-9858
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Yin Huai
>Assignee: Yin Huai
>Priority: Major
> Fix For: 1.6.0
>
>







[jira] [Commented] (SPARK-20984) Reading back from ORC format gives error on big endian systems.

2019-04-09 Thread ketan kunde (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-20984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16813126#comment-16813126
 ] 

ketan kunde commented on SPARK-20984:
-

Hi,

I understand that the ORC file format is not read back correctly on big-endian systems.

I am looking to build Spark as a standalone build, since the ORC-related test cases are exclusive to the Hive module, which will not be part of a standalone Spark build.

Can I skip all ORC-related test cases for the standalone build and still be sure that I am not compromising any of the standalone Spark features?
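Before skipping the ORC suites, one way to sanity-check that the non-Hive data source path is unaffected is to round-trip the same data through Parquet. A minimal spark-shell sketch (implicits are already in scope in the shell; the output path "ptest" is arbitrary):

scala> val df = (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", "b")
scala> df.write.mode("overwrite").parquet("ptest")
scala> spark.read.parquet("ptest").where("not (a = 2) or not(b in ('1'))").show()

If this reads back correctly, the failure is confined to the ORC path rather than to Spark's general data source I/O.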

 

Regards

Ketan Kunde

> Reading back from ORC format gives error on big endian systems.
> ---
>
> Key: SPARK-20984
> URL: https://issues.apache.org/jira/browse/SPARK-20984
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 2.0.0
> Environment: Redhat 7 on power 7 Big endian platform.
> [testuser@soe10-vm12 spark]$ cat /etc/redhat-
> redhat-access-insights/ redhat-release
> [testuser@soe10-vm12 spark]$ cat /etc/redhat-release
> Red Hat Enterprise Linux Server release 7.2 (Maipo)
> [testuser@soe10-vm12 spark]$ lscpu
> Architecture:  ppc64
> CPU op-mode(s):32-bit, 64-bit
> Byte Order:Big Endian
> CPU(s):8
> On-line CPU(s) list:   0-7
> Thread(s) per core:1
> Core(s) per socket:1
> Socket(s): 8
> NUMA node(s):  1
> Model: IBM pSeries (emulated by qemu)
> L1d cache: 32K
> L1i cache: 32K
> NUMA node0 CPU(s): 0-7
> [testuser@soe10-vm12 spark]$
>Reporter: Mahesh
>Priority: Major
>  Labels: big-endian
> Attachments: hive_test_failure_log.txt
>
>
> All orc test cases seem to be failing here. Looks like spark is not able to 
> read back what is written. Following is a way to check it on spark shell. I 
> am also pasting the test case which probably passes on x86. 
> All test cases in OrcHadoopFsRelationSuite.scala are failing.
>  test("SPARK-12218: 'Not' is included in ORC filter pushdown") {
> import testImplicits._
> withSQLConf(SQLConf.ORC_FILTER_PUSHDOWN_ENABLED.key -> "true") {
>   withTempPath { dir =>
> val path = s"${dir.getCanonicalPath}/table1"
> (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", 
> "b").write.orc(path)
> checkAnswer(
>   spark.read.orc(path).where("not (a = 2) or not(b in ('1'))"),
>   (1 to 5).map(i => Row(i, (i % 2).toString)))
> checkAnswer(
>   spark.read.orc(path).where("not (a = 2 and b in ('1'))"),
>   (1 to 5).map(i => Row(i, (i % 2).toString)))
>   }
> }
>   }
> Same can be reproduced on spark shell
> **Create a DF and write it in orc
> scala> (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", 
> "b").write.orc("test")
> **Now try to read it back
> scala> spark.read.orc("test").where("not (a = 2) or not(b in ('1'))").show
> 17/06/05 04:20:48 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
> org.iq80.snappy.CorruptionException: Invalid copy offset for opcode starting 
> at 13
> at 
> org.iq80.snappy.SnappyDecompressor.decompressAllTags(SnappyDecompressor.java:165)
> at 
> org.iq80.snappy.SnappyDecompressor.uncompress(SnappyDecompressor.java:76)
> at org.iq80.snappy.Snappy.uncompress(Snappy.java:43)
> at 
> org.apache.hadoop.hive.ql.io.orc.SnappyCodec.decompress(SnappyCodec.java:71)
> at 
> org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.readHeader(InStream.java:214)
> at 
> org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:238)
> at java.io.InputStream.read(InputStream.java:101)
> at 
> org.apache.hive.com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:737)
> at 
> org.apache.hive.com.google.protobuf.CodedInputStream.isAtEnd(CodedInputStream.java:701)
> at 
> org.apache.hive.com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:99)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter.(OrcProto.java:10661)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter.(OrcProto.java:10625)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter$1.parsePartialFrom(OrcProto.java:10730)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter$1.parsePartialFrom(OrcProto.java:10725)
> at 
> org.apache.hive.com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
> at 
> org.apache.hive.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
> at 
> org.apache.hive.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
> at 
> 

[jira] [Commented] (SPARK-26942) spark v 2.3.2 test failure in hive module

2019-02-25 Thread ketan kunde (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776808#comment-16776808
 ] 

ketan kunde commented on SPARK-26942:
-

Logs attached

 

test statistics of LogicalRelation converted from Hive serde tables *** FAILED ***
 org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 131.0 failed 1 times, most recent failure: Lost task 0.0 in stage 131.0 (TID 191, localhost, executor driver): org.iq80.snappy.CorruptionException: Invalid copy offset for opcode starting at 4841
 at org.iq80.snappy.SnappyDecompressor.decompressAllTags(SnappyDecompressor.java:165)
 at org.iq80.snappy.SnappyDecompressor.uncompress(SnappyDecompressor.java:76)
 at org.iq80.snappy.Snappy.uncompress(Snappy.java:43)
 at org.apache.hadoop.hive.ql.io.orc.SnappyCodec.decompress(SnappyCodec.java:71)
 at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.readHeader(InStream.java:214)
 at org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:238)
 at java.io.InputStream.read(InputStream.java:113)
 at org.apache.hive.com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:737)
 at org.apache.hive.com.google.protobuf.CodedInputStream.isAtEnd(CodedInputStream.java:701)
 at org.apache.hive.com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:99)
 at org.apache.hadoop.hive.ql.io.orc.OrcProto$Footer.<init>(OrcProto.java:15780)
 at org.apache.hadoop.hive.ql.io.orc.OrcProto$Footer.<init>(OrcProto.java:15744)
 at org.apache.hadoop.hive.ql.io.orc.OrcProto$Footer$1.parsePartialFrom(OrcProto.java:15886)
 at org.apache.hadoop.hive.ql.io.orc.OrcProto$Footer$1.parsePartialFrom(OrcProto.java:15881)
 at org.apache.hive.com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
 at org.apache.hive.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
 at org.apache.hive.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
 at org.apache.hive.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
 at org.apache.hadoop.hive.ql.io.orc.OrcProto$Footer.parseFrom(OrcProto.java:16226)
 at org.apache.hadoop.hive.ql.io.orc.ReaderImpl$MetaInfoObjExtractor.<init>(ReaderImpl.java:479)
 at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:319)
 at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:187)
 at org.apache.spark.sql.hive.orc.OrcFileOperator$$anonfun$getFileReader$2.apply(OrcFileOperator.scala:75)
 at org.apache.spark.sql.hive.orc.OrcFileOperator$$anonfun$getFileReader$2.apply(OrcFileOperator.scala:73)
 at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
 at scala.collection.TraversableOnce$class.collectFirst(TraversableOnce.scala:145)
 at scala.collection.AbstractIterator.collectFirst(Iterator.scala:1336)
 at org.apache.spark.sql.hive.orc.OrcFileOperator$.getFileReader(OrcFileOperator.scala:86)
 at org.apache.spark.sql.hive.orc.OrcFileOperator$$anonfun$readSchema$1.apply(OrcFileOperator.scala:95)
 at org.apache.spark.sql.hive.orc.OrcFileOperator$$anonfun$readSchema$1.apply(OrcFileOperator.scala:95)
 at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
 at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
 at scala.collection.immutable.List.foreach(List.scala:381)
 at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
 at scala.collection.immutable.List.flatMap(List.scala:344)
 at org.apache.spark.sql.hive.orc.OrcFileOperator$.readSchema(OrcFileOperator.scala:95)
 at org.apache.spark.sql.hive.orc.OrcFileFormat$$anonfun$buildReader$2.apply(OrcFileFormat.scala:145)
 at org.apache.spark.sql.hive.orc.OrcFileFormat$$anonfun$buildReader$2.apply(OrcFileFormat.scala:136)
 at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:148)
 at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:132)
 at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:128)
 at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:182)
 at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
 at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithoutKey_0$(generated.java:36)
 at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(generated.java:64)
 at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
 at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
 at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
 at

[jira] [Updated] (SPARK-20984) Reading back from ORC format gives error on big endian systems.

2019-02-25 Thread ketan kunde (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-20984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ketan kunde updated SPARK-20984:

Attachment: hive_test_failure_log.txt

> Reading back from ORC format gives error on big endian systems.
> ---
>
> Key: SPARK-20984
> URL: https://issues.apache.org/jira/browse/SPARK-20984
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 2.0.0
> Environment: Redhat 7 on power 7 Big endian platform.
> [testuser@soe10-vm12 spark]$ cat /etc/redhat-
> redhat-access-insights/ redhat-release
> [testuser@soe10-vm12 spark]$ cat /etc/redhat-release
> Red Hat Enterprise Linux Server release 7.2 (Maipo)
> [testuser@soe10-vm12 spark]$ lscpu
> Architecture:  ppc64
> CPU op-mode(s):32-bit, 64-bit
> Byte Order:Big Endian
> CPU(s):8
> On-line CPU(s) list:   0-7
> Thread(s) per core:1
> Core(s) per socket:1
> Socket(s): 8
> NUMA node(s):  1
> Model: IBM pSeries (emulated by qemu)
> L1d cache: 32K
> L1i cache: 32K
> NUMA node0 CPU(s): 0-7
> [testuser@soe10-vm12 spark]$
>Reporter: Mahesh
>Priority: Major
>  Labels: big-endian
> Attachments: hive_test_failure_log.txt
>
>
> All orc test cases seem to be failing here. Looks like spark is not able to 
> read back what is written. Following is a way to check it on spark shell. I 
> am also pasting the test case which probably passes on x86. 
> All test cases in OrcHadoopFsRelationSuite.scala are failing.
>  test("SPARK-12218: 'Not' is included in ORC filter pushdown") {
> import testImplicits._
> withSQLConf(SQLConf.ORC_FILTER_PUSHDOWN_ENABLED.key -> "true") {
>   withTempPath { dir =>
> val path = s"${dir.getCanonicalPath}/table1"
> (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", 
> "b").write.orc(path)
> checkAnswer(
>   spark.read.orc(path).where("not (a = 2) or not(b in ('1'))"),
>   (1 to 5).map(i => Row(i, (i % 2).toString)))
> checkAnswer(
>   spark.read.orc(path).where("not (a = 2 and b in ('1'))"),
>   (1 to 5).map(i => Row(i, (i % 2).toString)))
>   }
> }
>   }
> Same can be reproduced on spark shell
> **Create a DF and write it in orc
> scala> (1 to 5).map(i => (i, (i % 2).toString)).toDF("a", 
> "b").write.orc("test")
> **Now try to read it back
> scala> spark.read.orc("test").where("not (a = 2) or not(b in ('1'))").show
> 17/06/05 04:20:48 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
> org.iq80.snappy.CorruptionException: Invalid copy offset for opcode starting 
> at 13
> at 
> org.iq80.snappy.SnappyDecompressor.decompressAllTags(SnappyDecompressor.java:165)
> at 
> org.iq80.snappy.SnappyDecompressor.uncompress(SnappyDecompressor.java:76)
> at org.iq80.snappy.Snappy.uncompress(Snappy.java:43)
> at 
> org.apache.hadoop.hive.ql.io.orc.SnappyCodec.decompress(SnappyCodec.java:71)
> at 
> org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.readHeader(InStream.java:214)
> at 
> org.apache.hadoop.hive.ql.io.orc.InStream$CompressedStream.read(InStream.java:238)
> at java.io.InputStream.read(InputStream.java:101)
> at 
> org.apache.hive.com.google.protobuf.CodedInputStream.refillBuffer(CodedInputStream.java:737)
> at 
> org.apache.hive.com.google.protobuf.CodedInputStream.isAtEnd(CodedInputStream.java:701)
> at 
> org.apache.hive.com.google.protobuf.CodedInputStream.readTag(CodedInputStream.java:99)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter.(OrcProto.java:10661)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter.(OrcProto.java:10625)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter$1.parsePartialFrom(OrcProto.java:10730)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter$1.parsePartialFrom(OrcProto.java:10725)
> at 
> org.apache.hive.com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
> at 
> org.apache.hive.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
> at 
> org.apache.hive.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
> at 
> org.apache.hive.com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeFooter.parseFrom(OrcProto.java:10937)
> at 
> org.apache.hadoop.hive.ql.io.orc.MetadataReader.readStripeFooter(MetadataReader.java:113)
> at 
> org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.readStripeFooter(RecordReaderImpl.java:228)
> at 
> 

[jira] [Created] (SPARK-26942) spark v 2.3.2 test failure in hive module

2019-02-20 Thread ketan kunde (JIRA)
ketan kunde created SPARK-26942:
---

 Summary: spark v 2.3.2 test failure in hive module
 Key: SPARK-26942
 URL: https://issues.apache.org/jira/browse/SPARK-26942
 Project: Spark
  Issue Type: Test
  Components: Spark Core
Affects Versions: 2.3.2
 Environment: Ubuntu 16.04

8GB RAM

2-core machine

Docker container
Reporter: ketan kunde


Hi,

I have built Spark 2.3.2 on a big-endian system.
I am now executing the test cases in the Hive module.
I encountered an issue related to the ORC format on big endian while running the test "test statistics of LogicalRelation converted from Hive serde tables".
I want to know what the support status of the ORC serde is on big-endian systems and, if it is supported, what the workaround is to get this test passing.
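One possible workaround sketch to try, not a confirmed fix: the failing stack goes through the legacy Hive ORC reader (org.apache.hadoop.hive.ql.io.orc, which decompresses via org.iq80.snappy), while Spark 2.3 also ships a native reader based on Apache ORC, selectable through a session config. Whether the native reader's decompression path behaves correctly on big endian is an assumption that would need to be verified:

scala> spark.conf.set("spark.sql.orc.impl", "native")
scala> spark.conf.set("spark.sql.hive.convertMetastoreOrc", "true")
scala> // Re-run the failing read, e.g.:
scala> spark.read.orc("test").where("not (a = 2) or not(b in ('1'))").show()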


