[GitHub] [spark] yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive

2020-03-04 Thread GitBox
yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive 
URL: https://github.com/apache/spark/pull/27536#issuecomment-594774176
 
 
   Closing it, as we need to use regular orc.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive

2020-02-25 Thread GitBox
yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive 
URL: https://github.com/apache/spark/pull/27536#issuecomment-591177652
 
 
   Thank you @omalley and @dongjoon-hyun!
   
   By the way, are we concerned that hive-common (shipped with Hive 2.3.6) and 
hive-storage-api 2.6.0 (used by ORC 1.5.9) contain duplicate classes in 
different versions? I am worried that class loading order may keep us from 
consistently picking up the right version, which can cause confusing runtime 
exceptions.
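
One way to investigate a concern like this is to ask the class loader where a class comes from. The sketch below is only an illustration and is not part of this PR: `ClassOrigin`, `resourcePath`, `providers`, and `loadedFrom` are hypothetical names, and the class names to check are supplied by the caller, since the thread does not list which classes are duplicated.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.security.CodeSource;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic helper; not Spark, Hive, or ORC code.
final class ClassOrigin {

    /** Converts a binary class name to its resource path, e.g. "a.b.C" to "a/b/C.class". */
    static String resourcePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Lists every location visible to the loader that provides the class file.
     *  More than one entry suggests the class is duplicated on the classpath. */
    static List<URL> providers(String className, ClassLoader loader) {
        try {
            return Collections.list(loader.getResources(resourcePath(className)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the location the class was actually loaded from,
     *  or null when no code source is recorded (e.g. bootstrap classes). */
    static URL loadedFrom(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        return source == null ? null : source.getLocation();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        for (String name : args) {
            List<URL> found = providers(name, loader);
            System.out.println(name + ": " + found.size() + " provider(s)");
            for (URL url : found) {
                System.out.println("  " + url);
            }
            // Load without initializing, then report which copy won.
            Class<?> cls = Class.forName(name, false, loader);
            System.out.println("  loaded from: " + loadedFrom(cls));
        }
    }
}
```

Running it on the same classpath as the failing tests, with one or more fully qualified class names as arguments, would show whether a class has several providers and which copy the loader actually picked.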





[GitHub] [spark] yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive

2020-02-20 Thread GitBox
yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive 
URL: https://github.com/apache/spark/pull/27536#issuecomment-589229742
 
 
   @dongjoon-hyun So, Hive 2.3 depends on Apache ORC instead of using the ORC 
embedded in Hive, which means we will need to pull in regular orc instead of 
orc-nohive. Is my understanding correct?





[GitHub] [spark] yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive

2020-02-17 Thread GitBox
yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive 
URL: https://github.com/apache/spark/pull/27536#issuecomment-587232690
 
 
   @dongjoon-hyun thank you for looking into it. How did sql/hive work with 
Hive 1.2 and ORC nohive? Did sql/hive also use Hive's ORC when Hive 1.2 was 
used?





[GitHub] [spark] yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive

2020-02-14 Thread GitBox
yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive 
URL: https://github.com/apache/spark/pull/27536#issuecomment-586549374
 
 
   Thank you @dongjoon-hyun!





[GitHub] [spark] yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive

2020-02-14 Thread GitBox
yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive 
URL: https://github.com/apache/spark/pull/27536#issuecomment-586398279
 
 
   @dongjoon-hyun @wangyum do you happen to know what happened with 
https://github.com/apache/spark/pull/27536#issuecomment-585042303? It seems that 
in the hive module, we are passing a VectorizedRowBatch created by the ORC 
project to Hive's ORC data source instead of the data source inside the ORC 
project.





[GitHub] [spark] yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive

2020-02-11 Thread GitBox
yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive 
URL: https://github.com/apache/spark/pull/27536#issuecomment-585043898
 
 
   Taking 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118207/testReport/org.apache.spark.sql.hive/CompressionCodecSuite/both_table_level_and_session_level_compression_are_set/
 as an example, I do not understand why the table was turned into a Hive ORC table.





[GitHub] [spark] yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive

2020-02-11 Thread GitBox
yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive 
URL: https://github.com/apache/spark/pull/27536#issuecomment-585042303
 
 
   Also, the error cause was
   ```
   Caused by: sbt.ForkMain$ForkError: java.lang.NoSuchMethodError: org.apache.orc.TypeDescription.createRowBatch(I)Lorg/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch;
       at org.apache.hadoop.hive.ql.io.orc.WriterImpl.<init>(WriterImpl.java:96)
       at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:320)
       at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:103)
       at org.apache.spark.sql.hive.execution.HiveOutputWriter.write(HiveFileFormat.scala:156)
       at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.write(FileFormatDataWriter.scala:140)
       at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:273)
       at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1411)
       at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:281)
       ... 9 more
   ```
   
   It seems that org.apache.hadoop.hive.ql.io.orc.WriterImpl came from Hive's ORC.
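
For context on reading this error: the JVM links a call by method name plus descriptor, and the descriptor includes the return type. Here `(I)Lorg/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch;` means "takes an `int`, returns that `VectorizedRowBatch` class", so the error would be expected if the loaded `TypeDescription` declares `createRowBatch(int)` with a return type in a different package. The sketch below, with the hypothetical names `DescriptorCheck`, `descriptorOf`, `descriptors`, and `hasMethod`, prints the descriptors a loaded class really exposes so they can be compared with the one in the trace. It is a diagnostic idea under those assumptions, not code from this PR.

```java
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper; not Spark, Hive, or ORC code.
final class DescriptorCheck {

    /** Builds the JVM descriptor of a method, e.g. "(I)C" for String.charAt(int). */
    static String descriptorOf(Method method) {
        return MethodType
                .methodType(method.getReturnType(), method.getParameterTypes())
                .toMethodDescriptorString();
    }

    /** Collects the descriptors of all public methods with the given name. */
    static List<String> descriptors(Class<?> owner, String methodName) {
        List<String> result = new ArrayList<>();
        for (Method method : owner.getMethods()) {
            if (method.getName().equals(methodName)) {
                result.add(descriptorOf(method));
            }
        }
        return result;
    }

    /** True if the class has a public method with exactly this name and descriptor. */
    static boolean hasMethod(Class<?> owner, String methodName, String descriptor) {
        return descriptors(owner, methodName).contains(descriptor);
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // args[0]: fully qualified class name, args[1]: method name.
        Class<?> owner = Class.forName(args[0]);
        for (String descriptor : descriptors(owner, args[1])) {
            System.out.println(args[1] + descriptor);
        }
    }
}
```

Invoked with `org.apache.orc.TypeDescription` and `createRowBatch` on the test classpath, it would list the descriptors actually available. A return type that differs from the one in the `NoSuchMethodError` would point to the caller and the callee having been compiled against different `VectorizedRowBatch` classes.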





[GitHub] [spark] yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive

2020-02-11 Thread GitBox
yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive 
URL: https://github.com/apache/spark/pull/27536#issuecomment-585041496
 
 
   Hmm. We need to keep hive-storage-api, but I will need to check why we hit 
the runtime exception. Somehow we used hive-storage-api's VectorizedRowBatch 
instead of ORC's VectorizedRowBatch for the ORC code path.





[GitHub] [spark] yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive

2020-02-11 Thread GitBox
yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive 
URL: https://github.com/apache/spark/pull/27536#issuecomment-584898607
 
 
   Oh, hive-storage-api still gets pulled in. Let me check.

