[GitHub] [spark] yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive
yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive
URL: https://github.com/apache/spark/pull/27536#issuecomment-594774176

Closing it as we need to use regular orc.

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive
URL: https://github.com/apache/spark/pull/27536#issuecomment-591177652

Thank you @omalley and @dongjoon-hyun! By the way, are we concerned that hive-common (shipped with hive 2.3.6) and hive-storage-api 2.6.0 (used by orc 1.5.9) contain duplicate classes with different versions? I am worried that we may not consistently pick up the right version because of class loading order, which can cause confusing runtime exceptions.
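One way to check the duplicate-class concern above is to ask a class loader for every location that provides a given class file. The following is a minimal sketch using only the JDK; `DuplicateClassFinder` and `locationsOf` are hypothetical names introduced for illustration and are not part of Spark, Hive, or ORC. The default class name is taken from the stack trace elsewhere in this thread; whether it actually appears more than once depends on the classpath in use.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not part of Spark, Hive, or ORC.
class DuplicateClassFinder {

    // Returns every location visible to the loader that provides the .class
    // file for the given fully qualified class name.
    static List<URL> locationsOf(String className, ClassLoader loader) {
        String resource = className.replace('.', '/') + ".class";
        try {
            return new ArrayList<>(Collections.list(loader.getResources(resource)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed default: the class named in the NoSuchMethodError descriptor.
        String name = args.length > 0
                ? args[0]
                : "org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch";
        List<URL> hits = locationsOf(name, DuplicateClassFinder.class.getClassLoader());
        System.out.println(name + " is provided by " + hits.size() + " location(s):");
        for (URL url : hits) {
            System.out.println("  " + url);
        }
    }
}
```

If more than one location is printed, which copy wins at runtime depends on class loading order, which is the risk raised in the comment.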
yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive
URL: https://github.com/apache/spark/pull/27536#issuecomment-589229742

@dongjoon-hyun So, hive 2.3 depends on apache orc instead of using orc embedded in hive, which means we will need to pull in regular orc instead of orc-nohive. Is my understanding correct?
yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive
URL: https://github.com/apache/spark/pull/27536#issuecomment-587232690

@dongjoon-hyun thank you for looking into it. How did sql/hive work with hive 1.2 and orc nohive? Did sql/hive also use hive's orc when hive 1.2 was used?
yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive
URL: https://github.com/apache/spark/pull/27536#issuecomment-586549374

thank you @dongjoon-hyun!
yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive
URL: https://github.com/apache/spark/pull/27536#issuecomment-586398279

@dongjoon-hyun @wangyum do you happen to know what happened with https://github.com/apache/spark/pull/27536#issuecomment-585042303? It seems that in the hive module we are sending a VectorizedRowBatch created by the orc project to hive's orc data source instead of the data source inside the orc project.
yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive
URL: https://github.com/apache/spark/pull/27536#issuecomment-585043898

Taking https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/118207/testReport/org.apache.spark.sql.hive/CompressionCodecSuite/both_table_level_and_session_level_compression_are_set/ as an example, I don't understand why the table was turned into a hive orc table.
yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive
URL: https://github.com/apache/spark/pull/27536#issuecomment-585042303

Also, the error cause was
```
Caused by: sbt.ForkMain$ForkError: java.lang.NoSuchMethodError: org.apache.orc.TypeDescription.createRowBatch(I)Lorg/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch;
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.<init>(WriterImpl.java:96)
    at org.apache.hadoop.hive.ql.io.orc.OrcFile.createWriter(OrcFile.java:320)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:103)
    at org.apache.spark.sql.hive.execution.HiveOutputWriter.write(HiveFileFormat.scala:156)
    at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.write(FileFormatDataWriter.scala:140)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:273)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1411)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:281)
    ... 9 more
```
Seems org.apache.hadoop.hive.ql.io.orc.WriterImpl was hive's orc.
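The `NoSuchMethodError` above names a JVM method descriptor: `(I)` is a single `int` parameter, and the `L...;` part is the return type the caller was compiled against. A rough way to compare that with what is actually on the classpath is to print the descriptors of the methods a class really exposes. This is a sketch under stated assumptions: `MethodDescriptors` and `descriptorsOf` are hypothetical names, and the default class and method names are simply copied from the error message.

```java
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Spark, Hive, or ORC.
class MethodDescriptors {

    // Returns the JVM descriptor of every public method with the given name,
    // in the same notation that NoSuchMethodError messages use.
    static List<String> descriptorsOf(Class<?> owner, String methodName) {
        List<String> out = new ArrayList<>();
        for (Method m : owner.getMethods()) {
            if (m.getName().equals(methodName)) {
                out.add(MethodType.methodType(m.getReturnType(), m.getParameterTypes())
                        .toMethodDescriptorString());
            }
        }
        return out;
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Assumed defaults, copied from the error message in this thread.
        String owner = args.length > 0 ? args[0] : "org.apache.orc.TypeDescription";
        String method = args.length > 1 ? args[1] : "createRowBatch";
        for (String descriptor : descriptorsOf(Class.forName(owner), method)) {
            System.out.println(owner + "." + method + descriptor);
        }
    }
}
```

If none of the printed descriptors matches the one in the error, the caller was compiled against a different variant of the class than the one loaded at runtime.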
yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive
URL: https://github.com/apache/spark/pull/27536#issuecomment-585041496

Hmm. We need to keep hive-storage-api, but I will need to check why we hit the runtime exception. Somehow we used hive-storage-api's VectorizedRowBatch instead of orc's VectorizedRowBatch in the orc code path.
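To see which jar a class was actually loaded from at runtime, the JDK exposes the class's code source. The sketch below is illustrative only: `ClassOrigin` and `originOf` are hypothetical names, and the default class name is an assumption taken from the stack trace in this thread. Classes loaded by the bootstrap loader have no code source, so the helper returns a placeholder for them.

```java
import java.security.CodeSource;

// Hypothetical helper for illustration only; not part of Spark, Hive, or ORC.
class ClassOrigin {

    // Returns the location (typically a jar URL) the class was loaded from,
    // or a placeholder when the JVM does not report one.
    static String originOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Assumed default: the unshaded class named in the error descriptor.
        String name = args.length > 0
                ? args[0]
                : "org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch";
        System.out.println(name + " loaded from " + originOf(Class.forName(name)));
    }
}
```

Running this inside the failing test JVM, rather than in a separate process, matters because the answer depends on that JVM's classpath and loader hierarchy.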
yhuai commented on issue #27536: [SPARK-30784] Use ORC nohive
URL: https://github.com/apache/spark/pull/27536#issuecomment-584898607

oh hive-storage-api still gets pulled in. Let me check.