[GitHub] [spark] SaurabhChawla100 commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables
SaurabhChawla100 commented on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-659392227

@cloud-fan / @dongjoon-hyun - The test build is failing with the following error:

```
Test build #125949 has finished for PR 29045 at commit c0f6209.
This patch fails PySpark pip packaging tests.
This patch merges cleanly.
This patch adds no public classes.
```

I am not sure whether it is related to the change made in this PR.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SaurabhChawla100 commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables
SaurabhChawla100 commented on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-658561486

Retest this please
[GitHub] [spark] SaurabhChawla100 commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables
SaurabhChawla100 commented on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-657978027

> Can you be more specific about the problem? Are you saying that the actual file schema doesn't match the table schema specified by the user?

For ORC data written by Hive, there are no field names in the physical schema (the columns are stored as `_col0`, `_col1`, ...). Please see the following code for reference:
https://github.com/apache/spark/blob/24be81689cee76e03cd5136dfd089123bbff4595/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala#L133

This code returns the index of each requested column within the data schema. However, in the code below we pass the result schema, and the result schema does not contain the index returned from OrcUtils.scala:
https://github.com/apache/spark/blob/24be81689cee76e03cd5136dfd089123bbff4595/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala#L211

For example:

```scala
val u = """select date_dim.d_date_id from date_dim limit 5"""
spark.sql(u).collect()
```

Here the index of `d_date_id` returned by OrcUtils.scala#L133 is 2, whereas the resultSchema passed in OrcFileFormat.scala#L211 contains only one field: struct<`d_date_id`:string>.
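To make the index mismatch concrete, here is a minimal, Spark-free Scala sketch. It models schemas as plain lists of column names; the object name, the helper functions, and the four-column table layout (chosen so that `d_date_id` sits at index 2, as in the example above) are all hypothetical. Real Spark code works on `StructType` and ORC `TypeDescription` instead.

```scala
// Spark-free model of the ORC index mismatch. All names here are hypothetical;
// real Spark code operates on StructType and ORC TypeDescription.
object OrcIndexMismatchSketch {

  // Heuristic for a Hive-written file (assumption for this sketch): every
  // physical field has a positional name such as _col0, _col1, ...
  def isHiveWritten(physicalFieldNames: Seq[String]): Boolean =
    physicalFieldNames.nonEmpty && physicalFieldNames.forall(_.startsWith("_col"))

  // With positional physical names, the only way to map a requested column
  // name to a physical column is its position in the table (data) schema.
  def requestedIds(dataSchema: Seq[String], required: Seq[String]): Seq[Int] =
    required.map(dataSchema.indexOf)

  // Resolve an id against whatever schema the reader was configured with.
  // Returns None when the id is out of range for that schema.
  def lookup(readerSchema: Seq[String], id: Int): Option[String] =
    readerSchema.lift(id)

  def main(args: Array[String]): Unit = {
    val dataSchema = Seq("col_a", "col_b", "d_date_id", "col_c") // hypothetical table layout
    val physical   = dataSchema.indices.map(i => s"_col$i")      // positional names in the file
    val required   = Seq("d_date_id")                            // select d_date_id ...

    // Index of d_date_id within the table schema, i.e. 2.
    val ids = requestedIds(dataSchema, required)
    println(s"hive-written=${isHiveWritten(physical)}, ids=$ids")

    // Reader configured with the pruned result schema: one field, so id 2 is out of range.
    println(s"pruned schema lookup: ${lookup(required, ids.head)}") // None
    // Reader configured with the full data schema: id 2 resolves to d_date_id.
    println(s"full schema lookup: ${lookup(dataSchema, ids.head)}") // Some(d_date_id)
  }
}
```

In this model, resolving the id against the one-field result schema finds nothing, whereas resolving it against the full data schema finds `d_date_id`, which mirrors the mismatch between OrcUtils.scala#L133 and OrcFileFormat.scala#L211 described in the comment.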
[GitHub] [spark] SaurabhChawla100 commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables
SaurabhChawla100 commented on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-657455284

> @dongjoon-hyun / @maropu - It seems that something went wrong after I rebased this branch onto the master branch.
>
> The PR now shows -185 as the number of files changed. I am not sure why it shows this difference.

It's fine now.
[GitHub] [spark] SaurabhChawla100 commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables
SaurabhChawla100 commented on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-657451506

@dongjoon-hyun / @maropu - It seems that something went wrong after I rebased this branch onto the master branch. The PR now shows -185 as the number of files changed. I am not sure why it shows this difference.
[GitHub] [spark] SaurabhChawla100 commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables
SaurabhChawla100 commented on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-657186265

> Thank you for updating, but could you update the PR description with the reproducible example? If someone is following the example, it will not fail because they don't have a data in `/Users/test/tpcds_scale5data/date_dim`. Also, please remove irrelevant stuff like `TBLPROPERTIES`.

@dongjoon-hyun - Thank you for reviewing this PR. I have updated the PR description with the prerequisite of creating the data at `/Users/test/tpcds_scale5data/date_dim` from Hive. This step is needed because that is the real-world scenario in which the query fails. I have also added a second example to the PR description with which this error can be reproduced.
[GitHub] [spark] SaurabhChawla100 commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables
SaurabhChawla100 commented on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-657185061

> @SaurabhChawla100 . Thanks for updating. Apache Spark has two ORC implementations, `native` and `hive`. Could you add the following test coverage?
>
> 1. A test coverage for `hive` ORC implementation
>    * sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala
> 2. A non-vectorized code test coverage for `native` ORC implementation
>    * `spark.sql.orc.enableVectorizedReader=false`

I have added unit tests for both requested cases.
[GitHub] [spark] SaurabhChawla100 commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables
SaurabhChawla100 commented on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-655888981

> Thank you for your contribution, @SaurabhChawla100 .
> In order to prevent the future regression, could you make a UT with your example, please?

Sure, I will add the unit test.