[GitHub] [spark] SaurabhChawla100 commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-16 Thread GitBox


SaurabhChawla100 commented on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-659392227


   @cloud-fan / @dongjoon-hyun - The test build is failing with the following 
error:
   ```
   Test build #125949 has finished for PR 29045 at commit c0f6209.
   
   This patch fails PySpark pip packaging tests.
   This patch merges cleanly.
   This patch adds no public classes.
   ```
   I am not sure whether it is related to the change made in this PR.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SaurabhChawla100 commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-14 Thread GitBox


SaurabhChawla100 commented on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-658561486


   Retest this please






[GitHub] [spark] SaurabhChawla100 commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-13 Thread GitBox


SaurabhChawla100 commented on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-657978027


   > Can you be more specific about the problem? Are you saying that the actual 
file schema doesn't match the table schema specified by the user?
   
   For ORC data created by Hive, the physical schema has no field names. 
Please see the code below for reference.
   
https://github.com/apache/spark/blob/24be81689cee76e03cd5136dfd089123bbff4595/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala#L133
   
   So this code passes along the index of the column in the data schema.
   
   In the code below, however, we pass in the result schema, and that result 
schema does not have the index that is passed from OrcUtils.scala.
   
https://github.com/apache/spark/blob/24be81689cee76e03cd5136dfd089123bbff4595/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala#L211
   
   For example:
   
   ```
   val u = """select date_dim.d_date_id from date_dim limit 5"""
   
   spark.sql(u).collect
   ```
   
   Here the index of `d_date_id` returned by OrcUtils.scala#L133 is 2,
   
   whereas the resultSchema passed in OrcFileFormat.scala#L211 has only one 
field: struct<`d_date_id`:string>
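   
   The mismatch described above can be sketched with a simplified model. This 
is not the Spark code: the object `OrcIndexMismatchSketch`, its helpers 
`requestedColumnIds` and `fieldAt`, and the column names `col_a` and `col_b` 
are hypothetical. Only `d_date_id`, its index 2, and the one-field result 
schema come from the example above.
   
   ```scala
   // Simplified model of the column-index mismatch; not the actual Spark code.
   object OrcIndexMismatchSketch {
     // Hypothetical data schema: col_a and col_b are placeholders chosen so
     // that d_date_id sits at index 2, as reported in the thread.
     val dataSchema: Vector[String] = Vector("col_a", "col_b", "d_date_id")

     // Schema of the query result for `select d_date_id ...`: a single field.
     val resultSchema: Vector[String] = Vector("d_date_id")

     // Resolves each requested column by its position in the data schema,
     // which is the fallback used when the file carries no real field names.
     def requestedColumnIds(requested: Seq[String]): Seq[Int] =
       requested.map(name => dataSchema.indexOf(name))

     // Looks an index up in a schema; None means the index is out of range.
     def fieldAt(schema: Vector[String], index: Int): Option[String] =
       schema.lift(index)

     def main(args: Array[String]): Unit = {
       val ids = requestedColumnIds(Seq("d_date_id"))
       println(s"requested column ids: $ids")
       println(s"lookup in data schema: ${fieldAt(dataSchema, ids.head)}")
       println(s"lookup in result schema: ${fieldAt(resultSchema, ids.head)}")
     }
   }
   ```
   
   In this model the index resolves to `d_date_id` in the data schema but to 
nothing in the one-field result schema, which corresponds to the failing 
lookup.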
   






[GitHub] [spark] SaurabhChawla100 commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-13 Thread GitBox


SaurabhChawla100 commented on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-657455284


   > @dongjoon-hyun / @maropu - It seems some issue occurred after I rebased 
the current branch on the master branch.
   > 
   > It now shows the number of files changed as -185. I am not sure why it 
shows this difference.
   
   It is fine now.






[GitHub] [spark] SaurabhChawla100 commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-13 Thread GitBox


SaurabhChawla100 commented on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-657451506


   @dongjoon-hyun / @maropu - It seems some issue occurred after I rebased 
the current branch on the master branch.
   
   It now shows the number of files changed as -185. I am not sure why it 
shows this difference.






[GitHub] [spark] SaurabhChawla100 commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-12 Thread GitBox


SaurabhChawla100 commented on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-657186265


   > Thank you for updating, but could you update the PR description with the 
reproducible example? If someone is following the example, it will not fail 
because they don't have a data in `/Users/test/tpcds_scale5data/date_dim`. 
Also, please remove irrelevant stuff like `TBLPROPERTIES`.
   
   @dongjoon-hyun - Thank you for reviewing this PR.
   I have updated the PR description with the prerequisite of creating the 
data from Hive at `/Users/test/tpcds_scale5data/date_dim`. This is needed 
because that is the real-world scenario in which it fails. I have also added a 
second example to the PR description with which this error can be reproduced.






[GitHub] [spark] SaurabhChawla100 commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-12 Thread GitBox


SaurabhChawla100 commented on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-657185061


   > @SaurabhChawla100 . Thanks for updating. Apache Spark has two ORC 
implementations `native` and `hive`. Could you add the following test coverage 
more?
   > 
   > 1. A test coverage for `hive` ORC implementation
   > 
   > * sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileFormat.scala
   > 
   > 1. A non-vectorized code test coverage for `native` ORC implementation
   > 
   > * `spark.sql.orc.enableVectorizedReader=false`
   
   Added both of the requested unit tests.






[GitHub] [spark] SaurabhChawla100 commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-08 Thread GitBox


SaurabhChawla100 commented on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-655888981


   > Thank you for your contribution, @SaurabhChawla100 .
   > In order to prevent the future regression, could you make a UT with your 
example, please?
   
   Sure, I will add the unit test.


