MaxGekk opened a new pull request #30756:
URL: https://github.com/apache/spark/pull/30756


   ### What changes were proposed in this pull request?
   Modify the tests that add partitions with `LOCATION` where the number of 
nested folders in `LOCATION` doesn't match the number of partition columns. 
In that case, `ALTER TABLE .. DROP PARTITION` tries to access (and delete) a 
folder outside the "base" path in `LOCATION`.
   
   The problem lies in Hive's MetaStore method `drop_partition_common`: 
   
https://github.com/apache/hive/blob/8696c82d07d303b6dbb69b4d443ab6f2b241b251/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java#L4876
   which tries to delete empty partition sub-folders recursively, starting from 
the deepest partition sub-folder and moving up to the base folder. When the 
number of sub-folders is not equal to the number of partition columns 
`part_vals.size()`, the method tries to list and delete folders outside the 
base path.
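   
   The parent walk described above can be modeled with a small sketch. This is 
not Hive's code: the helper name `ancestors_examined` is hypothetical, and the 
starting depth of `num_partition_columns - 1` is an assumption about how the 
walk is bounded. It only shows which parent folders would be inspected for a 
given partition location.

```python
from pathlib import PurePosixPath
from typing import List


def ancestors_examined(partition_location: str, num_partition_columns: int) -> List[str]:
    """Return the parent folders a drop would inspect (simplified model).

    Assumption: the walk starts at the parent of the partition folder and
    climbs one level per remaining partition column.
    """
    examined = []
    parent = PurePosixPath(partition_location).parent
    depth = num_partition_columns - 1  # assumed bound, not taken from Hive
    # Stop when the depth budget is used up or the filesystem root is reached.
    while depth > 0 and parent != parent.parent:
        examined.append(str(parent))
        parent = parent.parent
        depth -= 1
    return examined
```

   With a conventional layout such as the hypothetical 
`/warehouse/tbl/part0=1/part1=2` and two partition columns, the model inspects 
only `/warehouse/tbl/part0=1`. With the custom location 
`/Users/maximgekk/tmp/part-location/tbl` and two partition columns, it inspects 
`/Users/maximgekk/tmp/part-location`, which is outside the partition folder.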
   
   ### Why are the changes needed?
   To fix test failures like 
https://github.com/apache/spark/pull/30643#issuecomment-743774733:
   ```
   
org.apache.spark.sql.hive.execution.command.AlterTableAddPartitionSuite.ALTER 
TABLE .. ADD PARTITION Hive V1: SPARK-33521: universal type conversions of 
partition values
   sbt.ForkMain$ForkError: org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: File 
file:/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-832cb19c-65fd-41f3-ae0b-937d76c07897
 does not exist;
        at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:112)
        at 
org.apache.spark.sql.hive.HiveExternalCatalog.dropPartitions(HiveExternalCatalog.scala:1014)
   ...
   Caused by: sbt.ForkMain$ForkError: 
org.apache.hadoop.hive.metastore.api.MetaException: File 
file:/home/jenkins/workspace/SparkPullRequestBuilder/target/tmp/spark-832cb19c-65fd-41f3-ae0b-937d76c07897
 does not exist
        at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_partition_with_environment_context(HiveMetaStore.java:3381)
        at sun.reflect.GeneratedMethodAccessor304.invoke(Unknown Source)
   ```
   
   The issue can be reproduced by the following steps:
   1. Create a base folder, for example: `/Users/maximgekk/tmp/part-location`
   2. Create a sub-folder in the base folder and drop permissions for it:
   ```
   $ mkdir /Users/maximgekk/tmp/part-location/aaa
   $ chmod a-rwx /Users/maximgekk/tmp/part-location/aaa
   $ ls -al /Users/maximgekk/tmp/part-location
   total 0
   drwxr-xr-x   3 maximgekk  staff    96 Dec 13 18:42 .
   drwxr-xr-x  33 maximgekk  staff  1056 Dec 13 18:32 ..
   d---------   2 maximgekk  staff    64 Dec 13 18:42 aaa
   ```
   3. Create a table with a partition folder in the base folder:
   ```sql
   spark-sql> create table tbl (id int) partitioned by (part0 int, part1 int);
   spark-sql> alter table tbl add partition (part0=1,part1=2) location '/Users/maximgekk/tmp/part-location/tbl';
   ``` 
   4. Try to drop this partition:
   ```
   spark-sql> alter table tbl drop partition (part0=1,part1=2);
   20/12/13 18:46:07 ERROR HiveClientImpl:
   ======================
   Attempt to drop the partition specs in table 'tbl' database 'default':
   Map(part0 -> 1, part1 -> 2)
   In this attempt, the following partitions have been dropped successfully:
   
   The remaining partitions have not been dropped:
   [1, 2]
   ======================
   
   Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: Error 
accessing file:/Users/maximgekk/tmp/part-location/aaa;
   org.apache.spark.sql.AnalysisException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Error accessing 
file:/Users/maximgekk/tmp/part-location/aaa;
   ```
   The command fails because it tries to access the sub-folder `aaa`, which is 
outside the partition path `/Users/maximgekk/tmp/part-location/tbl`.
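   
   Steps 1 and 2 can also be set up in a temporary directory instead of a 
hard-coded user path. The following is a minimal sketch; the helper names 
`make_base_with_locked_subfolder` and `cleanup` are hypothetical, and the 
`tbl` path is only returned for use in `LOCATION`, not created.

```python
import os
import shutil
import stat
import tempfile


def make_base_with_locked_subfolder():
    """Create <base>/aaa with all permissions removed; return the paths."""
    base = tempfile.mkdtemp(prefix="part-location-")
    locked = os.path.join(base, "aaa")
    partition = os.path.join(base, "tbl")  # value to pass to LOCATION
    os.mkdir(locked)
    os.chmod(locked, 0)  # equivalent to `chmod a-rwx` on POSIX systems
    return base, locked, partition


def cleanup(base, locked):
    """Restore owner permissions so the temporary tree can be removed."""
    os.chmod(locked, stat.S_IRWXU)
    shutil.rmtree(base)
```

   Calling `cleanup(base, locked)` afterwards restores the permissions on `aaa` 
before deleting the temporary base folder.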
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   By running the affected tests from a local IDEA instance that does not have 
access to folders outside the partition paths.
   
   Lead-authored-by: Max Gekk <[email protected]>
   Co-authored-by: Maxim Gekk <[email protected]>
   Signed-off-by: HyukjinKwon <[email protected]>
   (cherry picked from commit 9160d59ae379910ca3bbd04ee25d336afff28abd)
   Signed-off-by: Max Gekk <[email protected]>

