Syed Shameerur Rahman created HIVE-23851:
--------------------------------------------

             Summary: MSCK REPAIR Command With Partition Filtering Fails While Dropping Partitions
                 Key: HIVE-23851
                 URL: https://issues.apache.org/jira/browse/HIVE-23851
             Project: Hive
          Issue Type: Bug
    Affects Versions: 4.0.0
            Reporter: Syed Shameerur Rahman
            Assignee: Syed Shameerur Rahman


*Steps to reproduce:*
# Create an external table.
# Run the msck command to sync all partitions with the metastore.
# Remove one of the partition paths.
# Run msck repair with partition filtering.

*Stack Trace:*
{code:java}
 2020-07-15T02:10:29,045 ERROR [4dad298b-28b1-4e6b-94b6-aa785b60c576 main] ppr.PartitionExpressionForMetastore: Failed to deserialize the expression
 java.lang.IndexOutOfBoundsException: Index: 110, Size: 0
 at java.util.ArrayList.rangeCheck(ArrayList.java:657) ~[?:1.8.0_192]
 at java.util.ArrayList.get(ArrayList.java:433) ~[?:1.8.0_192]
 at org.apache.hive.com.esotericsoftware.kryo.util.MapReferenceResolver.getReadObject(MapReferenceResolver.java:60) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readReferenceOrNull(Kryo.java:857) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:707) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.SerializationUtilities$KryoWithHooks.readObject(SerializationUtilities.java:211) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeObjectFromKryo(SerializationUtilities.java:806) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.exec.SerializationUtilities.deserializeExpressionFromKryo(SerializationUtilities.java:775) ~[hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.deserializeExpr(PartitionExpressionForMetastore.java:96) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionExpressionForMetastore.convertExprToFilter(PartitionExpressionForMetastore.java:52) [hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.metastore.PartFilterExprUtil.makeExpressionTree(PartFilterExprUtil.java:48) [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsByExprInternal(ObjectStore.java:3593) [hive-standalone-metastore-server-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
 at org.apache.hadoop.hive.metastore.VerifyingObjectStore.getPartitionsByExpr(VerifyingObjectStore.java:80) [hive-standalone-metastore-server-4.0.0-SNAPSHOT-tests.jar:4.0.0-SNAPSHOT]
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_192]
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_192]
{code}

*Cause:*
For msck repair with partition filtering, the expression proxy class is 
expected to be set to PartitionExpressionForMetastore ( 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/ddl/misc/msck/MsckAnalyzer.java#L78
 ). While dropping partitions, however, the drop-partition filter expression is 
serialized in Msck ( 
https://github.com/apache/hive/blob/master/standalone-metastore/metastore-server/src/main/java/org/apache/hadoop/hive/metastore/Msck.java#L589
 ). That serialized form is incompatible with the deserialization performed in 
PartitionExpressionForMetastore ( 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionExpressionForMetastore.java#L52
 ), so the query fails with "Failed to deserialize the expression".
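
The mismatch described above can be illustrated with a small, self-contained sketch. It is not Hive code: Java's built-in object serialization stands in for Kryo, and the assumption that the writer side emits plain text bytes is made purely for illustration. The class name, method names, and the sample filter string are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ProxyMismatchSketch {

    // Writer A (assumed stand-in for the drop-partition path): plain UTF-8 text.
    public static byte[] serializeAsText(String filter) {
        return filter.getBytes(StandardCharsets.UTF_8);
    }

    // Writer B (stand-in for the format the reader expects): a binary object stream.
    public static byte[] serializeAsObject(String filter) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(filter);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reader that only understands the binary object stream format.
    public static String deserializeAsObject(byte[] bytes)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (String) in.readObject();
        }
    }

    // Returns true only when the bytes were written in the format the reader expects.
    public static boolean canDeserialize(byte[] bytes) {
        try {
            deserializeAsObject(bytes);
            return true;
        } catch (IOException | ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String filter = "year='2020'"; // hypothetical partition filter
        System.out.println(canDeserialize(serializeAsObject(filter))); // matching formats
        System.out.println(canDeserialize(serializeAsText(filter)));   // mismatched formats
    }
}
```

When writer and reader disagree on the format, the reader fails while parsing the bytes, which is the same class of failure as the Kryo exception in the stack trace.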

*Solutions:*
I can think of two approaches to this problem:
# Since PartitionExpressionForMetastore is required only during the partition 
pruning step, we can switch the expression proxy class back to 
MsckPartitionExpressionProxy once partition pruning is done.
# Alternatively, make the serialization of the msck drop-partition filter 
expression compatible with what PartitionExpressionForMetastore expects. We can 
do this via reflection, since the drop-partition serialization happens in the 
Msck class (standalone-metastore). This would remove the need for the 
MsckPartitionExpressionProxy class entirely and would simplify msck repair with 
partition filtering (no need to set the expression proxy class config).
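
As a rough sketch of the first approach, the proxy class setting could be switched only for the duration of the pruning step and restored afterwards. This is not Hive code: the configuration is modeled as a plain map, and the key name `PROXY_KEY`, the helper `withProxy`, and the class `ProxySwitchSketch` are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class ProxySwitchSketch {

    // Hypothetical config key; stands in for whatever key selects the proxy class.
    public static final String PROXY_KEY = "expression.proxy.class";

    // Runs one step with a temporary proxy class, then restores the previous value,
    // even if the step throws.
    public static <T> T withProxy(Map<String, String> conf,
                                  String temporaryProxy,
                                  Supplier<T> step) {
        String previous = conf.get(PROXY_KEY);
        conf.put(PROXY_KEY, temporaryProxy);
        try {
            return step.get();
        } finally {
            if (previous == null) {
                conf.remove(PROXY_KEY);
            } else {
                conf.put(PROXY_KEY, previous);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(PROXY_KEY, "MsckPartitionExpressionProxy");

        // The "pruning step" here just reports which proxy is active while it runs.
        String activeDuringPruning = withProxy(
            conf, "PartitionExpressionForMetastore", () -> conf.get(PROXY_KEY));

        System.out.println(activeDuringPruning);  // proxy seen by the pruning step
        System.out.println(conf.get(PROXY_KEY));  // proxy seen by later steps, e.g. drop
    }
}
```

Restoring in a `finally` block keeps the later drop-partition step on the original proxy even when pruning fails.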

I am personally inclined toward the second approach. Before moving on, I want 
to know whether this is the best approach or whether there is a better or 
easier way to solve this problem.

PS: The qtest added in HIVE-22957 focused mainly on adding missing partitions; 
I forgot to add a case for dropping partitions.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
