Github user kmanamcheri commented on the issue:
https://github.com/apache/spark/pull/22614
> Based on my understanding, the FB team's solution is to retry the following call multiple times:
>
> ```
> getPartitionsByFilterMethod.invoke(hive, table, filter).asInstanceOf[JArrayList[Partition]]
> ```
@gatorsmile Hmm, my understanding was different. I thought they were retrying the `fetchAllPartitions` method. Maybe @tejasapatil can clarify here?
> This really depends on what the actual errors are that fail `getPartitionsByFilterMethod`. When many concurrent users share the same metastore, `exponential backoff with retries` is very reasonable, since most errors are likely caused by timeouts or similar reasons.
Doesn't this apply to every other HMS API as well? If so, shouldn't we build a complete solution around this in HiveShim that does `exponential backoff with retries` on every single HMS call?
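To make the suggestion concrete, here is a minimal sketch of what such a generic wrapper could look like. This is an illustration only: `retryWithBackoff` and its parameters are hypothetical names, not an existing Spark or HiveShim API.

```scala
import scala.util.control.NonFatal

// Hedged sketch: a generic retry-with-exponential-backoff wrapper that a
// shim layer could apply uniformly to metastore calls. Hypothetical names.
def retryWithBackoff[T](maxRetries: Int = 3, initialDelayMs: Long = 100)(call: => T): T = {
  var attempt = 0
  var delayMs = initialDelayMs
  while (true) {
    try {
      return call  // success: return the HMS result
    } catch {
      case NonFatal(e) if attempt < maxRetries =>
        attempt += 1          // transient failure: back off and retry
        Thread.sleep(delayMs)
        delayMs *= 2          // double the delay each attempt
      // after maxRetries, the exception propagates (fail fast)
    }
  }
  sys.error("unreachable")  // while (true) never falls through
}
```

Every HMS call in the shim could then be routed through one such helper, so the retry policy lives in a single place instead of being duplicated per call site.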
> If it still fails, I would suggest failing fast or deciding based on the conf value of `spark.sql.hive.metastorePartitionPruning.fallback.enabled`

OK, I agree.
I think we need clarification from @tejasapatil on which call they retry.
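For reference, the fallback behavior being discussed could be sketched roughly as follows. Only the conf key comes from the thread; every other name here is made up for illustration.

```scala
import scala.util.control.NonFatal

// Hedged sketch of the proposed fallback: try pruned partition fetching
// first; on failure, either fetch all partitions (when the conf is enabled)
// or rethrow to fail fast. Hypothetical names throughout.
def getPartitionsWithFallback[P](fallbackEnabled: Boolean)(
    prunedFetch: => Seq[P], allFetch: => Seq[P]): Seq[P] = {
  try {
    prunedFetch  // e.g. the getPartitionsByFilter call via reflection
  } catch {
    case NonFatal(e) =>
      if (fallbackEnabled) allFetch  // fetch everything, filter client-side
      else throw e                   // fail fast when fallback is disabled
  }
}
```

The trade-off is that the fallback path fetches all partitions, which can be very expensive on large tables, so gating it behind a conf (as suggested above) keeps the failure mode explicit.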