Github user liancheng commented on the pull request:
https://github.com/apache/spark/pull/9895#issuecomment-159239968
This has been broken for quite a while, ever since we introduced the
isolated Hive client in 1.4. My theory about why people seldom noticed
it is that:
1. Commands executed by the execution Hive client are mostly transient;
they don't touch data stored in the real metastore, so logically it doesn't
matter which Hive client executes them.
1. Even if the remote Hive metastore runs a version lower than that of
Spark SQL's execution Hive client, things still work as long as the Thrift
protocols used by the involved commands are backward compatible.
1. Although we've already upgraded to Hive 1.2.1, we haven't yet implemented
many of the advanced features that only exist in newer Hive versions, so most
commands issued by the execution Hive client are indeed backward compatible
with lower versions.
Unfortunately, the only reliable way I found to verify this change is to
inspect the internal `HiveMetaStoreClient` instance of the execution Hive
client via remote debugging, because we need a remote Hive metastore here. For
example, we can start the Thrift server with remote debugging enabled:
```sh
$SPARK_HOME/sbin/start-thriftserver.sh \
  --driver-java-options \
  "-agentlib:jdwp=transport=dt_socket,server=y,address=localhost:5005,suspend=y"
```
Then attach a debugger to the endpoint `localhost:5005`. (The remote
debugging facilities in IntelliJ IDEA are quite handy here.)
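If an IDE isn't available, the JDK's command-line debugger `jdb` can attach to the same JDWP endpoint. A minimal sketch (the breakpoint class below assumes Hive's usual metastore package layout; adjust it to match the actual Hive version on the classpath):

```sh
# Attach jdb to the JDWP socket opened by the -agentlib:jdwp option above.
jdb -attach localhost:5005

# At the jdb prompt, break where the metastore client is constructed,
# then resume the suspended JVM and inspect the instance when it hits:
#   stop in org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>
#   run
```

This gives the same visibility into which `HiveMetaStoreClient` the execution Hive client actually instantiates, without an IDE.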
Also, please refer to the JIRA ticket for more information about how to
reproduce this issue locally.