Github user liancheng commented on the pull request:
https://github.com/apache/spark/pull/3640#issuecomment-66251380
Appreciate a lot for fixing this case! The serialization wrapper class
makes sense. However, would like to make some refactoring. A summary of my
comments above:
1. Renaming `HiveFunctionCache` to `HiveFunctionWrapper`
2. Moving `HiveFunctionWrapper` to the shim layer, and keep Hive 0.12.0
code path exactly the same as before.
Considering Hive 0.12.0 is not affected by this issue, and 1.2.0 release
is really close, I'd like to lower the possibility of breaking 0.12.0 code
paths as much as possible.
3. Add a `HiveSimpleUdfWrapper`, which inherits from `HiveFunctionWrapper`.
As you've mentioned in the code, `HiveSimpleUdf` is a special case, it
shouldn't be serialized, and should always create a fresh object on executor
side. Currently this special case complicates `HiveFunctionWrapper`
implementation and makes it somewhat confusing. Defining a special subclass for
`HvieSimpleUdf` helps making `HiveFunctionWrapper` simpler (e.g. no need to
serialize the boolean flag any more).
4. Avoid using `Utilities.de/serializePlan` by mimicking
`Utilities.de/serializeObjectByKryo` in the `read/writeExternal` methods of the
wrapper class, so that we can remove the expensive `HiveConf` instantiation.
In a word, we can add two classes, `HiveFunctionWrapper` and
`HiveSImpleUdfWrapper` into the shim layer, and make sure that the 0.12.0
version behaves exactly the same as before.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]