GitHub user kanzhang commented on the pull request:
https://github.com/apache/spark/pull/1338#issuecomment-49810291
Major changes in the updated patch:
1. Replaced doctests with standalone tests
2. Fixed the converter for BytesWritable and added read/write tests for
BytesWritable and byte arrays
3. Added HBase and Cassandra output format and converter examples
4. Previously I inspected array element types and converted Object[] to arrays
of primitive types whenever possible (primitive arrays get pickled to Python
arrays, whereas Object[] gets pickled to a Python tuple). I removed that code
because element types cannot be determined for empty arrays. Users who want
Java arrays to appear as Python arrays have to supply custom converters (and
they can, since they know their array types a priori); see the first sketch
after this list.
5. No out-of-the-box support for reading/writing arrays, since ArrayWritable
itself doesn't have a no-arg constructor for creating an empty instance during
deserialization. Users need to provide ArrayWritable subtypes instead. When
writing, custom converters are also needed to convert arrays to suitable
ArrayWritable subtypes. When reading, the default converter converts any custom
ArrayWritable subtype to Object[], which gets pickled to a Python tuple; see
the second sketch after this list.
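
For item 4, here is a minimal sketch of such a custom converter, assuming the
org.apache.spark.api.python.Converter trait (a single convert method) and
values known a priori to be Object[] holding boxed doubles; the class name is
hypothetical:

```scala
import org.apache.spark.api.python.Converter

// Hypothetical converter: the user knows a priori that values are Object[]
// holding boxed doubles, so it produces double[], which gets pickled to a
// Python array instead of a tuple.
class ObjectArrayToDoubleArrayConverter extends Converter[Any, Array[Double]] {
  override def convert(obj: Any): Array[Double] = obj match {
    case arr: Array[Object] => arr.map(_.asInstanceOf[Double])
    case other => throw new IllegalArgumentException("Expected Object[], got: " + other)
  }
}
```

A class like this would be referenced by its fully qualified name through the
keyConverter/valueConverter arguments of the PySpark read/write methods.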
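
For item 5, a sketch of an ArrayWritable subtype plus the write-side converter
that wraps arrays into it; both names are hypothetical, and the subtype's
implicit no-arg constructor is exactly what plain ArrayWritable lacks:

```scala
import org.apache.hadoop.io.{ArrayWritable, DoubleWritable, Writable}
import org.apache.spark.api.python.Converter

// Hypothetical subtype: fixes the element class and, crucially, gets the
// no-arg constructor Hadoop needs to instantiate it during deserialization.
class DoubleArrayWritable extends ArrayWritable(classOf[DoubleWritable])

// Hypothetical write-side converter: wraps a double[] coming from Python
// into the DoubleArrayWritable subtype so it can be written out.
class DoubleArrayToWritableConverter extends Converter[Any, DoubleArrayWritable] {
  override def convert(obj: Any): DoubleArrayWritable = obj match {
    case arr: Array[Double] =>
      val writable = new DoubleArrayWritable
      writable.set(arr.map(d => new DoubleWritable(d): Writable))
      writable
    case other => throw new IllegalArgumentException("Expected double[], got: " + other)
  }
}
```

On the read side nothing extra is needed: the default converter turns any such
subtype into Object[], which arrives in Python as a tuple.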