Github user dwmclary commented on the pull request:
https://github.com/apache/spark/pull/3213#issuecomment-63145286
Happy to help; these changes should be quick.
- Sure, the wrapper for pyspark makes more sense; I hadn't considered
that we'd be shipping the objects back and forth to py4j.
- Jackson should be a simple change; I'll just go hash->JSON via
ObjectMapper if that makes sense.
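As a rough illustration of the escaping concern (not code from this PR), the idea is to hand the field-name -> value map to a real JSON library rather than concatenating strings by hand. The sketch below uses Python's stdlib `json` module as a stand-in for Jackson's ObjectMapper; the sample row is made up.

```python
import json

# Illustrative sketch: a row represented as a field-name -> value dict,
# serialized with a JSON library instead of manual string building.
# Note the quote character inside the column name -- the library escapes
# it correctly, where naive concatenation would emit invalid JSON.
row = {'na"me': "O'Brien", "age": 42}

serialized = json.dumps(row)
print(serialized)

# The output round-trips cleanly back to the original structure.
assert json.loads(serialized) == row
```

The same reasoning applies on the Scala side with Jackson: `ObjectMapper.writeValueAsString` takes care of escaping, so column names or values containing quotes, backslashes, or control characters don't need special-casing.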
Cheers,
Dan
On Fri, Nov 14, 2014 at 2:27 PM, Michael Armbrust <[email protected]>
wrote:
> Thanks for working on this. I have two high level comments:
>
> - I think it would be better to have a single implementation in Scala
>   with a wrapper in Python. This way we don't have to serialize / ship
>   the objects to Python, which seems like it might be expensive,
>   especially if the next thing you are going to do is something like
>   saveAsTextFile.
> - It would also be better to use Jackson to generate the JSON string,
>   as there are a lot of tricky edge cases around escaping that we need
>   to handle if we do it ourselves. For example, I think this version
>   will fail if a column name contains a quote character.
>
> /cc @yhuai <https://github.com/yhuai>
>
> Reply to this email directly or view it on GitHub
> <https://github.com/apache/spark/pull/3213#issuecomment-63139261>.
>