[GitHub] spark pull request: [WIP] SPARK-1416: PySpark support for Sequence...

mateiz Sat, 19 Apr 2014 15:20:06 -0700

Github user mateiz commented on the pull request:

    https://github.com/apache/spark/pull/455#issuecomment-40882224
  
    Cool, thanks for porting this over! A few notes:
    - I looked at msgpack in the past, and one problem with it was that users
      need to install it separately through "pip" to use PySpark. Before this,
      we had no external package dependencies except NumPy for ML. For this
      reason it would be good to investigate Pyrolite instead (which just uses
      pickling on the Python side). If that doesn't work, we should write the
      code in a way that imports msgpack only if you're using one of these
      methods.
    - There are a bunch of binary test files included. Would it be possible to
      generate those programmatically instead (e.g. through saveAsSequenceFile,
      or through a Java-side static method)?
    - The build is failing due to the scalastyle checker; you can run
      "sbt scalastyle" locally to run the same checks. (See
      https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14259/console
      for the current errors.)
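
    To illustrate the "import msgpack only if you're using one of these
    methods" idea from the first note, here is a minimal sketch of a deferred
    import. It is not code from the pull request: the names lazy_import and
    unpack_records are hypothetical, and it assumes Python 3 syntax. The point
    is only that the optional module is looked up inside the function that
    needs it, so importing the surrounding package never fails for users who
    have not installed it.

```python
import importlib


def lazy_import(module_name, feature):
    """Import an optional dependency at call time rather than at module load.

    Hypothetical helper: raises an ImportError that names the feature which
    needs the missing package, so the user knows what to install.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            "%s requires the optional '%s' package; install it with pip"
            % (feature, module_name)
        ) from exc


def unpack_records(payload):
    """Hypothetical SequenceFile-side helper that needs msgpack.

    msgpack is only imported when this function is actually called, so code
    paths that never touch SequenceFiles keep working without it.
    """
    msgpack = lazy_import("msgpack", "SequenceFile support")
    return msgpack.unpackb(payload)
```

    With this shape, only callers of unpack_records ever see an error about
    the missing package; everything else in the module imports cleanly.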



