[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

sun-rui Sun, 15 May 2016 19:33:07 -0700

Github user sun-rui commented on the pull request:

    https://github.com/apache/spark/pull/12836#issuecomment-219336575
  
    @NarineK, 
    1. No need to add gapplymode. You can change isDataFrame from a boolean to 
an Int, which can hold multiple values: 0 - RDD mode, 1 - dapplymode, 
2 - gapplymode.
    2. No need to serialize the key to R bytes for each group. Just use SerDe 
to write the key columns to the R worker.
    The format is something like:
      (Number of key columns)(key1)(key2)(...)(keyn)(Number of rows in the 
group)(Rows...)
    Rows in the group are serialized to R bytes to reuse the current logic.
    3. The R function for gapply takes two parameters: (Keys, ldf). I think it 
is helpful to pass the key in addition to the group.
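
A minimal sketch of the per-group layout proposed in point 2, with the integer mode values from point 1. This is not Spark's SerDe: the helper names, the 4-byte big-endian length prefixes, and the encoding of keys as UTF-8 strings are all assumptions made for illustration, and Python is used only to keep the example short.

```python
import io
import struct

# Integer modes from point 1 (constant names are hypothetical).
MODE_RDD, MODE_DAPPLY, MODE_GAPPLY = 0, 1, 2


def write_int(buf, value):
    # 4-byte big-endian signed integer (assumed width and byte order).
    buf.write(struct.pack(">i", value))


def write_bytes(buf, payload):
    # Length-prefixed byte string.
    write_int(buf, len(payload))
    buf.write(payload)


def write_group(buf, keys, rows):
    # Layout: (number of key columns)(key1)...(keyn)(number of rows)(rows...)
    write_int(buf, len(keys))
    for key in keys:
        # Assumption: keys are written as UTF-8 strings for simplicity.
        write_bytes(buf, str(key).encode("utf-8"))
    write_int(buf, len(rows))
    for row in rows:
        # Rows are treated as opaque, already-serialized bytes.
        write_bytes(buf, row)


def read_int(buf):
    return struct.unpack(">i", buf.read(4))[0]


def read_bytes(buf):
    return buf.read(read_int(buf))


def read_group(buf):
    # Mirror of write_group: keys first, then the rows of the group.
    keys = [read_bytes(buf).decode("utf-8") for _ in range(read_int(buf))]
    rows = [read_bytes(buf) for _ in range(read_int(buf))]
    return keys, rows
```

With this layout the reader gets the key columns before any row bytes, so the worker side could pass (keys, rows) to the user function without deserializing a separate key object per group.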




