[GitHub] spark pull request: [SPARK-12922][SparkR][WIP] Implement gapply() ...

NarineK Mon, 16 May 2016 00:35:31 -0700

Github user NarineK commented on the pull request:

    https://github.com/apache/spark/pull/12836#issuecomment-219366624
  
    3. Sure, let's see what others think.
    
    2. Regarding point 2: I want to make sure that I understand it correctly. 
    The format: (Number of key columns)(Key1)(Key2)(...)(KeyN)(Number of rows in the group)(Rows...)
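    A minimal sketch of the count-prefixed layout quoted above, assuming keys and rows are plain strings, each serialized as a 4-byte big-endian length followed by its UTF-8 bytes. The helper names are hypothetical and this is not SparkR's actual serializer. It only shows that a reader can consume one group after another by reading each count and then exactly that many items.

```python
import struct
from typing import List, Tuple


def _write_str(out: bytearray, s: str) -> None:
    # Assumed encoding: 4-byte big-endian length, then the UTF-8 bytes.
    data = s.encode("utf-8")
    out.extend(struct.pack(">i", len(data)))
    out.extend(data)


def _read_int(buf: bytes, pos: int) -> Tuple[int, int]:
    (value,) = struct.unpack_from(">i", buf, pos)
    return value, pos + 4


def _read_str(buf: bytes, pos: int) -> Tuple[str, int]:
    n, pos = _read_int(buf, pos)
    return buf[pos:pos + n].decode("utf-8"), pos + n


def encode_group(keys: List[str], rows: List[str]) -> bytes:
    """(Number of key columns)(Key1)...(KeyN)(Number of rows)(Rows...)"""
    out = bytearray()
    out.extend(struct.pack(">i", len(keys)))
    for key in keys:
        _write_str(out, key)
    out.extend(struct.pack(">i", len(rows)))
    for row in rows:
        _write_str(out, row)
    return bytes(out)


def decode_group(buf: bytes, pos: int = 0) -> Tuple[List[str], List[str], int]:
    """Read one group starting at pos; return (keys, rows, next position)."""
    nkeys, pos = _read_int(buf, pos)
    keys = []
    for _ in range(nkeys):
        key, pos = _read_str(buf, pos)
        keys.append(key)
    nrows, pos = _read_int(buf, pos)
    rows = []
    for _ in range(nrows):
        row, pos = _read_str(buf, pos)
        rows.append(row)
    return keys, rows, pos


# Two groups written back to back and read back one at a time.
stream = encode_group(["a"], ["r1", "r2"]) + encode_group(["b"], ["r3"])
pos = 0
while pos < len(stream):
    keys, rows, pos = decode_group(stream, pos)
    print(keys, rows)
```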
    
    2.2 If (Key1)(Key2)(...)(KeyN) are column names, do we need the (Number of key columns)?
    We could instead write it similarly to:
    
https://github.com/NarineK/spark/blob/0b1b2558189a72903500500b96e22084120b96c0/core/src/main/scala/org/apache/spark/api/r/RRunner.scala#L157
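    One way the per-group key count could be dropped, sketched under the assumption that the key column names (and therefore their count) are sent once in a stream header before any group. The header layout, the length-prefixed UTF-8 strings, and the names `encode_stream`/`decode_stream` are hypothetical and are not taken from RRunner.

```python
import struct
from typing import List, Tuple

Group = Tuple[List[str], List[str]]  # (key values, rows)


def _pack_str(s: str) -> bytes:
    # Assumed encoding: 4-byte big-endian length, then the UTF-8 bytes.
    data = s.encode("utf-8")
    return struct.pack(">i", len(data)) + data


def _unpack_str(buf: bytes, pos: int) -> Tuple[str, int]:
    (n,) = struct.unpack_from(">i", buf, pos)
    pos += 4
    return buf[pos:pos + n].decode("utf-8"), pos + n


def encode_stream(key_names: List[str], groups: List[Group]) -> bytes:
    out = bytearray()
    # Header, sent once: number of key columns, then their names.
    out.extend(struct.pack(">i", len(key_names)))
    for name in key_names:
        out.extend(_pack_str(name))
    for keys, rows in groups:
        if len(keys) != len(key_names):
            raise ValueError("every group must have one value per key column")
        # Per group: key values only; their count is implied by the header.
        for key in keys:
            out.extend(_pack_str(key))
        out.extend(struct.pack(">i", len(rows)))
        for row in rows:
            out.extend(_pack_str(row))
    return bytes(out)


def decode_stream(buf: bytes) -> Tuple[List[str], List[Group]]:
    (nkeys,) = struct.unpack_from(">i", buf, 0)
    pos = 4
    key_names = []
    for _ in range(nkeys):
        name, pos = _unpack_str(buf, pos)
        key_names.append(name)
    groups = []
    while pos < len(buf):
        keys = []
        for _ in range(nkeys):  # count taken from the header
            key, pos = _unpack_str(buf, pos)
            keys.append(key)
        (nrows,) = struct.unpack_from(">i", buf, pos)
        pos += 4
        rows = []
        for _ in range(nrows):
            row, pos = _unpack_str(buf, pos)
            rows.append(row)
        groups.append((keys, rows))
    return key_names, groups
```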
    
    With respect to (Number of rows in the group)(Rows...):
    wouldn't it be better to have something like
    (Row1 Row2 0 Row3 ... 0 RowN),
    where 0 effectively marks the boundary between groups?
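    A minimal sketch of the boundary-marker idea, assuming each row is a non-empty byte string written with a 4-byte big-endian length prefix, so a zero length can never be a real row and can serve as the separator between groups. `BOUNDARY`, `encode_groups`, and `decode_groups` are hypothetical names for illustration.

```python
import struct
from typing import List

# Hypothetical marker: a zero length prefix separates two groups.
BOUNDARY = struct.pack(">i", 0)


def encode_groups(groups: List[List[bytes]]) -> bytes:
    """Write (Row1 Row2 0 Row3 ... 0 RowN): rows, with 0 between groups."""
    out = bytearray()
    for i, rows in enumerate(groups):
        if i > 0:
            out.extend(BOUNDARY)
        for row in rows:
            if not row:
                # An empty row would be indistinguishable from the boundary.
                raise ValueError("rows must be non-empty")
            out.extend(struct.pack(">i", len(row)))
            out.extend(row)
    return bytes(out)


def decode_groups(buf: bytes) -> List[List[bytes]]:
    if not buf:
        return []
    groups: List[List[bytes]] = [[]]
    pos = 0
    while pos < len(buf):
        (n,) = struct.unpack_from(">i", buf, pos)
        pos += 4
        if n == 0:
            groups.append([])  # boundary: start the next group
        else:
            groups[-1].append(buf[pos:pos + n])
            pos += n
    return groups
```

    With this layout the writer does not need to know a group's size before emitting its rows, but the marker value must be reserved; in this sketch that is why empty rows are rejected.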
    
    I'm thinking that with (Number of rows in the group)(Rows...) I would still
    need to map the row counts to groups. E.g., for
    (2,3) (row1, row2, row3, row4, row5), after reading the first 2 rows I need to
    start the next group, and so on.
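    A small sketch of the bookkeeping described here, where all row counts arrive first and the rows follow as one flat sequence. `split_by_counts` is a hypothetical helper that hands out consecutive slices of the row stream, one slice per count.

```python
from itertools import islice
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def split_by_counts(counts: Sequence[int], rows: Sequence[T]) -> List[List[T]]:
    """Assign consecutive rows to groups according to the per-group counts."""
    if sum(counts) != len(rows):
        raise ValueError("row counts do not match the number of rows")
    it = iter(rows)
    # Each islice consumes exactly `c` rows from the shared iterator.
    return [list(islice(it, c)) for c in counts]


print(split_by_counts([2, 3], ["row1", "row2", "row3", "row4", "row5"]))
# [['row1', 'row2'], ['row3', 'row4', 'row5']]
```

    If each group's count is instead written immediately before that group's rows, as in the format quoted under point 2, the reader only has to track one count at a time rather than a list of counts.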
    
    Thank you,
    Narine




