[GitHub] spark issue #10162: [SPARK-11250] [SQL] Generate different alias for columns...

NarineK Thu, 27 Oct 2016 16:09:06 -0700

Github user NarineK commented on the issue:

    https://github.com/apache/spark/pull/10162
  
    Thank you for following up on this, @marmbrus!
    I looked into how two other libraries handle this: R data frames and Pandas DataFrames.
    
    In R, it seems that merge first gives new names to the overlapping columns
(columns that aren't part of the merge/join condition) and then merges:
    
https://github.com/talgalili/R-code-snippets/blob/master/merge.data.frame.r#L86
    
    In SparkR, I've implemented the same approach for merge:
https://github.com/apache/spark/pull/9012
    
    Pandas DataFrame joins support suffixes too.
    Pandas creates new labels for the overlapping columns using the suffixes:
    https://github.com/pandas-dev/pandas/blob/master/pandas/tools/merge.py#L901
    
    It then uses the new labels as an axis (I assume the column axis) of the joined
DataFrame:
    https://github.com/pandas-dev/pandas/blob/master/pandas/tools/merge.py#L916
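A minimal sketch of the Pandas-style labeling step described above, written in plain Python so it doesn't depend on pandas internals. The function names are hypothetical; the default suffixes ("_x", "_y") mirror the defaults of pandas.merge (R's merge uses ".x" and ".y"), and the join keys are assumed to appear only once in the result, as in an equi-join on `on`.

```python
def suffix_overlapping_columns(left_cols, right_cols, on, suffixes=("_x", "_y")):
    """Hypothetical helper: rename columns present on both sides that are not join keys.

    Returns (new_left, new_right). Join keys and columns that exist on only
    one side keep their original names.
    """
    lsuf, rsuf = suffixes
    overlap = (set(left_cols) & set(right_cols)) - set(on)
    if overlap and not lsuf and not rsuf:
        # Without any suffix the joined result would contain ambiguous names.
        raise ValueError(f"columns overlap but no suffix specified: {sorted(overlap)}")
    new_left = [c + lsuf if c in overlap else c for c in left_cols]
    new_right = [c + rsuf if c in overlap else c for c in right_cols]
    return new_left, new_right


def joined_column_labels(left_cols, right_cols, on, suffixes=("_x", "_y")):
    """Hypothetical helper: labels of the joined result (left columns, then right non-key columns)."""
    new_left, new_right = suffix_overlapping_columns(left_cols, right_cols, on, suffixes)
    # Join keys are emitted once, taken from the left side.
    return new_left + [c for c in new_right if c not in on]


if __name__ == "__main__":
    left = ["id", "name", "value"]
    right = ["id", "value", "score"]
    print(joined_column_labels(left, right, on=["id"]))
    # prints ['id', 'name', 'value_x', 'value_y', 'score']
```

Only the ambiguous column ("value") is renamed, so user-facing names stay stable whenever there is no conflict.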
     
    Maybe we can do something similar to Pandas?
