GitHub user NarineK commented on the issue:
https://github.com/apache/spark/pull/10162
Thank you for following up on this, @marmbrus!
I looked into two places: R and Pandas DataFrames.
In R, merge seems to rename the columns that aren't part of the
merge/join condition and then merge:
https://github.com/talgalili/R-code-snippets/blob/master/merge.data.frame.r#L86
In SparkR I've implemented the same approach for merge:
https://github.com/apache/spark/pull/9012
In Pandas, DataFrame joins support suffixes too.
New labels are created using the suffixes:
https://github.com/pandas-dev/pandas/blob/master/pandas/tools/merge.py#L901
It then uses the new labels as an axis (the column names, I assume) of the
joined DataFrame:
https://github.com/pandas-dev/pandas/blob/master/pandas/tools/merge.py#L916
Maybe we can have something similar to Pandas?
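
To make the idea concrete, here is a rough sketch in plain Python of the renaming step as I understand it: columns that appear on both sides and are not join keys get a suffix, everything else keeps its name. `suffix_overlapping` is a hypothetical helper written only for illustration; it is not the Pandas, R, or Spark implementation. The default suffixes mirror the Pandas defaults `("_x", "_y")`.

```python
def suffix_overlapping(left_cols, right_cols, on, suffixes=("_x", "_y")):
    """Return (left_labels, right_labels) with overlapping non-key columns renamed.

    Hypothetical helper for illustration only.
    """
    lsuffix, rsuffix = suffixes
    keys = set(on)
    # Columns present on both sides that are not part of the join condition.
    overlap = (set(left_cols) & set(right_cols)) - keys
    new_left = [c + lsuffix if c in overlap else c for c in left_cols]
    new_right = [c + rsuffix if c in overlap else c for c in right_cols]
    return new_left, new_right


left_labels, right_labels = suffix_overlapping(
    ["id", "name", "score"], ["id", "score", "city"], on=["id"]
)
print(left_labels)   # ['id', 'name', 'score_x']
print(right_labels)  # ['id', 'score_y', 'city']
```

The join key `id` is left untouched, and only `score`, which exists on both sides, is disambiguated.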