aa1371 opened a new pull request #35083: URL: https://github.com/apache/spark/pull/35083
JIRA: https://issues.apache.org/jira/browse/SPARK-37798

Pandas currently supports a `how="cross"` merge, which returns the Cartesian product of the left and right tables. In Spark, the same result can be achieved with `pyspark.sql.DataFrame.join(..., on=None, how="inner")`.

Additionally, I am currently in the middle of adding conditional merging to the pandas API (see the PR here: https://github.com/pandas-dev/pandas/pull/42964). This is much easier to achieve in Spark, since the functionality is already available and we can trivially expose it in the pyspark pandas API. Given the demand for this functionality (countless SO/pandas issues either asking how to do this or asking questions that it would solve), I think it is worth adding even before it makes it into the core pandas API.

These changes are purely incremental on top of the existing API and completely backwards compatible.

Still need to add tests and docstring examples.

**Examples:**

**Example DFs:**
```
>>> df1 = pd.DataFrame([['Bill', 23], ['Mary', 33], ['Ted', 36]], columns=['name', 'age'])
>>> df2 = pd.DataFrame([['President', 35], ['Senator', 30]], columns=['job', 'min_age'])
>>> df1
   name  age
0  Bill   23
1  Mary   33
2   Ted   36
>>> df2
         job  min_age
0  President       35
1    Senator       30
```

**Cross Merge Example:**
```
>>> df1.merge(df2, how="cross")
   name  age        job  min_age
0  Bill   23  President       35
1  Bill   23    Senator       30
2  Mary   33  President       35
3  Mary   33    Senator       30
4   Ted   36  President       35
5   Ted   36    Senator       30
```

**Conditional Merge Example:**
```
>>> df1.merge(df2, on=lambda left, right: left.age >= right.min_age)
   name  age        job  min_age
0  Mary   33    Senator       30
1   Ted   36  President       35
2   Ted   36    Senator       30
```
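
For readers unfamiliar with the semantics, here is a minimal pure-Python sketch of what the two merge modes compute. It is not the implementation in this PR: `cross_merge` and `conditional_merge` are hypothetical helper names, rows are modelled as plain dicts, and the two tables are assumed to have disjoint column names. A conditional merge is treated here as a cross merge filtered by a predicate over each (left, right) row pair.

```python
from itertools import product


def cross_merge(left, right):
    """Cartesian product of two lists of row dicts (how="cross" semantics)."""
    # Assumes disjoint column names; on a clash the right-hand value would win.
    return [{**l, **r} for l, r in product(left, right)]


def conditional_merge(left, right, predicate):
    """Keep only the row pairs for which predicate(left_row, right_row) holds."""
    return [{**l, **r} for l, r in product(left, right) if predicate(l, r)]


df1 = [
    {"name": "Bill", "age": 23},
    {"name": "Mary", "age": 33},
    {"name": "Ted", "age": 36},
]
df2 = [
    {"job": "President", "min_age": 35},
    {"job": "Senator", "min_age": 30},
]

crossed = cross_merge(df1, df2)
matched = conditional_merge(df1, df2, lambda l, r: l["age"] >= r["min_age"])

for row in matched:
    print(row["name"], row["age"], row["job"], row["min_age"])
```

With the example data, `crossed` holds all six row pairs and `matched` holds the three pairs that satisfy the age condition, in the same order as the examples in the PR description. A real engine such as Spark would typically push the condition into the join rather than materialise the full product first.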
