Re: Push-based shuffle SPIP

2020-08-24 Thread mshen
The linked doc with detailed information about the branch does not seem to be publicly shareable. We have created a copy of the doc that should be publicly accessible. https://docs.google.com/document/d/1Q5m7YAp0HyG_TNFL4p_bjQgzzw33ik5i49Vr86UNZgg/edit?usp=sharing - Min Shen Staff Software

Re: Push-based shuffle SPIP

2020-08-24 Thread Mridul Muralidharan
Hi, Thanks for sending out the proposal, Min! For the SPIP requirements, I am willing to act as the shepherd for this proposal. The JIRA + paper + proposal provide the high-level design and implementation details. The VLDB paper discusses the performance gains in detail for the in-house

Push-based shuffle SPIP

2020-08-24 Thread mshen
We raised this SPIP ticket in https://issues.apache.org/jira/browse/SPARK-30602 earlier this year. Since then, we have progressed on multiple fronts, including: * Our work is published in VLDB 2020. The final version of the paper is attached to the SPIP ticket. * We have further enhanced and

PySpark: Un-deprecating inferring DataFrame schema from list of dictionaries

2020-08-24 Thread Nicholas Chammas
https://github.com/apache/spark/pull/29510 I don't think this is a big deal, but since we're removing a deprecation that has been around for ~6 years, I figured it would be good to bring everyone's attention to this change. Hopefully, we are not breaking any hidden assumptions about the

Re: [SparkSql] Casting of Predicate Literals

2020-08-24 Thread Chao Sun
> Currently we can't. This is something we should improve, by either pushing down the cast to the data source, or simplifying the predicates to eliminate the cast. Hi all, I've created https://issues.apache.org/jira/browse/SPARK-32694 to track this. Comments on the JIRA are welcome. On Wed, Aug