Re: [Spark-SQL] Reduce Shuffle Data by pushing filter toward storage

2016-04-21 Thread atootoonchian
I created an issue in the Spark project: SPARK-14820
--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-SQL-Reduce-Shuffle-Data-by-pushing-filter-toward-storage-tp17297p17306.html
Sent from the Apache Spark Developers List mailing list archive

Re: [Spark-SQL] Reduce Shuffle Data by pushing filter toward storage

2016-04-21 Thread Ted Yu
| 100 | 4 | | 11.0 KB |
> | | 11.0 KB ||
> +++--+---++--++---+
>
> As rate of read and write a

Re: [Spark-SQL] Reduce Shuffle Data by pushing filter toward storage

2016-04-21 Thread atootoonchian

Re: [Spark-SQL] Reduce Shuffle Data by pushing filter toward storage

2016-04-21 Thread Marcin Tustin
> +++--+---++--++---+
> | 100 | 3 | | 834 MB | 11.0 KB |
> | 288 MB | 11.0 KB|
>
> +----+----+--+-------+---

[Spark-SQL] Reduce Shuffle Data by pushing filter toward storage

2016-04-21 Thread atootoonchian
| 3.00 |
+-----+------------+------------+------+
| 10  | 82.611 MB  | 28.157 MB  | 2.93 |
+-----+------------+------------+------+

[Spark-SQL] Reduce Shuffle Data by pushing filter toward storage

2016-04-21 Thread atootoonchian
11 MB | 2.733 MB | 3.00 |
+-----+------------+------------+------+
| 10  | 82.611 MB  | 28.157 MB  | 2.93 |
+-----+------------+------------+------+
| 100 | 834.311 MB | 288.081 MB | 2.89 |
+-----+------------+------------+------+

So, as you can see, shuffle read and write can be reduced by a factor of about 3 (2.89 to 3.00 in the rows above) if we can push more intelligence toward storage.

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-SQL-Reduce-Shuffle-Data-by-pushing-filter-toward-storage-tp17296.html
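
For readers who want to check the "factor of 3" figure, here is a minimal sketch that recomputes the last column from the two size columns of the complete rows. The column meanings are an assumption, since the headers are cut off in this snippet: the larger size is taken to be shuffle data without the filter pushed down and the smaller size to be shuffle data with it. The name `reduction_factor` is made up for illustration.

```python
# Rows copied from the table in the message: (first column, larger size MB, smaller size MB).
# Assumption: larger size = shuffle data without the pushed-down filter,
# smaller size = shuffle data with it. The headers are not visible in the snippet.
rows = [
    (10, 82.611, 28.157),
    (100, 834.311, 288.081),
]


def reduction_factor(before_mb, after_mb):
    """Return how many times smaller the second size is than the first."""
    return before_mb / after_mb


# Map the first-column value of each row to its computed reduction factor.
factors = {scale: reduction_factor(before, after) for scale, before, after in rows}

for scale, factor in factors.items():
    print(f"{scale}: {factor:.3f}x")
```

Both computed ratios land within 0.01 of the values in the last column of the table, which is consistent with that column being the ratio of the two sizes.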