I created an issue in the Spark project: SPARK-14820
--
View this message in context:
http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-SQL-Reduce-Shuffle-Data-by-pushing-filter-toward-storage-tp17297p17306.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
> +-----+---+--------+---------+
> | 100 | 3 | 834 MB | 11.0 KB |
> |     |   | 288 MB | 11.0 KB |
> | 100 | 4 |        | 11.0 KB |
> |     |   |        | 11.0 KB |
> +-----+---+--------+---------+
>
> As rate of read and write a
+-----+------------+------------+------+
|  …  |     …11 MB |   2.733 MB | 3.00 |
|  10 |  82.611 MB |  28.157 MB | 2.93 |
| 100 | 834.311 MB | 288.081 MB | 2.89 |
+-----+------------+------------+------+
So, as you can see, shuffle read and write can be reduced by a factor of roughly 3 if we can push the filter more intelligently toward the storage layer.
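The effect above can be sketched with a toy model (plain Python, not Spark code): pushing a filter below the shuffle means only the selected rows are ever serialized and transferred. The row size, row count, and 33% selectivity here are invented for illustration, not the benchmark's actual data.

```python
# Toy model of shuffle volume with and without filter pushdown.
# All constants are assumptions for illustration only.

ROW_SIZE = 100          # bytes per serialized row (assumed)
TOTAL_ROWS = 1_000_000  # rows scanned from storage (assumed)

def shuffle_bytes(selectivity: float, pushdown: bool) -> int:
    """Bytes sent through the shuffle for a filter keeping `selectivity` of rows."""
    if pushdown:
        # Filter evaluated at (or near) the storage layer: only matching
        # rows ever reach the shuffle.
        rows_shuffled = int(TOTAL_ROWS * selectivity)
    else:
        # Filter evaluated after the shuffle: every scanned row is
        # serialized and transferred first.
        rows_shuffled = TOTAL_ROWS
    return rows_shuffled * ROW_SIZE

without = shuffle_bytes(0.33, pushdown=False)
with_pd = shuffle_bytes(0.33, pushdown=True)
print(without / with_pd)  # roughly a 3x reduction, matching the ratio column
```

The point of the sketch is that the reduction factor equals the inverse of the filter's selectivity, which is why a filter keeping about a third of the rows shows up as a ~3x drop in shuffle read and write.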