RE: Shuffle intermidiate results not being cached

2016-12-28 Thread Liang-Chi Hsieh
). >> >> The best solution I found so far (performance-wise) was to write a custom >> UDAF which does the window internally. This was still 8 times lower >> throughput than batch and required a lot of coding and is not a general >> solution. >> >> I am looki

RE: Shuffle intermidiate results not being cached

2016-12-28 Thread Liang-Chi Hsieh
does the window internally. This was still 8 times lower > throughput than batch and required a lot of coding and is not a general > solution. > > I am looking for an approach to improve the performance even more > (preferably to either be on par with batch or a relatively low factor >

RE: Shuffle intermidiate results not being cached

2016-12-27 Thread assaf.mendelson
problem is that any attempt to do streaming like this results in performance that is hundreds of times slower than batch. Is there a correct way to do such an aggregation on streaming data (using dataframes rather than RDD operations)? Assaf. From: Liang-Chi Hsieh [via Apa

RE: Shuffle intermidiate results not being cached

2016-12-27 Thread Liang-Chi Hsieh
ch. > Is there a correct way to do such an aggregation on streaming data (using > dataframes rather than RDD operations)? > Assaf. > > > > From: Liang-Chi Hsieh [via Apache Spark Developers List] [mailto: > ml-node+s1001551n20361h80@.nabble > ] > Sent: Monday,

RE: Shuffle intermidiate results not being cached

2016-12-26 Thread assaf.mendelson
[via Apache Spark Developers List] [mailto:ml-node+s1001551n20361...@n3.nabble.com] Sent: Monday, December 26, 2016 5:42 PM To: Mendelson, Assaf Subject: Re: Shuffle intermidiate results not being cached Hi, Let me quote your example code: var totalTime: Long = 0 var allDF

Re: Shuffle intermidiate results not being cached

2016-12-26 Thread Liang-Chi Hsieh
w.spark.tc/ -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Shuffle-intermidiate-results-not-being-cached-tp20358p20361.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

Re: Shuffle intermidiate results not being cached

2016-12-26 Thread Mark Hamstra
e with the aggregations > from there. Instead it seems it reads each dataframe from file all over > again. > > Is this a bug? Am I doing something wrong? > > > > Thanks. > > Assaf. > > ---------- > View this message in cont

Shuffle intermidiate results not being cached

2016-12-26 Thread assaf.mendelson
text: http://apache-spark-developers-list.1001551.n3.nabble.com/Shuffle-intermidiate-results-not-being-cached-tp20358.html