+1
On Mon, Sep 14, 2020 at 6:39 PM angers.zhu wrote:
> +1
>
> angers.zhu
> angers@gmail.com
>
>
Hi devs,
Spark 2.3, released in Feb 2018, introduced continuous mode in Structured
Streaming as "experimental".
Now we are 2.5 years after its release - I feel it would be a good
time to evaluate the mode: whether it has been widely used or not,
and whether the mode has been making
I see.
In our case we use SingleBufferInputStream, so the time is spent duplicating
the backing byte buffer.
Thanks
Chang
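To illustrate the copy Chang describes (a generic sketch, not Parquet's actual SingleBufferInputStream — the class name `BufferCopyDemo` and both methods are invented for illustration): pulling bytes out of a `ByteBuffer` through a stream-style API typically means a bulk copy into a fresh array, whereas a plain `byte[]` input can be read in place with no intermediate allocation.

```java
import java.nio.ByteBuffer;

public class BufferCopyDemo {
    // Stream-style read: n bytes are copied out of the backing buffer
    // into a newly allocated array. This is where the time goes.
    static byte[] copyOut(ByteBuffer buf, int n) {
        byte[] out = new byte[n];
        buf.get(out);  // bulk copy from the buffer's current position
        return out;
    }

    // Direct read from a byte[]: slice by offset, no copy, no allocation.
    static int readDirect(byte[] data, int offset) {
        return data[offset] & 0xFF;
    }

    public static void main(String[] args) {
        byte[] backing = {1, 2, 3, 4, 5};
        ByteBuffer buf = ByteBuffer.wrap(backing);
        byte[] copy = copyOut(buf, 3);        // allocates + copies 3 bytes
        int first = readDirect(backing, 0);   // touches the array in place
        System.out.println(copy.length + " " + first);
    }
}
```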
Ryan Blue wrote on Tue, Sep 15, 2020 at 2:04 AM:
> Before, the input was a byte array so we could read from it directly. Now,
> the input is a `ByteBufferInputStream` so that Parquet can choose how to
> allocate buffers.
+1
This will positively improve the performance and reliability of Spark.
Looking forward to this.
Regards
Kalyan.
On Tue, Sep 15, 2020, 9:26 AM Joseph Torres
wrote:
> +1
>
> On Mon, Sep 14, 2020 at 6:39 PM angers.zhu wrote:
>
>> +1
>>
>> angers.zhu
>> angers@gmail.com
>>
>>
Ryan, do you happen to have any opinion there? That particular section
was introduced in the Parquet 1.10 update:
https://github.com/apache/spark/commit/cac9b1dea1bb44fa42abf77829c05bf93f70cf20
It looks like it didn't previously create a ByteBuffer each time, but read directly from `in`.
On Sun, Sep 13, 2020 at
Hi,
I have a long-running application, and Spark seems to fill up the disk with
shuffle files. Eventually the job fails after running out of disk space. Is there
a way for me to clean up the shuffle files?
Thanks
+1 This is an exciting new feature!
On Sun, Sep 13, 2020 at 8:00 PM Mridul Muralidharan
wrote:
> Hi,
>
> I'd like to call for a vote on SPARK-30602 - SPIP: Support push-based
> shuffle to improve shuffle efficiency.
> Please take a look at:
>
>- SPIP jira:
+1. Interesting indeed :)
Regards
Venkata krishnan
On Mon, Sep 14, 2020 at 11:14 AM Xingbo Jiang wrote:
> +1 This is an exciting new feature!
>
> On Sun, Sep 13, 2020 at 8:00 PM Mridul Muralidharan
> wrote:
>
>> Hi,
>>
>> I'd like to call for a vote on SPARK-30602 - SPIP: Support push-based
+1
Tom
On Sunday, September 13, 2020, 10:00:05 PM CDT, Mridul Muralidharan
wrote:
Hi,
I'd like to call for a vote on SPARK-30602 - SPIP: Support push-based shuffle
to improve shuffle efficiency. Please take a look at:
- SPIP jira: https://issues.apache.org/jira/browse/SPARK-30602
+1
Chandni
On Mon, Sep 14, 2020 at 11:41 AM Tom Graves
wrote:
> +1
>
> Tom
>
> On Sunday, September 13, 2020, 10:00:05 PM CDT, Mridul Muralidharan <
> mri...@gmail.com> wrote:
>
>
> Hi,
>
> I'd like to call for a vote on SPARK-30602 - SPIP: Support push-based
> shuffle to improve shuffle
We've also had some similar disk-fill issues.
For Java/Scala RDDs, shuffle file cleanup is done as part of JVM
garbage collection. I've noticed that if RDDs remain referenced in the
code, and so cannot be garbage collected, then the intermediate shuffle files hang
around.
The best way to handle this is
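One knob worth mentioning here (my suggestion, not something stated in the thread): since cleanup rides on GC, Spark's `spark.cleaner.periodicGC.interval` setting can force a periodic driver GC so the ContextCleaner notices dead references even when the heap never fills up. Both settings below are real Spark configs; the interval value is only illustrative.

```properties
# Force a periodic JVM GC on the driver so the ContextCleaner can notice
# unreferenced RDDs/shuffles even when the heap never fills up.
# Default is 30min; a shorter interval trades some GC overhead for
# faster removal of stale shuffle files. (Value below is illustrative.)
spark.cleaner.periodicGC.interval=15min

# Reference tracking must stay enabled (it is by default) for this to work.
spark.cleaner.referenceTracking=true
```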
Before, the input was a byte array so we could read from it directly. Now,
the input is a `ByteBufferInputStream` so that Parquet can choose how to
allocate buffers. For example, we use vectored reads from S3 that pull back
multiple buffers in parallel.
Now that the input is a stream based on
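A minimal sketch of the abstraction Ryan describes (illustrative only — this is not Parquet's actual class; the name `ByteBuffersInputStream` is invented): an `InputStream` view over caller-supplied `ByteBuffer`s means the I/O layer, e.g. a vectored S3 reader, decides how the buffers are allocated and filled, while the decoder just consumes a stream.

```java
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch: stream over a list of ByteBuffers. The buffers may
// be heap or direct, and may have been fetched in parallel; the reader
// consuming this stream doesn't need to know.
class ByteBuffersInputStream extends InputStream {
    private final Iterator<ByteBuffer> buffers;
    private ByteBuffer current;

    ByteBuffersInputStream(List<ByteBuffer> bufs) {
        this.buffers = bufs.iterator();
        this.current = buffers.hasNext() ? buffers.next() : ByteBuffer.allocate(0);
    }

    @Override
    public int read() {
        // Advance across buffer boundaries transparently.
        while (!current.hasRemaining()) {
            if (!buffers.hasNext()) return -1;  // end of stream
            current = buffers.next();
        }
        return current.get() & 0xFF;
    }
}
```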
There's also a second, newer mechanism that uses a TTL for cleanup of shuffle files.
Can you share more about your use case?
On Mon, Sep 14, 2020 at 1:33 PM Edward Mitchell wrote:
> We've also had some similar disk fill issues.
>
> For Java/Scala RDDs, shuffle file cleanup is done as part of the JVM
>
+1
On Mon, Sep 14, 2020 at 12:30 PM Chandni Singh wrote:
> +1
>
> Chandni
>
> On Mon, Sep 14, 2020 at 11:41 AM Tom Graves
> wrote:
>
>> +1
>>
>> Tom
>>
>> On Sunday, September 13, 2020, 10:00:05 PM CDT, Mridul Muralidharan <
>> mri...@gmail.com> wrote:
>>
>>
>> Hi,
>>
>> I'd like to call for a
Our use case is as follows:
We repartition 6 months' worth of data for each client on clientId and
recordcreationdate, so that it writes one file per partition. Our
partitioning is on clientId and recordcreationdate.
The job fills up the disk after it processes, say, 30 tenants out of 50. I am
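For reference, the layout described above can be sketched roughly like this (illustrative only, not the poster's actual job: it assumes an existing `SparkSession` named `spark`, and the paths are hypothetical; only the two column names come from the description):

```java
// Illustrative sketch (requires a SparkSession; paths are hypothetical).
// Repartitioning on the same columns used by partitionBy puts each output
// partition's rows in one task, yielding one file per partition.
Dataset<Row> df = spark.read().parquet("s3://bucket/input");
df.repartition(functions.col("clientId"), functions.col("recordcreationdate"))
  .write()
  .partitionBy("clientId", "recordcreationdate")
  .mode(SaveMode.Overwrite)
  .parquet("s3://bucket/output");
```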
+1
Xiao
DB Tsai wrote on Mon, Sep 14, 2020 at 4:09 PM:
> +1
>
> On Mon, Sep 14, 2020 at 12:30 PM Chandni Singh wrote:
>
>> +1
>>
>> Chandni
>>
>> On Mon, Sep 14, 2020 at 11:41 AM Tom Graves
>> wrote:
>>
>>> +1
>>>
>>> Tom
>>>
>>> On Sunday, September 13, 2020, 10:00:05 PM CDT, Mridul Muralidharan <
>>>
+1