+1
On Mon, Sep 14, 2020 at 6:39 PM angers.zhu wrote:
> +1
>
> angers.zhu
> angers@gmail.com
>
>
Hi devs,
Spark 2.3, released in Feb 2018, introduced continuous mode in Structured
Streaming as "experimental".
Now we are 2.5 years after its release - I feel it would be a good
time to evaluate the mode: whether it has been widely used or not,
and whether the mode has been making
I see.
In our case we use SingleBufferInputStream, so the time is spent duplicating
the backing byte buffer.
Thanks
Chang
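To illustrate the copy Chang describes (a generic sketch, not Parquet's actual SingleBufferInputStream — the class name `BufferCopyDemo` and both methods are invented for illustration): pulling bytes out of a `ByteBuffer` through a stream-style API typically means a bulk copy into a fresh array, whereas a plain `byte[]` input can be read in place with no intermediate allocation.

```java
import java.nio.ByteBuffer;

public class BufferCopyDemo {
    // Stream-style read: n bytes are copied out of the backing buffer
    // into a newly allocated array. This is where the time goes.
    static byte[] copyOut(ByteBuffer buf, int n) {
        byte[] out = new byte[n];
        buf.get(out);  // bulk copy from the buffer's current position
        return out;
    }

    // Direct read from a byte[]: slice by offset, no copy, no allocation.
    static int readDirect(byte[] data, int offset) {
        return data[offset] & 0xFF;
    }

    public static void main(String[] args) {
        byte[] backing = {1, 2, 3, 4, 5};
        ByteBuffer buf = ByteBuffer.wrap(backing);
        byte[] copy = copyOut(buf, 3);        // allocates + copies 3 bytes
        int first = readDirect(backing, 0);   // touches the array in place
        System.out.println(copy.length + " " + first);
    }
}
```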
Ryan Blue wrote on Tue, Sep 15, 2020 at 2:04 AM:
> Before, the input was a byte array so we could read from it directly. Now,
> the input is a `ByteBufferInputStream` so that Parquet can choose how to
> allocate buffers.
+1
This will positively improve the performance and reliability of Spark.
Looking forward to this.
Regards
Kalyan.
On Tue, Sep 15, 2020, 9:26 AM Joseph Torres
wrote:
> +1
>
> On Mon, Sep 14, 2020 at 6:39 PM angers.zhu wrote:
>
>> +1
>>
>> angers.zhu
>> angers@gmail.com
>>
>>
Ryan, do you happen to have any opinion there? That particular section
was introduced in the Parquet 1.10 update:
https://github.com/apache/spark/commit/cac9b1dea1bb44fa42abf77829c05bf93f70cf20
It looks like it didn't previously create a ByteBuffer each time, but read directly from `in`.
On Sun, Sep 13, 2020 at
Hi,
I have a long-running application, and Spark seems to fill up the disk with
shuffle files. Eventually the job fails after running out of disk space. Is there
a way for me to clean up the shuffle files?
Thanks
+1 This is an exciting new feature!
On Sun, Sep 13, 2020 at 8:00 PM Mridul Muralidharan
wrote:
> Hi,
>
> I'd like to call for a vote on SPARK-30602 - SPIP: Support push-based
> shuffle to improve shuffle efficiency.
> Please take a look at:
>
>- SPIP jira:
+1. Interesting indeed :)
Regards
Venkata krishnan
On Mon, Sep 14, 2020 at 11:14 AM Xingbo Jiang wrote:
> +1 This is an exciting new feature!
>
> On Sun, Sep 13, 2020 at 8:00 PM Mridul Muralidharan
> wrote:
>
>> Hi,
>>
>> I'd like to call for a vote on SPARK-30602 - SPIP: Support push-based
+1
Tom
On Sunday, September 13, 2020, 10:00:05 PM CDT, Mridul Muralidharan
wrote:
Hi,
I'd like to call for a vote on SPARK-30602 - SPIP: Support push-based shuffle
to improve shuffle efficiency. Please take a look at:
- SPIP jira: https://issues.apache.org/jira/browse/SPARK-30602
+1
Chandni
On Mon, Sep 14, 2020 at 11:41 AM Tom Graves
wrote:
> +1
>
> Tom
>
> On Sunday, September 13, 2020, 10:00:05 PM CDT, Mridul Muralidharan <
> mri...@gmail.com> wrote:
>
>
> Hi,
>
> I'd like to call for a vote on SPARK-30602 - SPIP: Support push-based
> shuffle to improve shuffle
We've also had some similar disk-fill issues.
For Java/Scala RDDs, shuffle file cleanup is done as part of JVM
garbage collection. I've noticed that if RDDs remain referenced in the
code, and so cannot be garbage collected, then the intermediate shuffle files hang
around.
The best way to handle this is
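One knob worth mentioning here (my suggestion, not something stated in the thread): since cleanup rides on GC, Spark's `spark.cleaner.periodicGC.interval` setting can force a periodic driver GC so the ContextCleaner notices dead references even when the heap never fills up. Both settings below are real Spark configs; the interval value is only illustrative.

```properties
# Force a periodic JVM GC on the driver so the ContextCleaner can notice
# unreferenced RDDs/shuffles even when the heap never fills up.
# Default is 30min; a shorter interval trades some GC overhead for
# faster removal of stale shuffle files. (Value below is illustrative.)
spark.cleaner.periodicGC.interval=15min

# Reference tracking must stay enabled (it is by default) for this to work.
spark.cleaner.referenceTracking=true
```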
Before, the input was a byte array so we could read from it directly. Now,
the input is a `ByteBufferInputStream` so that Parquet can choose how to
allocate buffers. For example, we use vectored reads from S3 that pull back
multiple buffers in parallel.
Now that the input is a stream based on
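A minimal sketch of the abstraction Ryan describes (illustrative only — this is not Parquet's actual class; the name `ByteBuffersInputStream` is invented): an `InputStream` view over caller-supplied `ByteBuffer`s means the I/O layer, e.g. a vectored S3 reader, decides how the buffers are allocated and filled, while the decoder just consumes a stream.

```java
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch: stream over a list of ByteBuffers. The buffers may
// be heap or direct, and may have been fetched in parallel; the reader
// consuming this stream doesn't need to know.
class ByteBuffersInputStream extends InputStream {
    private final Iterator<ByteBuffer> buffers;
    private ByteBuffer current;

    ByteBuffersInputStream(List<ByteBuffer> bufs) {
        this.buffers = bufs.iterator();
        this.current = buffers.hasNext() ? buffers.next() : ByteBuffer.allocate(0);
    }

    @Override
    public int read() {
        // Advance across buffer boundaries transparently.
        while (!current.hasRemaining()) {
            if (!buffers.hasNext()) return -1;  // end of stream
            current = buffers.next();
        }
        return current.get() & 0xFF;
    }
}
```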
There's also a second, newer mechanism that uses a TTL for cleanup of shuffle files.
Can you share more about your use case?
On Mon, Sep 14, 2020 at 1:33 PM Edward Mitchell wrote:
> We've also had some similar disk fill issues.
>
> For Java/Scala RDDs, shuffle file cleanup is done as part of the JVM
>
+1
On Mon, Sep 14, 2020 at 12:30 PM Chandni Singh wrote:
> +1
>
> Chandni
>
> On Mon, Sep 14, 2020 at 11:41 AM Tom Graves
> wrote:
>
>> +1
>>
>> Tom
>>
>> On Sunday, September 13, 2020, 10:00:05 PM CDT, Mridul Muralidharan <
>> mri...@gmail.com> wrote:
>>
>>
>> Hi,
>>
>> I'd like to call for a
Our use case is as follows:
We repartition 6 months' worth of data for each client on clientId and
recordcreationdate, so that it writes one file per partition. Our
partitioning is on clientId and recordcreationdate.
The job fills up the disk after it processes, say, 30 tenants out of 50. I am
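For reference, the layout described above can be sketched roughly like this (illustrative only, not the poster's actual job: it assumes an existing `SparkSession` named `spark`, and the paths are hypothetical; only the two column names come from the description):

```java
// Illustrative sketch (requires a SparkSession; paths are hypothetical).
// Repartitioning on the same columns used by partitionBy puts each output
// partition's rows in one task, yielding one file per partition.
Dataset<Row> df = spark.read().parquet("s3://bucket/input");
df.repartition(functions.col("clientId"), functions.col("recordcreationdate"))
  .write()
  .partitionBy("clientId", "recordcreationdate")
  .mode(SaveMode.Overwrite)
  .parquet("s3://bucket/output");
```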
+1
Xiao
DB Tsai wrote on Mon, Sep 14, 2020 at 4:09 PM:
> +1
>
> On Mon, Sep 14, 2020 at 12:30 PM Chandni Singh wrote:
>
>> +1
>>
>> Chandni
>>
>> On Mon, Sep 14, 2020 at 11:41 AM Tom Graves
>> wrote:
>>
>>> +1
>>>
>>> Tom
>>>
>>> On Sunday, September 13, 2020, 10:00:05 PM CDT, Mridul Muralidharan <
>>>
+1