Re: [DISCUSS] SPIP: Better Spark UI scalability and Driver stability for large applications

2022-11-16 Thread Gengliang Wang
With the positive feedback from Mridul and Wenchen, I will officially start
the vote.

On Tue, Nov 15, 2022 at 8:57 PM Wenchen Fan wrote:

> This looks great! UI stability/scalability has been a pain point for a
> long time.
>
> On Sat, Nov 12, 2022 at 5:24 AM Gengliang Wang wrote:
>
>> Hi Everyone,
>>
>> I want to discuss the "Better Spark UI scalability and Driver stability
>> for large applications" proposal. Please find the links below:
>>
>> *JIRA* - https://issues.apache.org/jira/browse/SPARK-41053
>> *SPIP Document* -
>> https://docs.google.com/document/d/1cuKnFwlTodyVhUQPMuakq2YDaLH05jaY9FRu_aD1zMo/edit?usp=sharing
>>
>> *Excerpt from the document: *
>>
>> After SPARK-18085, the Spark history server (SHS) became more scalable
>> for processing large applications by supporting a persistent KV store
>> (LevelDB/RocksDB) as the storage layer.
>>
>> As for the live Spark UI, all the data is still stored in memory, which
>> can put memory pressure on the Spark driver for large applications.
>>
>> For better Spark UI scalability and Driver stability, I propose to
>>
>>    - Support storing all the UI data in a persistent KV store.
>>      RocksDB/LevelDB have low memory overhead, and their read/write
>>      performance is fast enough to serve the workloads of the live UI.
>>      The Spark UI can retain more data with the new backend, while SHS
>>      can leverage it to speed up its startup.
>>    - Support a new Protobuf serializer for all the UI data. The new
>>      serializer is expected to be faster, according to benchmarks. It
>>      will be the default serializer for the persistent KV store of the
>>      live UI.
>>
>> I appreciate any suggestions you can provide,
>> Gengliang
>>
>


Re: [DISCUSS] SPIP: Better Spark UI scalability and Driver stability for large applications

2022-11-15 Thread Wenchen Fan
This looks great! UI stability/scalability has been a pain point for a long
time.

On Sat, Nov 12, 2022 at 5:24 AM Gengliang Wang wrote:

> Hi Everyone,
>
> I want to discuss the "Better Spark UI scalability and Driver stability
> for large applications" proposal. Please find the links below:
>
> *JIRA* - https://issues.apache.org/jira/browse/SPARK-41053
> *SPIP Document* -
> https://docs.google.com/document/d/1cuKnFwlTodyVhUQPMuakq2YDaLH05jaY9FRu_aD1zMo/edit?usp=sharing
>
> *Excerpt from the document: *
>
> After SPARK-18085, the Spark history server (SHS) became more scalable
> for processing large applications by supporting a persistent KV store
> (LevelDB/RocksDB) as the storage layer.
>
> As for the live Spark UI, all the data is still stored in memory, which
> can put memory pressure on the Spark driver for large applications.
>
> For better Spark UI scalability and Driver stability, I propose to
>
>    - Support storing all the UI data in a persistent KV store.
>      RocksDB/LevelDB have low memory overhead, and their read/write
>      performance is fast enough to serve the workloads of the live UI.
>      The Spark UI can retain more data with the new backend, while SHS
>      can leverage it to speed up its startup.
>    - Support a new Protobuf serializer for all the UI data. The new
>      serializer is expected to be faster, according to benchmarks. It
>      will be the default serializer for the persistent KV store of the
>      live UI.
>
> I appreciate any suggestions you can provide,
> Gengliang
>