Hi Athanasios

Thanks for the details. Since I believe this is Spark Streaming, the
all-important indicator is the Processing Time, defined in the Spark GUI as
the time taken to process all jobs of a batch, compared with the batch
interval. The Scheduling Delay and the Total Delay are additional indicators
of health. Do you have these stats for both versions?
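
As a rough illustration of how these three figures relate, here is a minimal
Scala sketch. The names BatchStats and BatchHealth are hypothetical, and the
30 second batch interval and zero scheduling delay are assumptions, since
neither is stated in this thread. Only the 60 sec and 10 sec processing times
come from your numbers. If reading values off the GUI is awkward, the same
per-batch figures should also be available programmatically through a
StreamingListener (onBatchCompleted) rather than from the logs.

```scala
// Per-batch figures as shown on the Streaming tab of the Spark GUI (milliseconds).
// BatchStats and BatchHealth are hypothetical names used for illustration only.
final case class BatchStats(processingMs: Long, schedulingDelayMs: Long) {
  // Spark reports Total Delay as Scheduling Delay + Processing Time.
  def totalDelayMs: Long = schedulingDelayMs + processingMs
}

object BatchHealth {
  // Fraction of the batch interval spent processing; above 1.0 the job falls behind.
  def load(stats: BatchStats, batchIntervalMs: Long): Double =
    stats.processingMs.toDouble / batchIntervalMs

  // A stream keeps up when each batch finishes within one batch interval.
  def keepsUp(stats: BatchStats, batchIntervalMs: Long): Boolean =
    stats.processingMs <= batchIntervalMs

  // One-line summary for a given run.
  def report(label: String, stats: BatchStats, batchIntervalMs: Long): String = {
    val verdict = if (keepsUp(stats, batchIntervalMs)) "keeping up" else "falling behind"
    f"$label: load=${load(stats, batchIntervalMs)}%.2f, total delay=${stats.totalDelayMs} ms, $verdict"
  }
}

object BatchHealthDemo {
  def main(args: Array[String]): Unit = {
    // Assumed: 30 s batch interval and no scheduling delay (not given in the thread).
    val batchIntervalMs = 30000L
    val spark2 = BatchStats(processingMs = 60000L, schedulingDelayMs = 0L)
    val spark3 = BatchStats(processingMs = 10000L, schedulingDelayMs = 0L)
    println(BatchHealth.report("Spark 2.4.8", spark2, batchIntervalMs))
    println(BatchHealth.report("Spark 3.3.1", spark3, batchIntervalMs))
  }
}
```

With the real batch interval and the Processing Time and Scheduling Delay
from each run plugged in, a load that stays below 1.0 means the job is
keeping up. A Scheduling Delay that keeps growing means batches are queuing.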


cheers



   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Fri, 27 Jan 2023 at 09:03, Athanasios Kordelas <
athanasioskorde...@gmail.com> wrote:

> Hi Mich,
>
> Thank you for your reply. For my benchmark test, I'm only using one
> executor with two cores in both cases.
> I created a large image with multiple UI screenshots a few days ago,
> so I'm attaching it (please zoom in).
> You can see Spark 3 on the left side versus Spark 2 on the right.
>
> I can collect more info by triggering new runs if this would help, but I'm
> not sure what the best way is to provide you with all the metrics data,
> maybe from logs?
>
> --Thanasis
>
>
>
> On Thu, 26 Jan 2023 at 10:03 PM, Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> You have given some stats: 5-10 sec vs 60 sec. Were the set-up and
>> systematics the same for both tests?
>>
>> So let us assume we see a 10 sec average time with 3.3.1 versus 60 sec
>> with the older Spark 2.x.
>>
>> That gives us (60 - 10) * 100 / 60 ~ 83% reduction in processing time.
>>
>> However, that would not tell us in detail why 3.3.1 excels. For that
>> you need to look at the Spark GUI metrics.
>>
>> HTH
>>
>> On Thu, 26 Jan 2023 at 16:51, Mich Talebzadeh <mich.talebza...@gmail.com>
>> wrote:
>>
>>> Please qualify what you mean by *extreme improvements*.
>>>
>>> What metrics are you using?
>>>
>>> HTH
>>>
>>> On Thu, 26 Jan 2023 at 13:06, Athanasios Kordelas <
>>> athanasioskorde...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm running some tests on Spark Streaming (not Structured Streaming) for
>>>> my PhD, and I'm seeing an extreme improvement when using Spark/Kafka 3.3.1
>>>> versus Spark 2.4.8/Kafka 2.7.0.
>>>>
>>>> My (scala) application code is as follows:
>>>>
>>>> KafkaStream => foreachRDD => mapPartitions => repartition => groupBy
>>>> => agg(expr("percentile(value, array(0.25, 0.5, 0.75))")) => take(2)
>>>>
>>>> In short, a two-core executor could process 600,000 rows of
>>>> key/value pairs in 60 seconds with Spark 2.x, while now, with Spark 3.3.1,
>>>> the same processing (same code) can be achieved in 5-10 seconds.
>>>>
>>>> @apache-spark, @spark-streaming, @spark-mllib, @spark-ml, is there a
>>>> significant optimization that could explain this improvement?
>>>>
>>>> BR,
>>>> Athanasios Kordelas
>>>>
>>>>
>
> --
> Athanasios Kordelas
> Staff SW Engineer
> T: +30 6972053674 | Skype: athanasios.korde...@outlook.com.gr
> athanasioskorde...@gmail.com
>
>
