Re: Help with Shuffle Read performance

2022-09-30 Thread Igor Calabria
Thanks a lot for the answers, folks. It turned out that Spark was just IOPS-starved. Using better disks solved my issue, so it was nothing related to Kubernetes at all. Have a nice weekend, everyone. On Fri, Sep 30, 2022 at 4:27 PM Artemis User wrote: > The reduce phase is always more resource-intensive

Re: Help with Shuffle Read performance

2022-09-30 Thread Artemis User
The reduce phase is always more resource-intensive than the map phase. A couple of suggestions you may want to consider: 1. Setting the number of partitions to 18K may be way too high (the default number is only 200). You may want to just use the default and the scheduler will
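
[Illustration, not part of the original message] A rough sketch of why a very high partition count can make shuffle read IOPS-bound: each reduce partition may fetch one block from every map task, so the upper bound on fetched blocks is map tasks times reduce partitions, and the average block shrinks accordingly. The function name and all input figures below (1,000 map tasks, 1 TB of shuffle data) are hypothetical and assume evenly distributed data; only the 200 and 18K partition counts come from the thread.

```python
# Back-of-the-envelope estimate of shuffle read fan-out.
# Assumption: data is spread evenly, so every (map task, reduce partition)
# pair yields one non-empty block. Real jobs fetch fewer blocks when some
# are empty, so treat the block count as an upper bound.

def shuffle_read_profile(map_tasks, reduce_partitions, shuffle_bytes):
    """Return (max blocks fetched, average bytes per block)."""
    blocks = map_tasks * reduce_partitions
    avg_block_bytes = shuffle_bytes / blocks
    return blocks, avg_block_bytes


if __name__ == "__main__":
    MAP_TASKS = 1_000          # hypothetical
    SHUFFLE_BYTES = 10 ** 12   # hypothetical: 1 TB of shuffle data

    for partitions in (200, 18_000):  # default vs. the 18K from the thread
        blocks, avg = shuffle_read_profile(MAP_TASKS, partitions, SHUFFLE_BYTES)
        print(f"{partitions:>6} partitions: up to {blocks:,} blocks, "
              f"about {avg / 1024:.0f} KiB each")
```

Under these assumed figures, going from 200 to 18K partitions multiplies the number of block fetches by 90 and shrinks each block by the same factor, which shifts the load from sequential throughput toward many small reads. In Spark SQL the partition count in question is controlled by `spark.sql.shuffle.partitions`.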

Re: Help with Shuffle Read performance

2022-09-30 Thread Leszek Reimus
Hi Sungwoo, I tend to agree - for a new system, I would probably not go that route, as Spark on Kubernetes is getting there and can do a lot already. The issue I mentioned before can be fixed with proper node fencing - it is a typical StatefulSet problem Kubernetes has without fencing - node goes

Re: Help with Shuffle Read performance

2022-09-30 Thread Sungwoo Park
Hi Leszek, For running YARN on Kubernetes and then running Spark on YARN, is there a lot of overhead for maintaining YARN on Kubernetes? I thought people usually want to move from YARN to Kubernetes because of the overhead of maintaining Hadoop. Thanks, --- Sungwoo On Fri, Sep 30, 2022 at