Re: AQE effectiveness

2020-08-20 Thread Koert Kuipers
i see. it makes sense to maximize re-use of cached data. i didn't realize we have two potentially conflicting goals here. On Thu, Aug 20, 2020 at 12:41 PM Maryann Xue wrote: > AQE has been turned off deliberately so that the `outputPartitioning` of > the cached relation won't be changed by AQE

AQE effectiveness

2020-08-20 Thread Koert Kuipers
we tend to have spark.sql.shuffle.partitions set very high by default simply because some jobs need it to be high and it's easier to then just set the default high instead of having people tune it manually per job. the main downside is lots of part files which leads to pressure on the driver, and
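
To illustrate why a high spark.sql.shuffle.partitions default is less costly once small partitions can be merged after the shuffle, here is a toy sketch of coalescing adjacent partitions up to a target size. It is not Spark's implementation; the function name `coalesce_partitions`, the sizes, and the 64 MB target are all assumptions made up for this example.

```python
from typing import List


def coalesce_partitions(sizes: List[int], target_size: int) -> List[List[int]]:
    """Greedily group adjacent partition indices.

    A group is closed as soon as adding the next partition would push its
    total size above target_size. A single partition larger than the target
    still forms its own group.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for index, size in enumerate(sizes):
        # Close the current group if this partition would overflow it.
        if current and current_size + size > target_size:
            groups.append(current)
            current = []
            current_size = 0
        current.append(index)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Hypothetical post-shuffle partition sizes in MB (illustrative only).
sizes_mb = [5, 3, 60, 2, 1, 4, 70, 6]
groups = coalesce_partitions(sizes_mb, target_size=64)
print(len(sizes_mb), "partitions coalesced into", len(groups), "groups:", groups)
```

Fewer output groups means fewer part files, which is the driver-pressure concern raised above.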

Re: AQE effectiveness

2020-08-20 Thread Maryann Xue
AQE has been turned off deliberately so that the `outputPartitioning` of the cached relation won't be changed by AQE partition coalescing or skew join optimization and the outputPartitioning can potentially be used by relations built on top of the cache. On second thought, we should probably

Re: AQE effectiveness

2020-08-20 Thread Maryann Xue
No. The worst case of enabling AQE in cached data is not losing the opportunity of using/reusing the cache, but rather just an extra shuffle if the outputPartitioning happens to match without AQE and not match after AQE. The chance of this happening is rather low. On Thu, Aug 20, 2020 at 12:09 PM
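
The worst case described above can be pictured with a deliberately simplified model, in which a consumer of the cached relation needs a specific hash partitioning and a shuffle is added whenever the cached relation's partitioning does not match it exactly. This is only a sketch under that assumption: the class `HashPartitioning`, the function `needs_shuffle`, the key name, and the partition counts are hypothetical and do not mirror Spark's planner classes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HashPartitioning:
    """Toy stand-in for an output partitioning: hash keys plus partition count."""
    keys: Tuple[str, ...]
    num_partitions: int


def needs_shuffle(child: HashPartitioning, required: HashPartitioning) -> bool:
    """In this simplified model, any mismatch forces an extra shuffle."""
    return child != required


# Hypothetical numbers: the consumer expects 2000 partitions hashed on user_id.
required = HashPartitioning(("user_id",), 2000)

# Cached without AQE: the partition count is left as configured.
cached_without_aqe = HashPartitioning(("user_id",), 2000)

# Cached with AQE: coalescing is assumed to have reduced the count to 37.
cached_with_aqe = HashPartitioning(("user_id",), 37)

print("extra shuffle without AQE:", needs_shuffle(cached_without_aqe, required))
print("extra shuffle with AQE:", needs_shuffle(cached_with_aqe, required))
```

In both cases the cache itself is still reused; the only difference in this model is whether one more shuffle sits on top of it.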

Re: Running K8s integration tests for changes in core?

2020-08-20 Thread shane knapp ☠
fyi, i won't be making this change until the 1st week of september. i'll be out, off the grid all next week! :) i will send an announcement out tomorrow on how to contact my team here @ uc berkeley if jenkins goes down. shane On Thu, Aug 20, 2020 at 4:40 AM Prashant Sharma wrote: > Another

Re: Running K8s integration tests for changes in core?

2020-08-20 Thread Holden Karau
Sounds good, thanks for the heads up. I hope you get some time to relax :) On Thu, Aug 20, 2020 at 2:26 PM shane knapp ☠ wrote: > fyi, i won't be making this change until the 1st week of september. i'll > be out, off the grid all next week! :) > > i will send an announcement out tomorrow on

[build system] shane out all next week (aug 22-29), support instructions

2020-08-20 Thread shane knapp ☠
i will be disappearing off into the wilderness for a few days of backpacking, and am handing off basic support duties to my team. if, and only if, jenkins goes down, please email research-supp...@cs.berkeley.edu and open a ticket. if you open a ticket, please let dev@ know to minimize the

Re: [PySpark] Revisiting PySpark type annotations

2020-08-20 Thread Hyukjin Kwon
Yeah, we had a short meeting. I had to check a few other things so some delays happened. I will share soon. On Thu, Aug 20, 2020 at 7:14 PM, Driesprong, Fokko wrote: > Hi Maciej, Hyukjin, > > Did you find any time to discuss adding the types to the Python > repository? Would love to know what came out

Re: [PySpark] Revisiting PySpark type annotations

2020-08-20 Thread Driesprong, Fokko
Hi Maciej, Hyukjin, Did you find any time to discuss adding the types to the Python repository? Would love to know what came out of it. Cheers, Fokko On Wed, Aug 5, 2020 at 10:14, Driesprong, Fokko wrote: > Mostly echoing stuff that we've discussed in >

Re: Running K8s integration tests for changes in core?

2020-08-20 Thread Prashant Sharma
Another option would be something like a "presubmit" PR build. In other words, running the entire 4 H + K8s integration on each commit pushed is too much, yet at the same time there is a chance that one thing can inadvertently affect other components (as you just said). A presubmit (which

Re: [PySpark] Revisiting PySpark type annotations

2020-08-20 Thread Driesprong, Fokko
No worries, thanks for the update! On Thu, Aug 20, 2020 at 12:50, Hyukjin Kwon wrote: > Yeah, we had a short meeting. I had to check a few other things so some > delays happened. I will share soon. > > On Thu, Aug 20, 2020 at 7:14 PM, Driesprong, Fokko wrote: > >> Hi Maciej, Hyukjin, >> >> Did you find