Re: repartition(n) should be deprecated/alerted

2022-06-22 Thread Igor Berman
I'd argue it's strange and unexpected. I understand there are precision issues here, but I'm fine with the result being slightly different each time for the specific column. What I'm not expecting (as an end user, for sure) is that a presumably trivial computation might, under retry scenarios, cause few hund

Re: repartition(n) should be deprecated/alerted

2022-06-22 Thread Sean Owen
Eh, there is a huge caveat - you are making your input non-deterministic, where determinism is assumed. I don't think that supports such a drastic statement. On Wed, Jun 22, 2022 at 12:39 PM Igor Berman wrote: > Hi All > tldr; IMHO repartition(n) should be deprecated or red-flagged, so that > ev

repartition(n) should be deprecated/alerted

2022-06-22 Thread Igor Berman
Hi All, tl;dr: IMHO repartition(n) should be deprecated or red-flagged, so that everybody understands the consequences of using this method. Following the conversation in https://issues.apache.org/jira/browse/SPARK-38388 (still relevant for recent versions of Spark), I think it's very important to mark