GitHub user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/21698
  
    OK, the proposal @squito had, to sort on the binary/serialized data, seems 
like at least a good short-term solution. Any sorting is definitely going to add 
overhead, but at least it's not data loss/corruption. Did anyone see issues 
with that?
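
    To make the idea concrete, here is a minimal sketch (not Spark code) of why 
sorting on the serialized form helps. It assumes round-robin assignment and uses 
Python's `pickle` as a stand-in for Spark's serializer; the function names are 
hypothetical and chosen only for illustration.

```python
import pickle


def round_robin_partition(records, num_partitions, start=0):
    """Assign records to partitions in arrival order.

    The result depends on the order of `records`, so a recomputed input
    that arrives in a different order lands in different partitions.
    """
    partitions = [[] for _ in range(num_partitions)]
    for i, rec in enumerate(records):
        partitions[(start + i) % num_partitions].append(rec)
    return partitions


def sorted_round_robin_partition(records, num_partitions, start=0):
    """Sort by serialized bytes first, then distribute round-robin.

    Sorting on the bytes needs no ordering on the record type itself and
    makes the assignment independent of arrival order. pickle is only a
    stand-in here (an assumption), not what Spark uses.
    """
    ordered = sorted(records, key=lambda r: pickle.dumps(r, protocol=4))
    return round_robin_partition(ordered, num_partitions, start)
```

    With this sketch, two runs that see the same records in a different order 
produce the same partitions from `sorted_round_robin_partition`, while plain 
`round_robin_partition` does not. The cost is the extra sort per input 
partition, which is the overhead mentioned above.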
    
    Another solution would be to have another partitioner that somehow deals 
with the skew. I'm not sure of the details, though, as it might need sampling 
or something else to work.
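
    One possible shape for such a partitioner, purely as a hypothetical sketch 
and not a worked-out proposal: sample the keys, derive range boundaries from the 
sample, and assign each record by its key alone. The names `sample_bounds` and 
`partition_for`, the sample size, and the seed are all assumptions made for 
illustration. The "key" could be anything orderable, such as the serialized 
bytes.

```python
import bisect
import random


def sample_bounds(keys, num_partitions, sample_size=100, seed=0):
    """Pick up to num_partitions - 1 split points from a random sample of keys."""
    rng = random.Random(seed)
    keys = list(keys)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    if not sample:
        return []
    # Evenly spaced quantiles of the sample become the range boundaries.
    return [sample[(i * len(sample)) // num_partitions]
            for i in range(1, num_partitions)]


def partition_for(key, bounds):
    """Return the partition index for a key.

    The result depends only on the key and the bounds, never on the
    order in which records arrive.
    """
    return bisect.bisect_right(bounds, key)
```

    The assignment is only repeatable if the bounds are computed once and 
reused on recomputation, since the sample itself depends on the input order. 
Heavily duplicated keys would still pile into one partition, so this does not 
remove skew in every case.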

