[GitHub] [spark] HeartSaVioR commented on pull request #28830: [SPARK-31990][SS] Use toSet.toSeq in Dataset.dropDuplicates

2020-06-15 Thread GitBox


HeartSaVioR commented on pull request #28830:
URL: https://github.com/apache/spark/pull/28830#issuecomment-643942982


   Btw, we now know it is broken in Spark 3.0.0, and we will fix it again in 
Spark 3.0.1. Do we have a best practice to follow for communicating such a 
change to end users?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HeartSaVioR commented on pull request #28830: [SPARK-31990][SS] Use toSet.toSeq in Dataset.dropDuplicates

2020-06-14 Thread GitBox


HeartSaVioR commented on pull request #28830:
URL: https://github.com/apache/spark/pull/28830#issuecomment-64318


   How do we plan to consolidate both? How will we write the JIRA 
title/description and PR title/description? What is the type of the 
consolidated issue? Is the consolidated issue a blocker?
   
   Things would be simple if we merge the partial fix as it is and spend our 
efforts on discussing how to guide users on the known issue; this is one of the 
candidates for Spark 3.0.0. This is clearly a bugfix, and a "blocker" that 
prevents some end users from migrating to Spark 3.0.0.
   
   There would be no harm in #28707 waiting for this patch to be merged and 
then rebasing to fix the test failure.


