[GitHub] [spark] AmplabJenkins commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
AmplabJenkins commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution URL: https://github.com/apache/spark/pull/24531#issuecomment-489473545 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10423/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
AmplabJenkins commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution URL: https://github.com/apache/spark/pull/24531#issuecomment-489473542 Merged build finished. Test PASSed.
[GitHub] [spark] shahidki31 commented on a change in pull request #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
shahidki31 commented on a change in pull request #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
URL: https://github.com/apache/spark/pull/24531#discussion_r281043577

## File path: mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala ##
@@ -417,6 +435,17 @@ object PowerIterationClustering extends Logging {
 .setK(k)
 .setSeed(0L)
 .run(points.values)
-points.mapValues(p => model.predict(p)).cache()
+
+val predict = points.mapValues(p => model.predict(p)).cache()

Review comment:
Modified to `.mapValues(model.predict(_))`, and likewise for `Vectors.dense`.
[GitHub] [spark] shahidki31 commented on a change in pull request #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
shahidki31 commented on a change in pull request #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
URL: https://github.com/apache/spark/pull/24531#discussion_r281043528

## File path: mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala ##
@@ -417,6 +435,17 @@ object PowerIterationClustering extends Logging {
 .setK(k)
 .setSeed(0L)
 .run(points.values)
-points.mapValues(p => model.predict(p)).cache()
+
+val predict = points.mapValues(p => model.predict(p)).cache()
+predict.count()
+points.unpersist()
+predict

Review comment:
Yes, removed the caching of `predict`.
[GitHub] [spark] shahidki31 commented on a change in pull request #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
shahidki31 commented on a change in pull request #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
URL: https://github.com/apache/spark/pull/24531#discussion_r281043541

## File path: mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala ##
@@ -226,11 +226,14 @@ class PowerIterationClustering private[clustering] (
 */
 private def pic(w: Graph[Double, Double]): PowerIterationClusteringModel = {
 val v = powerIter(w, maxIterations)
-val assignments = kMeans(v, k).mapPartitions({ iter =>
+val kMeansModel = kMeans(v, k)
+val assignments = kMeansModel.mapPartitions({ iter =>
 iter.map { case (id, cluster) => Assignment(id, cluster) }
-}, preservesPartitioning = true)
+}, preservesPartitioning = true).cache()
+assignments.count()
+kMeansModel.unpersist()

Review comment:
Done.
[GitHub] [spark] shahidki31 commented on a change in pull request #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
shahidki31 commented on a change in pull request #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
URL: https://github.com/apache/spark/pull/24531#discussion_r281043504

## File path: mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala ##
@@ -296,12 +299,15 @@ object PowerIterationClustering extends Logging {
 },
 mergeMsg = _ + _,
 TripletFields.EdgeOnly)
-Graph(vD, gA.edges)
- .mapTriplets(
-e => e.attr / math.max(e.srcAttr, MLUtils.EPSILON),
-new TripletFields(/* useSrc */ true,
- /* useDst */ false,
- /* useEdge */ true))
+val graph = Graph(vD, gA.edges).mapTriplets(
+ e => e.attr / math.max(e.srcAttr, MLUtils.EPSILON),
+ new TripletFields(/* useSrc */ true,
+/* useDst */ false,
+/* useEdge */ true))
+materialize(graph)
+gA.unpersist(true)

Review comment:
Done.
[GitHub] [spark] shahidki31 commented on a change in pull request #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
shahidki31 commented on a change in pull request #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
URL: https://github.com/apache/spark/pull/24531#discussion_r281043493

## File path: mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala ##
@@ -226,11 +226,14 @@ class PowerIterationClustering private[clustering] (
 */
 private def pic(w: Graph[Double, Double]): PowerIterationClusteringModel = {
 val v = powerIter(w, maxIterations)
-val assignments = kMeans(v, k).mapPartitions({ iter =>
+val kMeansModel = kMeans(v, k)
+val assignments = kMeansModel.mapPartitions({ iter =>

Review comment:
Thank you @srowen for the review. Yes, I have updated it.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
AmplabJenkins removed a comment on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution URL: https://github.com/apache/spark/pull/24531#issuecomment-489472949 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105133/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
AmplabJenkins commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution URL: https://github.com/apache/spark/pull/24531#issuecomment-489472949 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105133/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
SparkQA removed a comment on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution URL: https://github.com/apache/spark/pull/24531#issuecomment-489468832 **[Test build #105133 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105133/testReport)** for PR 24531 at commit [`99396f7`](https://github.com/apache/spark/commit/99396f7d289612446744279661b79f9730d26b02).
[GitHub] [spark] AmplabJenkins commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
AmplabJenkins commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution URL: https://github.com/apache/spark/pull/24531#issuecomment-489472946 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
AmplabJenkins removed a comment on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution URL: https://github.com/apache/spark/pull/24531#issuecomment-489472946 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
SparkQA commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
URL: https://github.com/apache/spark/pull/24531#issuecomment-489472880

**[Test build #105133 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105133/testReport)** for PR 24531 at commit [`99396f7`](https://github.com/apache/spark/commit/99396f7d289612446744279661b79f9730d26b02).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] srowen commented on a change in pull request #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
srowen commented on a change in pull request #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
URL: https://github.com/apache/spark/pull/24531#discussion_r281042435

## File path: mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala ##
@@ -417,6 +435,17 @@ object PowerIterationClustering extends Logging {
 .setK(k)
 .setSeed(0L)
 .run(points.values)
-points.mapValues(p => model.predict(p)).cache()
+
+val predict = points.mapValues(p => model.predict(p)).cache()
+predict.count()
+points.unpersist()
+predict

Review comment:
Why not just not cache `predict` here? That avoids dealing with unpersisting in the single caller above.
[GitHub] [spark] srowen commented on a change in pull request #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
srowen commented on a change in pull request #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
URL: https://github.com/apache/spark/pull/24531#discussion_r281042329

## File path: mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala ##
@@ -417,6 +435,17 @@ object PowerIterationClustering extends Logging {
 .setK(k)
 .setSeed(0L)
 .run(points.values)
-points.mapValues(p => model.predict(p)).cache()
+
+val predict = points.mapValues(p => model.predict(p)).cache()

Review comment:
Can just be `.mapValues(model.predict)` while we're here. Same with `Vectors.dense` above.
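The shorthand suggested in this comment relies on Scala's eta-expansion, where a method name can stand in for a one-argument lambda. A minimal plain-Scala sketch follows; `ToyModel` and its `predict` are hypothetical stand-ins (not Spark's `KMeansModel`), and an ordinary `List` stands in for the RDD:

```scala
object EtaExpansionSketch {
  // Hypothetical stand-in for a fitted clustering model; not Spark's KMeansModel.
  final class ToyModel(centers: Vector[Double]) {
    // Returns the index of the center closest to x.
    def predict(x: Double): Int =
      centers.indices.minBy(i => math.abs(centers(i) - x))
  }

  val model = new ToyModel(Vector(0.0, 10.0))
  val values: List[Double] = List(0.5, 9.0, 4.0)

  // Three spellings of the same transformation.
  val explicit: List[Int] = values.map(p => model.predict(p)) // explicit lambda
  val viaMethod: List[Int] = values.map(model.predict)        // method value (eta-expansion)
  val viaPlaceholder: List[Int] = values.map(model.predict(_)) // placeholder syntax
}
```

All three forms produce the same result here; which one reads best for an overloaded method such as the real `predict` is a judgment call left to the thread.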
[GitHub] [spark] srowen commented on a change in pull request #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
srowen commented on a change in pull request #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
URL: https://github.com/apache/spark/pull/24531#discussion_r281042299

## File path: mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala ##
@@ -296,12 +299,15 @@ object PowerIterationClustering extends Logging {
 },
 mergeMsg = _ + _,
 TripletFields.EdgeOnly)
-Graph(vD, gA.edges)
- .mapTriplets(
-e => e.attr / math.max(e.srcAttr, MLUtils.EPSILON),
-new TripletFields(/* useSrc */ true,
- /* useDst */ false,
- /* useEdge */ true))
+val graph = Graph(vD, gA.edges).mapTriplets(
+ e => e.attr / math.max(e.srcAttr, MLUtils.EPSILON),
+ new TripletFields(/* useSrc */ true,
+/* useDst */ false,
+/* useEdge */ true))
+materialize(graph)
+gA.unpersist(true)

Review comment:
No need to set blocking = true. Just don't specify.
[GitHub] [spark] srowen commented on a change in pull request #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
srowen commented on a change in pull request #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
URL: https://github.com/apache/spark/pull/24531#discussion_r281042277

## File path: mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala ##
@@ -226,11 +226,14 @@ class PowerIterationClustering private[clustering] (
 */
 private def pic(w: Graph[Double, Double]): PowerIterationClusteringModel = {
 val v = powerIter(w, maxIterations)
-val assignments = kMeans(v, k).mapPartitions({ iter =>
+val kMeansModel = kMeans(v, k)
+val assignments = kMeansModel.mapPartitions({ iter =>

Review comment:
I'm actually not sure why it calls `.mapPartitions` here. There's nothing partition-wise about it. I think this could also just be:

```
val assignments = kMeansModel.map { case (id, cluster) => Assignment(id, cluster) }
```
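To illustrate the point that nothing here is partition-wise, the sketch below models a partitioned dataset as a plain `Seq[Seq[A]]` and shows that a per-partition `iter.map(...)` yields the same elements as an element-wise map. The helpers `mapPartitions` and `mapElements`, the `Partitions` alias, and the local `Assignment` case class are hypothetical stand-ins written for this example, not Spark APIs:

```scala
object MapVsMapPartitionsSketch {
  // Local stand-in for the Assignment type named in the diff.
  final case class Assignment(id: Long, cluster: Int)

  // Toy "partitioned dataset": each inner Seq plays the role of one partition.
  type Partitions[A] = Seq[Seq[A]]

  // Applies f once per partition, handing it an iterator over that partition.
  def mapPartitions[A, B](data: Partitions[A])(f: Iterator[A] => Iterator[B]): Partitions[B] =
    data.map(part => f(part.iterator).toSeq)

  // Applies f once per element.
  def mapElements[A, B](data: Partitions[A])(f: A => B): Partitions[B] =
    data.map(_.map(f))

  val clustered: Partitions[(Long, Int)] = Seq(Seq(1L -> 0, 2L -> 1), Seq(3L -> 0))

  val viaPartitions: Partitions[Assignment] =
    mapPartitions(clustered)(_.map { case (id, c) => Assignment(id, c) })
  val viaMap: Partitions[Assignment] =
    mapElements(clustered) { case (id, c) => Assignment(id, c) }
}
```

One difference in real Spark that this toy model does not capture: `RDD.map` does not retain the parent's partitioner, whereas `mapPartitions` with `preservesPartitioning = true` does. Whether that matters for the caller is not settled in this thread.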
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24516: [SPARK-27624][CORE] Fix CalenderInterval to show an empty interval correctly
dongjoon-hyun commented on a change in pull request #24516: [SPARK-27624][CORE] Fix CalenderInterval to show an empty interval correctly
URL: https://github.com/apache/spark/pull/24516#discussion_r281042412

## File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java ##
@@ -321,6 +321,10 @@ public String toString() {
 appendUnit(sb, rest, "microsecond");
 }
+if (months == 0 && microseconds == 0) {
+ sb.append(" 0 microseconds");

Review comment:
Thank you for review, @srowen.
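The guard added in this hunk can be sketched in isolation. The version below is a deliberately simplified Scala rendering, assuming an `"interval"` prefix and only two units; `render` is a hypothetical helper, and the real Java `toString` breaks the value into more units than shown here:

```scala
object EmptyIntervalSketch {
  // Simplified formatter: assumes an "interval" prefix and only two units.
  def render(months: Int, microseconds: Long): String = {
    val sb = new StringBuilder("interval")
    if (months != 0) sb.append(s" $months months")
    if (microseconds != 0) sb.append(s" $microseconds microseconds")
    // Without this guard an empty interval would render as the bare prefix.
    if (months == 0 && microseconds == 0) sb.append(" 0 microseconds")
    sb.toString
  }
}
```

With the guard in place, `EmptyIntervalSketch.render(0, 0L)` yields a string that still names a unit instead of ending right after the prefix.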
[GitHub] [spark] srowen commented on a change in pull request #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
srowen commented on a change in pull request #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
URL: https://github.com/apache/spark/pull/24531#discussion_r281042454

## File path: mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala ##
@@ -226,11 +226,14 @@ class PowerIterationClustering private[clustering] (
 */
 private def pic(w: Graph[Double, Double]): PowerIterationClusteringModel = {
 val v = powerIter(w, maxIterations)
-val assignments = kMeans(v, k).mapPartitions({ iter =>
+val kMeansModel = kMeans(v, k)
+val assignments = kMeansModel.mapPartitions({ iter =>
 iter.map { case (id, cluster) => Assignment(id, cluster) }
-}, preservesPartitioning = true)
+}, preservesPartitioning = true).cache()
+assignments.count()
+kMeansModel.unpersist()

Review comment:
I think we can just avoid having the result of kMeans be persisted to begin with, see below
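The hunk under review materializes `assignments` with `count()` before unpersisting its parent. The toy model below is an illustration under stated assumptions, not Spark's RDD implementation: `ToyDataset` and `parentComputations` are hypothetical names, and caching is reduced to an in-memory `Option`. It suggests why the ordering matters when both datasets are cached, and why not persisting the intermediate result at all, as proposed in this comment, sidesteps the question:

```scala
object CacheOrderingSketch {
  // Toy lazy dataset: NOT Spark's RDD, just enough to show the ordering issue.
  final class ToyDataset[A](compute: () => Seq[A]) {
    private var cached: Option[Seq[A]] = None
    private var persistRequested = false
    var computeCount = 0 // how many times the underlying computation ran

    def cache(): this.type = { persistRequested = true; this }
    def unpersist(): this.type = { persistRequested = false; cached = None; this }

    // Evaluates lazily; stores the result only if cache() was requested.
    def collect(): Seq[A] = cached.getOrElse {
      computeCount += 1
      val result = compute()
      if (persistRequested) cached = Some(result)
      result
    }
    def count(): Int = collect().size
    def map[B](f: A => B): ToyDataset[B] =
      new ToyDataset[B](() => collect().map(f))
  }

  // Returns how often the parent had to be computed in each ordering.
  def parentComputations(materializeChildFirst: Boolean): Int = {
    val parent = new ToyDataset[Int](() => Seq(1, 2, 3)).cache()
    parent.count()                          // parent computed once and cached
    val child = parent.map(_ * 2).cache()   // lazy: nothing computed yet
    if (materializeChildFirst) child.count()
    parent.unpersist()
    child.collect()
    parent.computeCount
  }
}
```

In this model, materializing the child first keeps the parent at a single computation, while unpersisting first forces the parent to be recomputed when the child is finally evaluated.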
[GitHub] [spark] shahidki31 commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
shahidki31 commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution URL: https://github.com/apache/spark/pull/24531#issuecomment-489470365 cc @srowen @felixcheung Kindly review
[GitHub] [spark] shahidki31 commented on a change in pull request #24398: [SPARK-27468][Core][WEBUI] BlockUpdate replication event shouldn't overwrite storage level description in the UI
shahidki31 commented on a change in pull request #24398: [SPARK-27468][Core][WEBUI] BlockUpdate replication event shouldn't overwrite storage level description in the UI
URL: https://github.com/apache/spark/pull/24398#discussion_r281041842

## File path: core/src/main/scala/org/apache/spark/status/AppStatusListener.scala ##
@@ -917,8 +917,24 @@ private[spark] class AppStatusListener(
 // Update the block entry in the RDD info, keeping track of the deltas above so that we
 // can update the executor information too.
 liveRDDs.get(block.rddId).foreach { rdd =>
+
 if (updatedStorageLevel.isDefined) {
-rdd.setStorageLevel(updatedStorageLevel.get)
+// Replicated block update events will have `storageLevel.replication=1`.
+// To avoid overwriting the block replicated event in the store, we need to
+// have a check for whether the event is block replication or not.
+// Default value of `storageInfo.replication = 1` and hence if
+// `storeLevel.replication = 2`, the replicated events won't overwrite in the store.
+val storageInfo = rdd.storageInfo
+val isReplicatedBlockUpdateEvent = storageLevel.replication < storageInfo.replication &&

Review comment:
Hi, this line checks whether the storageLevel is valid: https://github.com/apache/spark/blob/d9bcacf94b93fe76542b5c1fd852559075ef6faa/core/src/main/scala/org/apache/spark/status/AppStatusListener.scala#L916-L920
If it is not valid, `updatedStorageLevel` will be `None`, so execution won't reach this line (L-928). Thanks
[GitHub] [spark] AmplabJenkins commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
AmplabJenkins commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution URL: https://github.com/apache/spark/pull/24531#issuecomment-489468512 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105132/ Test FAILed.
[GitHub] [spark] SparkQA commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
SparkQA commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution URL: https://github.com/apache/spark/pull/24531#issuecomment-489468832 **[Test build #105133 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105133/testReport)** for PR 24531 at commit [`99396f7`](https://github.com/apache/spark/commit/99396f7d289612446744279661b79f9730d26b02).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
AmplabJenkins removed a comment on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution URL: https://github.com/apache/spark/pull/24531#issuecomment-489468728 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
AmplabJenkins removed a comment on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution URL: https://github.com/apache/spark/pull/24531#issuecomment-489468729 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10422/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
SparkQA removed a comment on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution URL: https://github.com/apache/spark/pull/24531#issuecomment-489468392 **[Test build #105132 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105132/testReport)** for PR 24531 at commit [`fa6465d`](https://github.com/apache/spark/commit/fa6465d68fb8daa67d766d8bee7116aeb7ab41c4).
[GitHub] [spark] SparkQA commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
SparkQA commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
URL: https://github.com/apache/spark/pull/24531#issuecomment-489468507

**[Test build #105132 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105132/testReport)** for PR 24531 at commit [`fa6465d`](https://github.com/apache/spark/commit/fa6465d68fb8daa67d766d8bee7116aeb7ab41c4).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
AmplabJenkins commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution URL: https://github.com/apache/spark/pull/24531#issuecomment-489468729 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10422/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
AmplabJenkins commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution URL: https://github.com/apache/spark/pull/24531#issuecomment-489468728 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
AmplabJenkins removed a comment on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution URL: https://github.com/apache/spark/pull/24531#issuecomment-489468510 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
AmplabJenkins commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution URL: https://github.com/apache/spark/pull/24531#issuecomment-489468510 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
AmplabJenkins removed a comment on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution URL: https://github.com/apache/spark/pull/24531#issuecomment-489468512 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105132/ Test FAILed.
[GitHub] [spark] SparkQA commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
SparkQA commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution URL: https://github.com/apache/spark/pull/24531#issuecomment-489468392 **[Test build #105132 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105132/testReport)** for PR 24531 at commit [`fa6465d`](https://github.com/apache/spark/commit/fa6465d68fb8daa67d766d8bee7116aeb7ab41c4).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
AmplabJenkins removed a comment on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution URL: https://github.com/apache/spark/pull/24531#issuecomment-489468305 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10421/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
AmplabJenkins commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution URL: https://github.com/apache/spark/pull/24531#issuecomment-489468303 Merged build finished. Test PASSed.
[GitHub] [spark] shahidki31 commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
shahidki31 commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution URL: https://github.com/apache/spark/pull/24531#issuecomment-489468295 Jenkins, test this please
[GitHub] [spark] AmplabJenkins removed a comment on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
AmplabJenkins removed a comment on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution URL: https://github.com/apache/spark/pull/24531#issuecomment-489468303 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
AmplabJenkins removed a comment on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution URL: https://github.com/apache/spark/pull/24531#issuecomment-489468244 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
AmplabJenkins commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution URL: https://github.com/apache/spark/pull/24531#issuecomment-489468305 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10421/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
AmplabJenkins commented on issue #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution URL: https://github.com/apache/spark/pull/24531#issuecomment-489468244 Can one of the admins verify this patch?
[GitHub] [spark] shahidki31 opened a new pull request #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
shahidki31 opened a new pull request #24531: [SPARK-27636][MLLIB]Remove cached RDD blocks after PIC execution
URL: https://github.com/apache/spark/pull/24531

## What changes were proposed in this pull request?

Test steps to reproduce:

1) bin/spark-shell
```
import org.apache.spark.ml.clustering.PowerIterationClustering

// Small weighted edge list: (src, dst, weight)
val dataset = spark.createDataFrame(Seq(
  (0L, 1L, 1.0), (1L, 2L, 1.0), (3L, 4L, 1.0), (4L, 0L, 0.1)
)).toDF("src", "dst", "weight")

val model = new PowerIterationClustering().
  setMaxIter(10).
  setInitMode("degree").
  setWeightCol("weight")

val prediction = model.assignClusters(dataset).select("id", "cluster")
```
2) Open the Storage tab of the UI. Many RDD blocks remain cached, even after PIC has finished running.

## How was this patch tested?

Manually tested and existing UTs.

Before patch:
![Screenshot from 2019-05-06 02-53-45](https://user-images.githubusercontent.com/23054875/57201033-daf61b80-6fb0-11e9-97ff-7534909ce2d3.png)

After patch:
![Screenshot from 2019-05-06 03-41-04](https://user-images.githubusercontent.com/23054875/57201043-07aa3300-6fb1-11e9-855b-f63ee18ea371.png)
[GitHub] [spark] AmplabJenkins removed a comment on issue #24502: [SPARK-27610][YARN] Shade netty native libraries
AmplabJenkins removed a comment on issue #24502: [SPARK-27610][YARN] Shade netty native libraries URL: https://github.com/apache/spark/pull/24502#issuecomment-489447531 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105130/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24502: [SPARK-27610][YARN] Shade netty native libraries
AmplabJenkins removed a comment on issue #24502: [SPARK-27610][YARN] Shade netty native libraries URL: https://github.com/apache/spark/pull/24502#issuecomment-489447530 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24502: [SPARK-27610][YARN] Shade netty native libraries
SparkQA removed a comment on issue #24502: [SPARK-27610][YARN] Shade netty native libraries URL: https://github.com/apache/spark/pull/24502#issuecomment-489436882 **[Test build #105130 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105130/testReport)** for PR 24502 at commit [`311d98f`](https://github.com/apache/spark/commit/311d98f22bd5ffa5e3961043d4848a6483869b61).
[GitHub] [spark] AmplabJenkins commented on issue #24502: [SPARK-27610][YARN] Shade netty native libraries
AmplabJenkins commented on issue #24502: [SPARK-27610][YARN] Shade netty native libraries URL: https://github.com/apache/spark/pull/24502#issuecomment-489447530 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24502: [SPARK-27610][YARN] Shade netty native libraries
AmplabJenkins commented on issue #24502: [SPARK-27610][YARN] Shade netty native libraries URL: https://github.com/apache/spark/pull/24502#issuecomment-489447531 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105130/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24502: [SPARK-27610][YARN] Shade netty native libraries
SparkQA commented on issue #24502: [SPARK-27610][YARN] Shade netty native libraries
URL: https://github.com/apache/spark/pull/24502#issuecomment-489447375

**[Test build #105130 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105130/testReport)** for PR 24502 at commit [`311d98f`](https://github.com/apache/spark/commit/311d98f22bd5ffa5e3961043d4848a6483869b61).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] srowen commented on a change in pull request #24516: [SPARK-27624][CORE] Fix CalenderInterval to show an empty interval correctly
srowen commented on a change in pull request #24516: [SPARK-27624][CORE] Fix CalenderInterval to show an empty interval correctly
URL: https://github.com/apache/spark/pull/24516#discussion_r281030111

## File path: common/unsafe/src/main/java/org/apache/spark/unsafe/types/CalendarInterval.java ##

@@ -321,6 +321,10 @@ public String toString() {
       appendUnit(sb, rest, "microsecond");
     }
+    if (months == 0 && microseconds == 0) {
+      sb.append(" 0 microseconds");

Review comment:
This seems fine; even just "0" seems reasonable, as 0 seconds, microseconds, etc. are all the same.
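To make the change under review easier to follow, here is a minimal sketch of a `toString()` that falls back to an explicit zero when every unit is skipped. `IntervalSketch` is a hypothetical stand-in, not Spark's `CalendarInterval`; only the `months` and `microseconds` fields, the `appendUnit` name, and the `" 0 microseconds"` fallback come from the diff above, and the unit breakdown and pluralization are assumptions.

```java
/** Hypothetical, simplified stand-in for an interval type (not Spark's CalendarInterval). */
final class IntervalSketch {
  private static final long MICROS_PER_SECOND = 1_000_000L;
  private static final long MICROS_PER_MINUTE = 60 * MICROS_PER_SECOND;
  private static final long MICROS_PER_HOUR = 60 * MICROS_PER_MINUTE;
  private static final long MICROS_PER_DAY = 24 * MICROS_PER_HOUR;

  private final int months;
  private final long microseconds;

  IntervalSketch(int months, long microseconds) {
    this.months = months;
    this.microseconds = microseconds;
  }

  @Override
  public String toString() {
    StringBuilder sb = new StringBuilder("interval");
    appendUnit(sb, months / 12, "year");
    appendUnit(sb, months % 12, "month");

    long rest = microseconds;
    appendUnit(sb, rest / MICROS_PER_DAY, "day");
    rest %= MICROS_PER_DAY;
    appendUnit(sb, rest / MICROS_PER_HOUR, "hour");
    rest %= MICROS_PER_HOUR;
    appendUnit(sb, rest / MICROS_PER_MINUTE, "minute");
    rest %= MICROS_PER_MINUTE;
    appendUnit(sb, rest / MICROS_PER_SECOND, "second");
    rest %= MICROS_PER_SECOND;
    appendUnit(sb, rest, "microsecond");

    // Without this fallback an empty interval would print as the bare word "interval".
    if (months == 0 && microseconds == 0) {
      sb.append(" 0 microseconds");
    }
    return sb.toString();
  }

  // Zero-valued units are skipped; the unit name is always pluralized (an assumption of this sketch).
  private static void appendUnit(StringBuilder sb, long value, String unit) {
    if (value != 0) {
      sb.append(' ').append(value).append(' ').append(unit).append('s');
    }
  }
}
```

Because `appendUnit` skips zero values, the fallback is the only path that emits anything for an empty interval, which is why the choice of unit in that string is arbitrary, as the review comment notes.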
[GitHub] [spark] AmplabJenkins removed a comment on issue #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration
AmplabJenkins removed a comment on issue #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration URL: https://github.com/apache/spark/pull/24530#issuecomment-489442062 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105131/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration
AmplabJenkins removed a comment on issue #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration URL: https://github.com/apache/spark/pull/24530#issuecomment-489442059 Build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration
SparkQA removed a comment on issue #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration URL: https://github.com/apache/spark/pull/24530#issuecomment-489440334 **[Test build #105131 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105131/testReport)** for PR 24530 at commit [`d217d95`](https://github.com/apache/spark/commit/d217d95d4ea5031949ed831e8d6b64a853b79c1f).
[GitHub] [spark] AmplabJenkins commented on issue #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration
AmplabJenkins commented on issue #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration URL: https://github.com/apache/spark/pull/24530#issuecomment-489442062 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/105131/ Test FAILed.
[GitHub] [spark] SparkQA commented on issue #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration
SparkQA commented on issue #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration
URL: https://github.com/apache/spark/pull/24530#issuecomment-489442055

**[Test build #105131 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105131/testReport)** for PR 24530 at commit [`d217d95`](https://github.com/apache/spark/commit/d217d95d4ea5031949ed831e8d6b64a853b79c1f).

* This patch **fails MiMa tests**.
* This patch **does not merge cleanly**.
* This patch adds the following public classes _(experimental)_:
  * `class SparkHadoopConf(conf: Configuration)`
[GitHub] [spark] AmplabJenkins commented on issue #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration
AmplabJenkins commented on issue #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration URL: https://github.com/apache/spark/pull/24530#issuecomment-489442059 Build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration
AmplabJenkins removed a comment on issue #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration URL: https://github.com/apache/spark/pull/24530#issuecomment-489440093 Can one of the admins verify this patch?
[GitHub] [spark] SparkQA commented on issue #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration
SparkQA commented on issue #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration URL: https://github.com/apache/spark/pull/24530#issuecomment-489440334 **[Test build #105131 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105131/testReport)** for PR 24530 at commit [`d217d95`](https://github.com/apache/spark/commit/d217d95d4ea5031949ed831e8d6b64a853b79c1f).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration
AmplabJenkins removed a comment on issue #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration URL: https://github.com/apache/spark/pull/24530#issuecomment-489440183 Build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration
AmplabJenkins removed a comment on issue #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration URL: https://github.com/apache/spark/pull/24530#issuecomment-489440190 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10420/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration
AmplabJenkins commented on issue #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration URL: https://github.com/apache/spark/pull/24530#issuecomment-489440183 Build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration
AmplabJenkins commented on issue #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration URL: https://github.com/apache/spark/pull/24530#issuecomment-489440190 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10420/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration
AmplabJenkins commented on issue #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration URL: https://github.com/apache/spark/pull/24530#issuecomment-489440093 Can one of the admins verify this patch?
[GitHub] [spark] Ngone51 commented on issue #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration
Ngone51 commented on issue #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration URL: https://github.com/apache/spark/pull/24530#issuecomment-489440106 cc @jiangxb1987
[GitHub] [spark] Ngone51 opened a new pull request #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration
Ngone51 opened a new pull request #24530: [SPARK-27520][CORE][WIP] Introduce a global config system to replace hadoopConfiguration
URL: https://github.com/apache/spark/pull/24530

## What changes were proposed in this pull request?

The Hadoop configuration can be accessed via `SparkContext.hadoopConfiguration` from both user code and Spark internals. It is mainly used to read files from Hadoop-supported file systems (e.g. get a URI, get a FileSystem, add security credentials, get the metastore connection URL).

We should keep a global config that users can set, and use it to track the Hadoop configurations. This PR implements it with three main features:

* use a ThreadLocal to track the Hadoop Configuration, so that concurrent jobs can use their own Hadoop Configurations
* provide a set method, by wrapping a Hadoop Configuration, to let users modify the Hadoop Configuration globally
* provide a `SparkContext.withHadoopConf(){}` method to let users modify the Hadoop Configuration temporarily

## How was this patch tested?

TODO
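The three features listed in this pull request can be illustrated with a small, self-contained sketch. It uses plain string maps rather than Hadoop's `Configuration`, and every name in it (`ScopedConf`, `set`, `get`, `withConf`) is hypothetical; it shows one possible shape of the idea under those assumptions, not the code in the PR.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical illustration of a global config with thread-local, scoped overrides. */
final class ScopedConf {
  // Global settings, visible to every thread.
  private static final Map<String, String> GLOBAL = new ConcurrentHashMap<>();

  // Per-thread overrides, so concurrent jobs can each see their own values.
  private static final ThreadLocal<Map<String, String>> LOCAL =
      ThreadLocal.withInitial(HashMap::new);

  private ScopedConf() {}

  /** Modifies the configuration globally. */
  static void set(String key, String value) {
    GLOBAL.put(key, value);
  }

  /** Reads a key, preferring the calling thread's override over the global value. */
  static String get(String key) {
    String local = LOCAL.get().get(key);
    return local != null ? local : GLOBAL.get(key);
  }

  /** Runs body with temporary overrides, restoring the thread's previous state afterwards. */
  static <T> T withConf(Map<String, String> overrides, Supplier<T> body) {
    Map<String, String> previous = LOCAL.get();
    Map<String, String> scoped = new HashMap<>(previous);
    scoped.putAll(overrides);
    LOCAL.set(scoped);
    try {
      return body.get();
    } finally {
      // Restore even if body throws, so an override never leaks out of its scope.
      LOCAL.set(previous);
    }
  }
}
```

Copying the previous map into a new one keeps nested `withConf` calls independent: the inner scope sees the outer overrides, and the outer scope is unaffected once the inner one returns.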
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24527: [SPARK-27635][SQL] Prevent from splitting too many partitions smaller than row group size in Parquet file format
dongjoon-hyun commented on a change in pull request #24527: [SPARK-27635][SQL] Prevent from splitting too many partitions smaller than row group size in Parquet file format
URL: https://github.com/apache/spark/pull/24527#discussion_r281027718

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala ##

@@ -420,8 +420,12 @@ case class FileSourceScanExec(
       selectedPartitions: Seq[PartitionDirectory],
       fsRelation: HadoopFsRelation): RDD[InternalRow] = {
     val openCostInBytes = fsRelation.sparkSession.sessionState.conf.filesOpenCostInBytes
-    val maxSplitBytes =
-      FilePartition.maxSplitBytes(fsRelation.sparkSession, selectedPartitions)
+    val maxSplitBytes = relation.fileFormat match {
+      case _ : ParquetSource =>
+        fsRelation.sparkSession.sessionState.conf.filesMaxPartitionBytes // parquet.block.size
+      case _ =>
+        FilePartition.maxSplitBytes(fsRelation.sparkSession, selectedPartitions)
+    }

Review comment:
Hi, @LantaoJin. It would be very helpful if you provided a test case for the following claim.

> Splitting RDD to too many small pieces doesn't make sense. Jobs will launch too many partitions and never complete.
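For context on the claim being questioned, the sketch below shows how a split size can fall well below a Parquet row group when parallelism is high. The formula is an assumption based on my reading of `FilePartition.maxSplitBytes` (the minimum of the configured maximum and the larger of the open cost and bytes per core); `SplitSizeSketch` and its parameter names are hypothetical, and the 128 MiB and 4 MiB figures are the documented defaults of `spark.sql.files.maxPartitionBytes` and `spark.sql.files.openCostInBytes`.

```java
/** Hypothetical sketch of a split-size calculation; the formula is an assumption, not a quote. */
final class SplitSizeSketch {
  static final long MIB = 1024L * 1024L;

  private SplitSizeSketch() {}

  /**
   * @param maxPartitionBytes upper bound on a split (assumed default: 128 MiB)
   * @param openCostInBytes   estimated cost of opening a file, in bytes (assumed default: 4 MiB)
   * @param parallelism       number of cores the scan is spread over
   * @param fileSizes         sizes of the files selected for the scan
   */
  static long maxSplitBytes(
      long maxPartitionBytes, long openCostInBytes, int parallelism, long[] fileSizes) {
    long totalBytes = 0L;
    for (long size : fileSizes) {
      // Each file is padded with the open cost so many tiny files are not packed too densely.
      totalBytes += size + openCostInBytes;
    }
    long bytesPerCore = totalBytes / parallelism;
    return Math.min(maxPartitionBytes, Math.max(openCostInBytes, bytesPerCore));
  }

  public static void main(String[] args) {
    long[] oneGibFile = {1024 * MIB};
    // With 1000 cores the per-core share is tiny, so the split size bottoms out at the open cost.
    System.out.println(maxSplitBytes(128 * MIB, 4 * MIB, 1000, oneGibFile) / MIB + " MiB");
    // With 2 cores the per-core share exceeds the cap, so the configured maximum applies.
    System.out.println(maxSplitBytes(128 * MIB, 4 * MIB, 2, oneGibFile) / MIB + " MiB");
  }
}
```

Under these assumptions a single 1 GiB file scanned with a parallelism of 1000 is cut into 4 MiB splits, far smaller than a typical 128 MiB row group, which is the situation the diff tries to avoid by using `filesMaxPartitionBytes` directly for Parquet. Whether that actually causes jobs to stall is what the requested test case would need to show.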
[GitHub] [spark] SparkQA removed a comment on issue #24470: [SPARK-27577][MLlib] Correct thresholds downsampled in BinaryClassificationMetrics
SparkQA removed a comment on issue #24470: [SPARK-27577][MLlib] Correct thresholds downsampled in BinaryClassificationMetrics URL: https://github.com/apache/spark/pull/24470#issuecomment-489433239 **[Test build #4776 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4776/testReport)** for PR 24470 at commit [`fa2eae6`](https://github.com/apache/spark/commit/fa2eae671ceac14cc450ce2ccf23ea24aa39e184).
[GitHub] [spark] SparkQA commented on issue #24470: [SPARK-27577][MLlib] Correct thresholds downsampled in BinaryClassificationMetrics
SparkQA commented on issue #24470: [SPARK-27577][MLlib] Correct thresholds downsampled in BinaryClassificationMetrics URL: https://github.com/apache/spark/pull/24470#issuecomment-489438572 **[Test build #4776 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4776/testReport)** for PR 24470 at commit [`fa2eae6`](https://github.com/apache/spark/commit/fa2eae671ceac14cc450ce2ccf23ea24aa39e184). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] Ngone51 commented on issue #24462: [SPARK-26268][CORE] Do not resubmit tasks when executors are lost
Ngone51 commented on issue #24462: [SPARK-26268][CORE] Do not resubmit tasks when executors are lost URL: https://github.com/apache/spark/pull/24462#issuecomment-489437889 Hi @jealous, can you give a link or a source file name for that part? I'd like to learn more about the implementation details.
[GitHub] [spark] SparkQA commented on issue #24502: [SPARK-27610][YARN] Shade netty native libraries
SparkQA commented on issue #24502: [SPARK-27610][YARN] Shade netty native libraries URL: https://github.com/apache/spark/pull/24502#issuecomment-489436882 **[Test build #105130 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/105130/testReport)** for PR 24502 at commit [`311d98f`](https://github.com/apache/spark/commit/311d98f22bd5ffa5e3961043d4848a6483869b61).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24502: [SPARK-27610][YARN] Shade netty native libraries
AmplabJenkins removed a comment on issue #24502: [SPARK-27610][YARN] Shade netty native libraries URL: https://github.com/apache/spark/pull/24502#issuecomment-489436744 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24502: [SPARK-27610][YARN] Shade netty native libraries
AmplabJenkins removed a comment on issue #24502: [SPARK-27610][YARN] Shade netty native libraries URL: https://github.com/apache/spark/pull/24502#issuecomment-489436745 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10419/ Test PASSed.
[GitHub] [spark] Ngone51 commented on issue #24497: [SPARK-27630][CORE]Stage retry causes totalRunningTasks calculation to be negative
Ngone51 commented on issue #24497: [SPARK-27630][CORE]Stage retry causes totalRunningTasks calculation to be negative URL: https://github.com/apache/spark/pull/24497#issuecomment-489436811 I agree with @squito's opinion that running tasks in a zombie TaskSet should also be counted in `ExecutorAllocationManager#totalRunningTasks`. Based on this, I'm wondering whether it would be OK to simply change `stageIdToNumRunningTask` to `stageAttemptIdToNumRunningTask`, since `ExecutorAllocationManager` does not really care whether the stage is active or not; it cares about the running tasks across the stage attempts. @squito @cxzl25
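The idea of keying the running-task count by stage attempt rather than by stage can be sketched with a toy model. This is plain Python, not Spark's listener code: the class `RunningTaskCounter` and its method names are hypothetical, and the event sequence is an assumed illustration of a stage retry in which tasks of the older (zombie) attempt finish after the new attempt has started.

```python
from collections import defaultdict

class RunningTaskCounter:
    """Toy model: running-task counts keyed by (stage_id, attempt_id)."""

    def __init__(self):
        self._running = defaultdict(int)

    def task_start(self, stage_id, attempt_id):
        self._running[(stage_id, attempt_id)] += 1

    def task_end(self, stage_id, attempt_id):
        key = (stage_id, attempt_id)
        # Only decrement the attempt the task belongs to, and never below zero.
        if self._running.get(key, 0) > 0:
            self._running[key] -= 1
            if self._running[key] == 0:
                del self._running[key]

    def total_running(self):
        # Tasks of zombie attempts still count until they actually end.
        return sum(self._running.values())

counter = RunningTaskCounter()
counter.task_start(1, 0)    # two tasks of the first attempt of stage 1
counter.task_start(1, 0)
counter.task_start(1, 1)    # stage 1 is retried; attempt 1 starts a task
counter.task_end(1, 0)      # a late task of the zombie attempt finishes
print(counter.total_running())
```

Because each end event is matched against its own attempt, a late task from the zombie attempt cannot drive the count for the new attempt, or the total, below zero in this model.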
[GitHub] [spark] AmplabJenkins commented on issue #24502: [SPARK-27610][YARN] Shade netty native libraries
AmplabJenkins commented on issue #24502: [SPARK-27610][YARN] Shade netty native libraries URL: https://github.com/apache/spark/pull/24502#issuecomment-489436745 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/10419/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24502: [SPARK-27610][YARN] Shade netty native libraries
AmplabJenkins commented on issue #24502: [SPARK-27610][YARN] Shade netty native libraries URL: https://github.com/apache/spark/pull/24502#issuecomment-489436744 Merged build finished. Test PASSed.
[GitHub] [spark] amuraru commented on issue #24502: [SPARK-27610][YARN] Shade netty native libraries
amuraru commented on issue #24502: [SPARK-27610][YARN] Shade netty native libraries URL: https://github.com/apache/spark/pull/24502#issuecomment-489436266 One last try to fix MiMa. @vanzin, if this doesn't fix the issue, I can try to constrain the fix to the yarn package only.
[GitHub] [spark] SparkQA commented on issue #24470: [SPARK-27577][MLlib] Correct thresholds downsampled in BinaryClassificationMetrics
SparkQA commented on issue #24470: [SPARK-27577][MLlib] Correct thresholds downsampled in BinaryClassificationMetrics URL: https://github.com/apache/spark/pull/24470#issuecomment-489433239 **[Test build #4776 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4776/testReport)** for PR 24470 at commit [`fa2eae6`](https://github.com/apache/spark/commit/fa2eae671ceac14cc450ce2ccf23ea24aa39e184).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24470: [SPARK-27577][MLlib] Correct thresholds downsampled in BinaryClassificationMetrics
AmplabJenkins removed a comment on issue #24470: [SPARK-27577][MLlib] Correct thresholds downsampled in BinaryClassificationMetrics URL: https://github.com/apache/spark/pull/24470#issuecomment-486991359 Can one of the admins verify this patch?
[GitHub] [spark] srowen commented on issue #24470: [SPARK-27577][MLlib] Correct thresholds downsampled in BinaryClassificationMetrics
srowen commented on issue #24470: [SPARK-27577][MLlib] Correct thresholds downsampled in BinaryClassificationMetrics URL: https://github.com/apache/spark/pull/24470#issuecomment-489433166 Ah OK I agree with you, I see the argument now. It comes from the fact that the scores are sorted _descending_. The score of each bin is currently its maximum, not minimum. The precision / recall for each bin is calculated as if all of the instances in the bin were classified as positive. This only makes sense if the score is the minimum. You might mention something to this effect in the comment in the code. Also I think this may change some test results; let's see.
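The argument about minimum versus maximum bin scores can be checked on a small example. The following is a minimal Python sketch, not the MLlib implementation: `metrics_at` and `binned_curve` are hypothetical helpers, the six scored points are made-up data, and scores are assumed to be distinct.

```python
def metrics_at(scored, threshold):
    """Precision and recall when predicting positive iff score >= threshold."""
    tp = sum(1 for s, y in scored if s >= threshold and y == 1)
    fp = sum(1 for s, y in scored if s >= threshold and y == 0)
    positives = sum(y for _, y in scored)
    return tp / (tp + fp), tp / positives

def binned_curve(scored, chunk_size, pick):
    """Cumulative precision/recall per chunk of descending scores.

    `pick` chooses which score of the chunk is reported as its threshold.
    """
    ordered = sorted(scored, key=lambda pair: pair[0], reverse=True)
    positives = sum(y for _, y in ordered)
    tp = fp = 0
    curve = []
    for start in range(0, len(ordered), chunk_size):
        chunk = ordered[start:start + chunk_size]
        # Every instance in the chunk is counted as predicted positive.
        tp += sum(y for _, y in chunk)
        fp += sum(1 - y for _, y in chunk)
        threshold = pick(s for s, _ in chunk)
        curve.append((threshold, tp / (tp + fp), tp / positives))
    return curve

# (score, label) pairs; purely illustrative data
data = [(0.9, 1), (0.8, 0), (0.7, 1), (0.6, 1), (0.5, 0), (0.4, 0)]

for threshold, precision, recall in binned_curve(data, 2, min):
    print(threshold, (precision, recall) == metrics_at(data, threshold))
```

With `pick=min` every reported point agrees with the exact precision and recall at that threshold on the full data. With `pick=max` the first chunk would be labelled 0.9 but carry precision 0.5, whereas the exact precision at 0.9 is 1.0, which is the mismatch described in the comment above.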
[GitHub] [spark] IgorBerman commented on issue #20640: [SPARK-19755][Mesos] Blacklist is always active for MesosCoarseGrainedSchedulerBackend
IgorBerman commented on issue #20640: [SPARK-19755][Mesos] Blacklist is always active for MesosCoarseGrainedSchedulerBackend URL: https://github.com/apache/spark/pull/20640#issuecomment-489430972 @swevrywhere this PR is not merged, so the problem persists (you can always create a custom distro).
[GitHub] [spark] HeartSaVioR commented on issue #24529: [SPARK-27634][Structured Streaming] deleteCheckpointOnStop should be configurable
HeartSaVioR commented on issue #24529: [SPARK-27634][Structured Streaming] deleteCheckpointOnStop should be configurable URL: https://github.com/apache/spark/pull/24529#issuecomment-489429213 Migrating my own comment here again: `deleteCheckpointOnStop` should be true only when the query uses a temporary checkpoint. In other cases, you can remove the checkpoint directory manually if needed, as it is normal for a streaming query not to delete its checkpoint so that the next run continues from the previous one. Do you have an actual use case for this?
[GitHub] [spark] IgorBerman commented on issue #10921: [SPARK-12265][Mesos] Spark calls System.exit inside driver instead of throwing exception
IgorBerman commented on issue #10921: [SPARK-12265][Mesos] Spark calls System.exit inside driver instead of throwing exception URL: https://github.com/apache/spark/pull/10921#issuecomment-489428542 @dragos @andrewor14 @srowen I believe I have a corner case connected to this fix. Suppose I have a Spark driver embedded in a Java service and mesosDriver.run() exits for any reason (in my case, due to temporary unavailability of the external shuffle service on one of the worker nodes). Currently there is no way to get this information besides checking if "-mesos-driver" exists somehow. I.e., the initialisation sequence went smoothly, but due to dynamic allocation quirks the mesos-driver exits (so the countDown latch will already have been released at this point). I'm wondering if I can get this signal with some Spark listener. Should I create an additional JIRA for this? WDYT?
[GitHub] [spark] AmplabJenkins commented on issue #24529: [SPARK-27634][Structured Streaming] deleteCheckpointOnStop should be configurable
AmplabJenkins commented on issue #24529: [SPARK-27634][Structured Streaming] deleteCheckpointOnStop should be configurable URL: https://github.com/apache/spark/pull/24529#issuecomment-489420366 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on issue #24529: [SPARK-27634][Structured Streaming] deleteCheckpointOnStop should be configurable
AmplabJenkins removed a comment on issue #24529: [SPARK-27634][Structured Streaming] deleteCheckpointOnStop should be configurable URL: https://github.com/apache/spark/pull/24529#issuecomment-489420192 Can one of the admins verify this patch?
[GitHub] [spark] gentlewangyu commented on issue #24529: [SPARK-27634][Structured Streaming] deleteCheckpointOnStop should be configurable
gentlewangyu commented on issue #24529: [SPARK-27634][Structured Streaming] deleteCheckpointOnStop should be configurable URL: https://github.com/apache/spark/pull/24529#issuecomment-489420266 Currently, once the streaming app ends, we can only delete the checkpoint files manually. `deleteCheckpointOnStop` should be configurable so that we can choose either behavior.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24529: [SPARK-27634][Structured Streaming] deleteCheckpointOnStop should be configurable
AmplabJenkins removed a comment on issue #24529: [SPARK-27634][Structured Streaming] deleteCheckpointOnStop should be configurable URL: https://github.com/apache/spark/pull/24529#issuecomment-489420152 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #24529: [SPARK-27634][Structured Streaming] deleteCheckpointOnStop should be configurable
AmplabJenkins commented on issue #24529: [SPARK-27634][Structured Streaming] deleteCheckpointOnStop should be configurable URL: https://github.com/apache/spark/pull/24529#issuecomment-489420192 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #24529: [SPARK-27634][Structured Streaming] deleteCheckpointOnStop should be configurable
AmplabJenkins commented on issue #24529: [SPARK-27634][Structured Streaming] deleteCheckpointOnStop should be configurable URL: https://github.com/apache/spark/pull/24529#issuecomment-489420152 Can one of the admins verify this patch?
[GitHub] [spark] ConeyLiu commented on issue #24278: [SPARK-27350][SQL] Support create table on data source V2
ConeyLiu commented on issue #24278: [SPARK-27350][SQL] Support create table on data source V2 URL: https://github.com/apache/spark/pull/24278#issuecomment-489419879 Hi @uncleGen, I'm sorry for the late reply. I will address the comments soon.
[GitHub] [spark] gentlewangyu opened a new pull request #24529: [SPARK-27634][Structured Streaming] deleteCheckpointOnStop should be configurable
gentlewangyu opened a new pull request #24529: [SPARK-27634][Structured Streaming] deleteCheckpointOnStop should be configurable URL: https://github.com/apache/spark/pull/24529 ## What changes were proposed in this pull request? We need to delete the checkpoint files after running the streaming application multiple times, so `deleteCheckpointOnStop` should be configurable. ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review http://spark.apache.org/contributing.html before opening a pull request.
[GitHub] [spark] shishaochen commented on issue #24470: [SPARK-27577][MLlib] Correct thresholds downsampled in BinaryClassificationMetrics
shishaochen commented on issue #24470: [SPARK-27577][MLlib] Correct thresholds downsampled in BinaryClassificationMetrics URL: https://github.com/apache/spark/pull/24470#issuecomment-489418353 @srowen I get your point! Actually, if we choose the score of the last element in each chunk as the threshold, the Recall, Precision, and FMeasure calculated at each threshold are exactly the same as those obtained without sampling (`numBins=0`). In other words, **they are accurate metrics**. The only difference from the non-downsampled case is the number of thresholds when printing the precision/recall/f1 curve. Thus, why not return the correct metrics of the full data set rather than approximate values?
[GitHub] [spark] HyukjinKwon commented on issue #24528: deleteCheckpointOnStop should be configurable
HyukjinKwon commented on issue #24528: deleteCheckpointOnStop should be configurable URL: https://github.com/apache/spark/pull/24528#issuecomment-489417143 Please file a JIRA and target it to the master. Please review http://spark.apache.org/contributing.html before opening a pull request.
[GitHub] [spark] HyukjinKwon closed pull request #24528: deleteCheckpointOnStop should be configurable
HyukjinKwon closed pull request #24528: deleteCheckpointOnStop should be configurable URL: https://github.com/apache/spark/pull/24528
[GitHub] [spark] HyukjinKwon closed pull request #24524: Improve inner merge logic
HyukjinKwon closed pull request #24524: Improve inner merge logic URL: https://github.com/apache/spark/pull/24524
[GitHub] [spark] HeartSaVioR commented on issue #24515: [SPARK-14083][WIP] Basic bytecode analyzer to speed up Datasets
HeartSaVioR commented on issue #24515: [SPARK-14083][WIP] Basic bytecode analyzer to speed up Datasets URL: https://github.com/apache/spark/pull/24515#issuecomment-489416954 For some APIs there's no choice but to leverage the typed API. In Structured Streaming, (flat)MapGroupsWithState is one of them, and it is recommended for all stateful use cases except basic ones. Would it be better if we also presented an untyped API for the APIs which are only available as typed?
[GitHub] [spark] HeartSaVioR commented on issue #24528: deleteCheckpointOnStop should be configurable
HeartSaVioR commented on issue #24528: deleteCheckpointOnStop should be configurable URL: https://github.com/apache/spark/pull/24528#issuecomment-489415916 `deleteCheckpointOnStop` should be true only when the query uses a temporary checkpoint. In other cases, you can remove the checkpoint directory manually if needed. What's the actual use case for this?
[GitHub] [spark] beliefer commented on issue #24372: [SPARK-27462][SQL] Enhance insert into hive table that could choose some columns in target table flexibly.
beliefer commented on issue #24372: [SPARK-27462][SQL] Enhance insert into hive table that could choose some columns in target table flexibly. URL: https://github.com/apache/spark/pull/24372#issuecomment-489413223 cc @dongjoon-hyun
[GitHub] [spark] AmplabJenkins removed a comment on issue #24528: deleteCheckpointOnStop should be configurable
AmplabJenkins removed a comment on issue #24528: deleteCheckpointOnStop should be configurable URL: https://github.com/apache/spark/pull/24528#issuecomment-489412343 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #24528: deleteCheckpointOnStop should be configurable
AmplabJenkins commented on issue #24528: deleteCheckpointOnStop should be configurable URL: https://github.com/apache/spark/pull/24528#issuecomment-489412484 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #24528: deleteCheckpointOnStop should be configurable
AmplabJenkins commented on issue #24528: deleteCheckpointOnStop should be configurable URL: https://github.com/apache/spark/pull/24528#issuecomment-489412343 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on issue #24528: deleteCheckpointOnStop should be configurable
AmplabJenkins removed a comment on issue #24528: deleteCheckpointOnStop should be configurable URL: https://github.com/apache/spark/pull/24528#issuecomment-489412313 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #24528: deleteCheckpointOnStop should be configurable
AmplabJenkins commented on issue #24528: deleteCheckpointOnStop should be configurable URL: https://github.com/apache/spark/pull/24528#issuecomment-489412313 Can one of the admins verify this patch?