[GitHub] [spark] AmplabJenkins removed a comment on pull request #28960: [SPARK-32140][ML][PySpark] Add training summary to FMClassificationModel

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28960:
URL: https://github.com/apache/spark/pull/28960#issuecomment-658006936







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] huaxingao commented on a change in pull request #28960: [SPARK-32140][ML][PySpark] Add training summary to FMClassificationModel

2020-07-14 Thread GitBox


huaxingao commented on a change in pull request #28960:
URL: https://github.com/apache/spark/pull/28960#discussion_r454144678



##
File path: mllib/src/main/scala/org/apache/spark/mllib/optimization/GradientDescent.scala
##
@@ -226,45 +239,48 @@ object GradientDescent extends Logging {
 
 var converged = false // indicates whether converged based on convergenceTol
 var i = 1
-while (!converged && i <= numIterations) {
-  val bcWeights = data.context.broadcast(weights)
-  // Sample a subset (fraction miniBatchFraction) of the total data
-  // compute and sum up the subgradients on this subset (this is one map-reduce)
-  val (gradientSum, lossSum, miniBatchSize) = data.sample(false, miniBatchFraction, 42 + i)
-.treeAggregate((BDV.zeros[Double](n), 0.0, 0L))(
-  seqOp = (c, v) => {
-// c: (grad, loss, count), v: (label, features)
-val l = gradient.compute(v._2, v._1, bcWeights.value, Vectors.fromBreeze(c._1))
-(c._1, c._2 + l, c._3 + 1)
-  },
-  combOp = (c1, c2) => {
-// c: (grad, loss, count)
-(c1._1 += c2._1, c1._2 + c2._2, c1._3 + c2._3)
-  })
-  bcWeights.destroy()
-
-  if (miniBatchSize > 0) {
-/**
- * lossSum is computed using the weights from the previous iteration
- * and regVal is the regularization value computed in the previous iteration as well.
- */
-stochasticLossHistory += lossSum / miniBatchSize + regVal
-val update = updater.compute(
-  weights, Vectors.fromBreeze(gradientSum / miniBatchSize.toDouble),
-  stepSize, i, regParam)
-weights = update._1
-regVal = update._2
-
-previousWeights = currentWeights
-currentWeights = Some(weights)
-if (previousWeights != None && currentWeights != None) {
-  converged = isConverged(previousWeights.get,
-currentWeights.get, convergenceTol)
+breakable {
+  while (i <= numIterations + 1) {
+val bcWeights = data.context.broadcast(weights)
+// Sample a subset (fraction miniBatchFraction) of the total data
+// compute and sum up the subgradients on this subset (this is one map-reduce)
+val (gradientSum, lossSum, miniBatchSize) = data.sample(false, miniBatchFraction, 42 + i)
+  .treeAggregate((BDV.zeros[Double](n), 0.0, 0L))(
+seqOp = (c, v) => {

Review comment:
   Fixed. Thanks!
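
   For readers following the hunk, the removed (pre-change) loop can be sketched in plain Python as below. This is only an illustration under assumptions, not the Spark implementation: `minibatch_sgd`, `is_converged`, and the `gradient` callback are hypothetical names, regularization is left out, the `step_size / sqrt(i)` decay and the relative convergence test are assumed rather than taken from the diff, and the new `breakable` version is not modeled because the hunk is truncated.

```python
import math
import random


def is_converged(previous, current, tol):
    # Assumed test: the weight change is small relative to the weight norm.
    diff = math.sqrt(sum((p - c) ** 2 for p, c in zip(previous, current)))
    norm = math.sqrt(sum(c * c for c in current))
    return diff < tol * max(norm, 1.0)


def minibatch_sgd(data, weights, gradient, step_size, num_iterations,
                  mini_batch_fraction, convergence_tol):
    """data: list of (label, features); gradient(features, label, weights)
    returns (gradient_vector, loss) for a single example."""
    loss_history = []
    for i in range(1, num_iterations + 1):
        # Sample a subset of the data, seeded per iteration as in the diff.
        rng = random.Random(42 + i)
        batch = [p for p in data if rng.random() < mini_batch_fraction]
        if not batch:
            continue  # nothing sampled in this iteration
        grad_sum = [0.0] * len(weights)
        loss_sum = 0.0
        for label, features in batch:
            g, loss = gradient(features, label, weights)
            grad_sum = [a + b for a, b in zip(grad_sum, g)]
            loss_sum += loss
        # The loss is computed with the weights from the previous iteration.
        loss_history.append(loss_sum / len(batch))
        previous = weights
        step = step_size / math.sqrt(i)  # assumed step decay
        weights = [w - step * g / len(batch) for w, g in zip(weights, grad_sum)]
        if is_converged(previous, weights, convergence_tol):
            break  # stop early once the weights barely move
    return weights, loss_history


def squared_loss_gradient(features, label, weights):
    # Gradient and loss of 0.5 * (w . x - y)^2 for one example.
    err = sum(w * x for w, x in zip(weights, features)) - label
    return [err * x for x in features], 0.5 * err * err
```

   Calling `minibatch_sgd(data, [0.0], squared_loss_gradient, 0.5, 200, 1.0, 0.001)` on a small list of `(label, features)` pairs exercises both the update and the early exit.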








[GitHub] [spark] SparkQA commented on pull request #28960: [SPARK-32140][ML][PySpark] Add training summary to FMClassificationModel

2020-07-14 Thread GitBox


SparkQA commented on pull request #28960:
URL: https://github.com/apache/spark/pull/28960#issuecomment-658009295










[GitHub] [spark] SparkQA commented on pull request #29088: [SPARK-32289][SQL] Some characters are garbled when opening csv files with Excel

2020-07-14 Thread GitBox


SparkQA commented on pull request #29088:
URL: https://github.com/apache/spark/pull/29088#issuecomment-658009294


   **[Test build #125807 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125807/testReport)** for PR 29088 at commit [`6111a0a`](https://github.com/apache/spark/commit/6111a0a495fc1c0650a472d985ea221f8008f81f).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #28917: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-07-14 Thread GitBox


SparkQA commented on pull request #28917:
URL: https://github.com/apache/spark/pull/28917#issuecomment-658009292


   **[Test build #125808 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125808/testReport)** for PR 28917 at commit [`ec0d8d0`](https://github.com/apache/spark/commit/ec0d8d00b64662343dc6b3945dc5999343b699a7).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #29088: [SPARK-32289][SQL] Some characters are garbled when opening csv files with Excel

2020-07-14 Thread GitBox


AmplabJenkins commented on pull request #29088:
URL: https://github.com/apache/spark/pull/29088#issuecomment-658009403










[GitHub] [spark] SparkQA commented on pull request #28939: [SPARK-32119][CORE] ExecutorPlugin doesn't work with Standalone Cluster

2020-07-14 Thread GitBox


SparkQA commented on pull request #28939:
URL: https://github.com/apache/spark/pull/28939#issuecomment-658009291


   **[Test build #125803 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125803/testReport)** for PR 28939 at commit [`449df2b`](https://github.com/apache/spark/commit/449df2b92e5ad0dac6ea8dd83233450946a39df2).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #28901: [SPARK-32064][SQL] Supporting create temporary table

2020-07-14 Thread GitBox


SparkQA commented on pull request #28901:
URL: https://github.com/apache/spark/pull/28901#issuecomment-658009289


   **[Test build #125805 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125805/testReport)** for PR 28901 at commit [`9b11aac`](https://github.com/apache/spark/commit/9b11aace28be8169e8eff1ce61810bc8250fc37d).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #28960: [SPARK-32140][ML][PySpark] Add training summary to FMClassificationModel

2020-07-14 Thread GitBox


AmplabJenkins commented on pull request #28960:
URL: https://github.com/apache/spark/pull/28960#issuecomment-658009391










[GitHub] [spark] SparkQA commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-14 Thread GitBox


SparkQA commented on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-658009287


   **[Test build #125806 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125806/testReport)** for PR 28708 at commit [`5a0cd2a`](https://github.com/apache/spark/commit/5a0cd2abd316aacc601b9e8fa6e1406b67c55fb7).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `public final class MapOutputCommitMessage `
 * `  case class IsExecutorAlive(executorId: String) extends CoarseGrainedClusterMessage`
 * `sealed trait LogisticRegressionSummary extends ClassificationSummary `
 * `sealed trait RandomForestClassificationSummary extends ClassificationSummary `
 * `class _ClassificationSummary(JavaWrapper):`
 * `class _TrainingSummary(JavaWrapper):`
 * `class _BinaryClassificationSummary(_ClassificationSummary):`
 * `class LinearSVCModel(_JavaClassificationModel, _LinearSVCParams, JavaMLWritable, JavaMLReadable,`
 * `class LinearSVCSummary(_BinaryClassificationSummary):`
 * `class LinearSVCTrainingSummary(LinearSVCSummary, _TrainingSummary):`
 * `class LogisticRegressionSummary(_ClassificationSummary):`
 * `class LogisticRegressionTrainingSummary(LogisticRegressionSummary, _TrainingSummary):`
 * `class BinaryLogisticRegressionSummary(_BinaryClassificationSummary,`
 * `class RandomForestClassificationSummary(_ClassificationSummary):`
 * `class RandomForestClassificationTrainingSummary(RandomForestClassificationSummary,`
 * `class BinaryRandomForestClassificationSummary(_BinaryClassificationSummary):`
 * `class BinaryRandomForestClassificationTrainingSummary(BinaryRandomForestClassificationSummary,`
 * `  class DisableHints(conf: SQLConf) extends RemoveAllHints(conf: SQLConf) `
 * `case class WithFields(`
 * `case class Hour(child: Expression, timeZoneId: Option[String] = None) extends GetTimeField `
 * `case class Minute(child: Expression, timeZoneId: Option[String] = None) extends GetTimeField `
 * `case class Second(child: Expression, timeZoneId: Option[String] = None) extends GetTimeField `
 * `trait GetDateField extends UnaryExpression with ImplicitCastInputTypes with NullIntolerant `
 * `case class DayOfYear(child: Expression) extends GetDateField `
 * `case class SecondsToTimestamp(child: Expression) extends UnaryExpression`
 * `case class Year(child: Expression) extends GetDateField `
 * `case class YearOfWeek(child: Expression) extends GetDateField `
 * `case class Quarter(child: Expression) extends GetDateField `
 * `case class Month(child: Expression) extends GetDateField `
 * `case class DayOfMonth(child: Expression) extends GetDateField `
 * `case class DayOfWeek(child: Expression) extends GetDateField `
 * `case class WeekDay(child: Expression) extends GetDateField `
 * `case class WeekOfYear(child: Expression) extends GetDateField `
 * `sealed trait UTCTimestamp extends BinaryExpression with ImplicitCastInputTypes with NullIntolerant `
 * `case class FromUTCTimestamp(left: Expression, right: Expression) extends UTCTimestamp `
 * `case class ToUTCTimestamp(left: Expression, right: Expression) extends UTCTimestamp `
 * `sealed abstract class MergeAction extends Expression with Unevaluable `
 * `case class DeleteAction(condition: Option[Expression]) extends MergeAction`
 * `trait BaseScriptTransformationExec extends UnaryExecNode `
 * `abstract class BaseScriptTransformationWriterThread(`
 * `abstract class BaseScriptTransformIOSchema extends Serializable `
 * `case class CoalesceBucketsInSortMergeJoin(conf: SQLConf) extends Rule[SparkPlan] `
 * `class StateStoreConf(`
 * `case class HiveScriptTransformationExec(`






[GitHub] [spark] SparkQA commented on pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation

2020-07-14 Thread GitBox


SparkQA commented on pull request #29087:
URL: https://github.com/apache/spark/pull/29087#issuecomment-658009286


   **[Test build #125797 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125797/testReport)** for PR 29087 at commit [`5d85160`](https://github.com/apache/spark/commit/5d85160abca388a53054551ad7ce9e48e363dcd5).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #28960: [SPARK-32140][ML][PySpark] Add training summary to FMClassificationModel

2020-07-14 Thread GitBox


SparkQA removed a comment on pull request #28960:
URL: https://github.com/apache/spark/pull/28960#issuecomment-658003670


   **[Test build #125809 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125809/testReport)** for PR 28960 at commit [`0767117`](https://github.com/apache/spark/commit/07671170b7dac6227e4c1a98f58bf24f9be9ad25).






[GitHub] [spark] AmplabJenkins commented on pull request #28917: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-07-14 Thread GitBox


AmplabJenkins commented on pull request #28917:
URL: https://github.com/apache/spark/pull/28917#issuecomment-658009351










[GitHub] [spark] HyukjinKwon commented on a change in pull request #29086: [SPARK-32292][SPARK-32252][INFRA] Run the relevant tests only in GitHub Actions

2020-07-14 Thread GitBox


HyukjinKwon commented on a change in pull request #29086:
URL: https://github.com/apache/spark/pull/29086#discussion_r454146710



##
File path: dev/run-tests.py
##
@@ -589,43 +627,74 @@ def main():
 # /home/jenkins/anaconda2/envs/py36/bin
 os.environ["PATH"] = "/home/anaconda/envs/py36/bin:" + os.environ.get("PATH")
 else:
-# else we're running locally and can use local settings
+# else we're running locally or Github Actions.
 build_tool = "sbt"
 hadoop_version = os.environ.get("HADOOP_PROFILE", "hadoop2.7")
 hive_version = os.environ.get("HIVE_PROFILE", "hive2.3")
-test_env = "local"
+if "GITHUB_ACTIONS" in os.environ:
+test_env = "github_actions"
+else:
+test_env = "local"
 
 print("[info] Using build tool", build_tool, "with Hadoop profile", hadoop_version,
   "and Hive profile", hive_version, "under environment", test_env)
 extra_profiles = get_hadoop_profiles(hadoop_version) + get_hive_profiles(hive_version)
 
 changed_modules = None
+test_modules = None
 changed_files = None
-should_only_test_modules = "TEST_ONLY_MODULES" in os.environ
+should_only_test_modules = opts.modules is not None
 included_tags = []
+excluded_tags = []
 if should_only_test_modules:
-str_test_modules = [m.strip() for m in os.environ.get("TEST_ONLY_MODULES").split(",")]
+str_test_modules = [m.strip() for m in opts.modules.split(",")]
 test_modules = [m for m in modules.all_modules if m.name in str_test_modules]
-# Directly uses test_modules as changed modules to apply tags and environments
-# as if all specified test modules are changed.
+
+# If we're running the tests in Github Actions, attempt to detect and test
+# only the affected modules.
+if test_env == "github_actions":
+if os.environ["GITHUB_BASE_REF"] != "":
+# Pull requests
+changed_files = identify_changed_files_from_git_commits(
+os.environ["GITHUB_SHA"], target_branch=os.environ["GITHUB_BASE_REF"])

Review comment:
   This is an example of the merge commit: https://github.com/HyukjinKwon/spark/commit/8f36ec455e19dbfb10195d872a9ccaeb2de8ceca at https://github.com/HyukjinKwon/spark/pull/7
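
   As a rough illustration of the environment checks in the hunk above, the detection logic could be isolated into small helpers like the ones below. This is a sketch under assumptions, not the code in `dev/run-tests.py`: the helper names `detect_test_env`, `parse_module_names`, and `is_pull_request_build` are hypothetical, the Jenkins branch is left out, and `environ.get` is used in place of the direct `os.environ[...]` lookup shown in the diff.

```python
import os


def detect_test_env(environ=None):
    """Return "github_actions" when GITHUB_ACTIONS is set, else "local"."""
    env = os.environ if environ is None else environ
    return "github_actions" if "GITHUB_ACTIONS" in env else "local"


def parse_module_names(modules_opt):
    """Split a comma-separated module option into stripped names.

    Returns None when the option was not given, mirroring the
    `opts.modules is not None` check in the diff.
    """
    if modules_opt is None:
        return None
    return [m.strip() for m in modules_opt.split(",")]


def is_pull_request_build(environ):
    """Treat a non-empty GITHUB_BASE_REF as a pull request build."""
    return environ.get("GITHUB_BASE_REF", "") != ""
```

   Passing a plain dict instead of `os.environ` keeps the helpers easy to exercise outside CI, for example `detect_test_env({"GITHUB_ACTIONS": "true"})`.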








[GitHub] [spark] SparkQA removed a comment on pull request #28917: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-07-14 Thread GitBox


SparkQA removed a comment on pull request #28917:
URL: https://github.com/apache/spark/pull/28917#issuecomment-658000826


   **[Test build #125808 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125808/testReport)** for PR 28917 at commit [`ec0d8d0`](https://github.com/apache/spark/commit/ec0d8d00b64662343dc6b3945dc5999343b699a7).






[GitHub] [spark] SparkQA removed a comment on pull request #29088: [SPARK-32289][SQL] Some characters are garbled when opening csv files with Excel

2020-07-14 Thread GitBox


SparkQA removed a comment on pull request #29088:
URL: https://github.com/apache/spark/pull/29088#issuecomment-657995711


   **[Test build #125807 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125807/testReport)** for PR 29088 at commit [`6111a0a`](https://github.com/apache/spark/commit/6111a0a495fc1c0650a472d985ea221f8008f81f).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28960: [SPARK-32140][ML][PySpark] Add training summary to FMClassificationModel

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28960:
URL: https://github.com/apache/spark/pull/28960#issuecomment-658009391


   Build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-14 Thread GitBox


SparkQA removed a comment on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-657987750


   **[Test build #125806 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125806/testReport)** for PR 28708 at commit [`5a0cd2a`](https://github.com/apache/spark/commit/5a0cd2abd316aacc601b9e8fa6e1406b67c55fb7).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28939: [SPARK-32119][CORE] ExecutorPlugin doesn't work with Standalone Cluster

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28939:
URL: https://github.com/apache/spark/pull/28939#issuecomment-658009729


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28917: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28917:
URL: https://github.com/apache/spark/pull/28917#issuecomment-658009351


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28939: [SPARK-32119][CORE] ExecutorPlugin doesn't work with Standalone Cluster

2020-07-14 Thread GitBox


SparkQA removed a comment on pull request #28939:
URL: https://github.com/apache/spark/pull/28939#issuecomment-657963838


   **[Test build #125803 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125803/testReport)** for PR 28939 at commit [`449df2b`](https://github.com/apache/spark/commit/449df2b92e5ad0dac6ea8dd83233450946a39df2).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28901: [SPARK-32064][SQL] Supporting create temporary table

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28901:
URL: https://github.com/apache/spark/pull/28901#issuecomment-658009628


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #28939: [SPARK-32119][CORE] ExecutorPlugin doesn't work with Standalone Cluster

2020-07-14 Thread GitBox


AmplabJenkins commented on pull request #28939:
URL: https://github.com/apache/spark/pull/28939#issuecomment-658009729










[GitHub] [spark] AmplabJenkins commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-14 Thread GitBox


AmplabJenkins commented on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-658009462










[GitHub] [spark] AmplabJenkins commented on pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation

2020-07-14 Thread GitBox


AmplabJenkins commented on pull request #29087:
URL: https://github.com/apache/spark/pull/29087#issuecomment-658009765










[GitHub] [spark] SparkQA removed a comment on pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation

2020-07-14 Thread GitBox


SparkQA removed a comment on pull request #29087:
URL: https://github.com/apache/spark/pull/29087#issuecomment-657914322


   **[Test build #125797 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125797/testReport)** for PR 29087 at commit [`5d85160`](https://github.com/apache/spark/commit/5d85160abca388a53054551ad7ce9e48e363dcd5).






[GitHub] [spark] AmplabJenkins commented on pull request #28901: [SPARK-32064][SQL] Supporting create temporary table

2020-07-14 Thread GitBox


AmplabJenkins commented on pull request #28901:
URL: https://github.com/apache/spark/pull/28901#issuecomment-658009628










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-658000525










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28901: [SPARK-32064][SQL] Supporting create temporary table

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28901:
URL: https://github.com/apache/spark/pull/28901#issuecomment-658009630


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125805/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28939: [SPARK-32119][CORE] ExecutorPlugin doesn't work with Standalone Cluster

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28939:
URL: https://github.com/apache/spark/pull/28939#issuecomment-658009737


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125803/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29087:
URL: https://github.com/apache/spark/pull/29087#issuecomment-658009765


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29088: [SPARK-32289][SQL] Some characters are garbled when opening csv files with Excel

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29088:
URL: https://github.com/apache/spark/pull/29088#issuecomment-658009412


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125807/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-658009471


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125806/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28917: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28917:
URL: https://github.com/apache/spark/pull/28917#issuecomment-658009362


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125808/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29088: [SPARK-32289][SQL] Some characters are garbled when opening csv files with Excel

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29088:
URL: https://github.com/apache/spark/pull/29088#issuecomment-658009403


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28901: [SPARK-32064][SQL] Supporting create temporary table

2020-07-14 Thread GitBox


SparkQA removed a comment on pull request #28901:
URL: https://github.com/apache/spark/pull/28901#issuecomment-657973941


   **[Test build #125805 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125805/testReport)**
 for PR 28901 at commit 
[`9b11aac`](https://github.com/apache/spark/commit/9b11aace28be8169e8eff1ce61810bc8250fc37d).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29087: [SPARK-28227][SQL] Support TRANSFORM with aggregation

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29087:
URL: https://github.com/apache/spark/pull/29087#issuecomment-658009772


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125797/
   Test FAILed.






[GitHub] [spark] viirya commented on a change in pull request #29086: [SPARK-32292][SPARK-32252][INFRA] Run the relevant tests only in GitHub Actions

2020-07-14 Thread GitBox


viirya commented on a change in pull request #29086:
URL: https://github.com/apache/spark/pull/29086#discussion_r454151281



##
File path: dev/run-tests.py
##
@@ -589,43 +627,74 @@ def main():
 # /home/jenkins/anaconda2/envs/py36/bin
 os.environ["PATH"] = "/home/anaconda/envs/py36/bin:" + 
os.environ.get("PATH")
 else:
-# else we're running locally and can use local settings
+# else we're running locally or Github Actions.
 build_tool = "sbt"
 hadoop_version = os.environ.get("HADOOP_PROFILE", "hadoop2.7")
 hive_version = os.environ.get("HIVE_PROFILE", "hive2.3")
-test_env = "local"
+if "GITHUB_ACTIONS" in os.environ:
+test_env = "github_actions"
+else:
+test_env = "local"
 
 print("[info] Using build tool", build_tool, "with Hadoop profile", 
hadoop_version,
   "and Hive profile", hive_version, "under environment", test_env)
 extra_profiles = get_hadoop_profiles(hadoop_version) + 
get_hive_profiles(hive_version)
 
 changed_modules = None
+test_modules = None
 changed_files = None
-should_only_test_modules = "TEST_ONLY_MODULES" in os.environ
+should_only_test_modules = opts.modules is not None
 included_tags = []
+excluded_tags = []
 if should_only_test_modules:
-str_test_modules = [m.strip() for m in 
os.environ.get("TEST_ONLY_MODULES").split(",")]
+str_test_modules = [m.strip() for m in opts.modules.split(",")]
 test_modules = [m for m in modules.all_modules if m.name in 
str_test_modules]
-# Directly uses test_modules as changed modules to apply tags and 
environments
-# as if all specified test modules are changed.
+
+# If we're running the tests in Github Actions, attempt to detect and 
test
+# only the affected modules.
+if test_env == "github_actions":
+if os.environ["GITHUB_BASE_REF"] != "":
+# Pull requests
+changed_files = identify_changed_files_from_git_commits(
+os.environ["GITHUB_SHA"], 
target_branch=os.environ["GITHUB_BASE_REF"])

Review comment:
   Okay. Thanks for clarifying. Looks good.
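
For reference, the environment selection in the `else` branch of the diff above can be sketched on its own. This is a minimal sketch, not the code in `dev/run-tests.py`: the helper names `detect_test_env` and `is_pull_request_build` are hypothetical, and the `environ` parameter is an assumption added only so the logic can be exercised without touching the real process environment.

```python
import os


def detect_test_env(environ=None):
    """Return "github_actions" if GITHUB_ACTIONS is set, otherwise "local".

    Hypothetical helper: mirrors only the else branch shown in the diff.
    """
    env = os.environ if environ is None else environ
    return "github_actions" if "GITHUB_ACTIONS" in env else "local"


def is_pull_request_build(environ=None):
    """Return True when GITHUB_BASE_REF is present and non-empty.

    The diff compares os.environ["GITHUB_BASE_REF"] against ""; using .get()
    here is an assumption so that a missing variable does not raise KeyError.
    """
    env = os.environ if environ is None else environ
    return env.get("GITHUB_BASE_REF", "") != ""


if __name__ == "__main__":
    # A fake environment resembling a pull request build on GitHub Actions.
    fake_env = {"GITHUB_ACTIONS": "true", "GITHUB_BASE_REF": "master"}
    print(detect_test_env(fake_env), is_pull_request_build(fake_env))
```

Passing the environment as a mapping keeps the two checks testable in isolation; the real script reads `os.environ` directly.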








[GitHub] [spark] HyukjinKwon opened a new pull request #29096: [WIP][TESTS] Enable test-dependencies.sh and Unidoc test in Jenkins jobs

2020-07-14 Thread GitBox


HyukjinKwon opened a new pull request #29096:
URL: https://github.com/apache/spark/pull/29096


   ### What changes were proposed in this pull request?
   
   The Jenkins machines seem to be back to normal, so we could simply 
re-enable the dependency test and the Javadoc/Scaladoc build in Jenkins. 
   
   With this change there are no corner-case exceptions: a PR can be merged 
if either the Jenkins build or the GitHub Actions build passes, and neither 
depends on the other for dependency testing or Unidoc.
   
   ### Why are the changes needed?
   
   For simplicity.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, dev-only.
   
   ### How was this patch tested?
   
   Jenkins will test it here.






[GitHub] [spark] HyukjinKwon commented on pull request #29096: [WIP][TESTS] Enable test-dependencies.sh and Unidoc test in Jenkins jobs

2020-07-14 Thread GitBox


HyukjinKwon commented on pull request #29096:
URL: https://github.com/apache/spark/pull/29096#issuecomment-658016315


   retest this please






[GitHub] [spark] SparkQA commented on pull request #29096: [WIP][TESTS] Enable test-dependencies.sh and Unidoc test in Jenkins jobs

2020-07-14 Thread GitBox


SparkQA commented on pull request #29096:
URL: https://github.com/apache/spark/pull/29096#issuecomment-658018293


   **[Test build #125811 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125811/testReport)**
 for PR 29096 at commit 
[`cc298d6`](https://github.com/apache/spark/commit/cc298d61f45dec1712e350adba4c078ef15841e1).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29096: [WIP][TESTS] Enable test-dependencies.sh and Unidoc test in Jenkins jobs

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29096:
URL: https://github.com/apache/spark/pull/29096#issuecomment-658018964










[GitHub] [spark] SparkQA removed a comment on pull request #29096: [WIP][TESTS] Enable test-dependencies.sh and Unidoc test in Jenkins jobs

2020-07-14 Thread GitBox


SparkQA removed a comment on pull request #29096:
URL: https://github.com/apache/spark/pull/29096#issuecomment-658018293


   **[Test build #125811 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125811/testReport)**
 for PR 29096 at commit 
[`cc298d6`](https://github.com/apache/spark/commit/cc298d61f45dec1712e350adba4c078ef15841e1).






[GitHub] [spark] SparkQA commented on pull request #29096: [WIP][TESTS] Enable test-dependencies.sh and Unidoc test in Jenkins jobs

2020-07-14 Thread GitBox


SparkQA commented on pull request #29096:
URL: https://github.com/apache/spark/pull/29096#issuecomment-658019061


   **[Test build #125811 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125811/testReport)**
 for PR 29096 at commit 
[`cc298d6`](https://github.com/apache/spark/commit/cc298d61f45dec1712e350adba4c078ef15841e1).
* This patch **fails build dependency tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #29096: [WIP][TESTS] Enable test-dependencies.sh and Unidoc test in Jenkins jobs

2020-07-14 Thread GitBox


AmplabJenkins commented on pull request #29096:
URL: https://github.com/apache/spark/pull/29096#issuecomment-658018964










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29096: [WIP][TESTS] Enable test-dependencies.sh and Unidoc test in Jenkins jobs

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29096:
URL: https://github.com/apache/spark/pull/29096#issuecomment-658019090


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125811/
   Test FAILed.






[GitHub] [spark] joanjiao2016 commented on pull request #21038: [SPARK-22968][DStream] Throw an exception on partition revoking issue

2020-07-14 Thread GitBox


joanjiao2016 commented on pull request #21038:
URL: https://github.com/apache/spark/pull/21038#issuecomment-658020895


   @koeninger Hi, we have prepared two Spark Streaming applications with the 
same group id, each running on a different cluster for disaster recovery. 
When the second application is started a few minutes after the first, the 
first application fails with the following exception: 
   java.lang.IllegalStateException: No current assignment for partition xxx







[GitHub] [spark] SparkQA commented on pull request #29096: [WIP][TESTS] Enable test-dependencies.sh and Unidoc test in Jenkins jobs

2020-07-14 Thread GitBox


SparkQA commented on pull request #29096:
URL: https://github.com/apache/spark/pull/29096#issuecomment-658021416


   **[Test build #125812 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125812/testReport)**
 for PR 29096 at commit 
[`f86a96f`](https://github.com/apache/spark/commit/f86a96fb483ffa08c0c84859b1b77c710c776e27).






[GitHub] [spark] AmplabJenkins commented on pull request #29096: [WIP][TESTS] Enable test-dependencies.sh and Unidoc test in Jenkins jobs

2020-07-14 Thread GitBox


AmplabJenkins commented on pull request #29096:
URL: https://github.com/apache/spark/pull/29096#issuecomment-658021945










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29096: [WIP][TESTS] Enable test-dependencies.sh and Unidoc test in Jenkins jobs

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29096:
URL: https://github.com/apache/spark/pull/29096#issuecomment-658021945










[GitHub] [spark] ulysses-you commented on a change in pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command

2020-07-14 Thread GitBox


ulysses-you commented on a change in pull request #28840:
URL: https://github.com/apache/spark/pull/28840#discussion_r454162497



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala
##
@@ -236,6 +236,45 @@ case class ShowFunctionsCommand(
   }
 }
 
+
+/**
+ * A command for users to refresh the persistent function.
+ * The syntax of using this command in SQL is:
+ * {{{
+ *REFRESH FUNCTION functionName
+ * }}}
+ */
+case class RefreshFunctionCommand(
+databaseName: Option[String],
+functionName: String)
+  extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val catalog = sparkSession.sessionState.catalog
+if 
(FunctionRegistry.builtin.functionExists(FunctionIdentifier(functionName))) {
+  throw new AnalysisException(s"Cannot refresh builtin function 
$functionName")
+}
+if (catalog.isTemporaryFunction(FunctionIdentifier(functionName, 
databaseName))) {
+  throw new AnalysisException(s"Cannot refresh temporary function 
$functionName")
+}
+
+val identifier = FunctionIdentifier(
+  functionName, Some(databaseName.getOrElse(catalog.getCurrentDatabase)))
+// we only refresh the permanent function.
+if (catalog.isPersistentFunction(identifier)) {
+  // register overwrite function.
+  val func = catalog.getFunctionMetadata(identifier)
+  catalog.registerFunction(func, true)
+} else {
+  // function is not exists, clear cached function.
+  catalog.unregisterFunction(identifier, true)
+  throw new NoSuchFunctionException(identifier.database.get, functionName)

Review comment:
   `REFRESH TABLE` has no such side effect; it always checks whether the 
table exists first.
   
   I think it's necessary to both invalidate the cache and throw the exception.
   * If we only throw the exception, it's confusing that we can still use or 
`DESC` a function that no longer exists. 
   * If we only clear the cache, it's confusing that we can refresh any 
function name without an exception.
   
   BTW, the current `REFRESH TABLE` has a minor memory leak in this case:
   ```
   -- client a execute
   create table t1(c1 int);
   cache table t1;
   
   -- client b execute
   drop table t1;
   create table t1(c1 int, c2 int);
   uncache table t1;
   
   -- client a's cached t1 now leaks memory
   -- the reason is that Spark treats it as a plan cache, while the user may
   -- think of it as a table cache
   ```
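
The "invalidate the cache and throw" behavior argued for above can be illustrated with a toy model. This is a sketch under stated assumptions, not Spark's catalog API: `ToyCatalog`, `metastore`, `registry`, and `NoSuchFunctionError` are hypothetical names standing in for the persistent function metadata, the session function cache, and `NoSuchFunctionException`.

```python
class NoSuchFunctionError(Exception):
    """Hypothetical stand-in for NoSuchFunctionException."""


class ToyCatalog:
    """Toy model: a persistent store plus a session-local function cache."""

    def __init__(self):
        self.metastore = {}  # persistent function definitions
        self.registry = {}   # cached (registered) functions for this session

    def refresh_function(self, name):
        if name in self.metastore:
            # Persistent function exists: overwrite the cached copy.
            self.registry[name] = self.metastore[name]
        else:
            # Function is gone: drop any stale cached copy, then report it.
            self.registry.pop(name, None)
            raise NoSuchFunctionError(name)


if __name__ == "__main__":
    catalog = ToyCatalog()
    catalog.registry["f"] = "stale definition"  # cached, but dropped elsewhere
    try:
        catalog.refresh_function("f")
    except NoSuchFunctionError:
        # The stale entry was cleared before the error was raised.
        print("f" in catalog.registry)
```

Doing both steps means a refresh of a dropped function reports the error and also leaves nothing stale behind, which addresses the two confusing cases listed in the bullets.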








[GitHub] [spark] wangyum commented on a change in pull request #29088: [SPARK-32289][SQL] Some characters are garbled when opening csv files with Excel

2020-07-14 Thread GitBox


wangyum commented on a change in pull request #29088:
URL: https://github.com/apache/spark/pull/29088#discussion_r454167871



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
##
@@ -2353,6 +2355,53 @@ abstract class CSVSuite extends QueryTest with 
SharedSparkSession with TestCsvDa
   assert(df.schema.last == StructField("col_mixed_types", StringType, 
true))
 }
   }
+
+  test("Some characters are garbled when opening csv files with Excel") {
+// scalastyle:off nonascii
+val chinese = "我爱中文"
+val korean = "나는 한국인을 좋아한다"

Review comment:
   Is it correct? I'm not sure.








[GitHub] [spark] beliefer commented on pull request #28917: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-07-14 Thread GitBox


beliefer commented on pull request #28917:
URL: https://github.com/apache/spark/pull/28917#issuecomment-658027966


   retest this please






[GitHub] [spark] AmplabJenkins commented on pull request #28917: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-07-14 Thread GitBox


AmplabJenkins commented on pull request #28917:
URL: https://github.com/apache/spark/pull/28917#issuecomment-658028313










[GitHub] [spark] peter-toth commented on pull request #29053: [SPARK-32241][SQL] Remove empty children of union

2020-07-14 Thread GitBox


peter-toth commented on pull request #29053:
URL: https://github.com/apache/spark/pull/29053#issuecomment-658028307


   Thanks for the review.






[GitHub] [spark] HyukjinKwon commented on pull request #29096: [WIP][TESTS] Enable test-dependencies.sh and Unidoc test in Jenkins jobs

2020-07-14 Thread GitBox


HyukjinKwon commented on pull request #29096:
URL: https://github.com/apache/spark/pull/29096#issuecomment-658028639


   retest this please






[GitHub] [spark] HyukjinKwon commented on a change in pull request #29088: [SPARK-32289][SQL] Some characters are garbled when opening csv files with Excel

2020-07-14 Thread GitBox


HyukjinKwon commented on a change in pull request #29088:
URL: https://github.com/apache/spark/pull/29088#discussion_r454170678



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
##
@@ -2353,6 +2355,53 @@ abstract class CSVSuite extends QueryTest with 
SharedSparkSession with TestCsvDa
   assert(df.schema.last == StructField("col_mixed_types", StringType, 
true))
 }
   }
+
+  test("Some characters are garbled when opening csv files with Excel") {
+// scalastyle:off nonascii
+val chinese = "我爱中文"
+val korean = "나는 한국인을 좋아한다"

Review comment:
   Yup!








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28917: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28917:
URL: https://github.com/apache/spark/pull/28917#issuecomment-658028313










[GitHub] [spark] dongjoon-hyun commented on pull request #29053: [SPARK-32241][SQL] Remove empty children of union

2020-07-14 Thread GitBox


dongjoon-hyun commented on pull request #29053:
URL: https://github.com/apache/spark/pull/29053#issuecomment-658030131


   Thank you, all!






[GitHub] [spark] SparkQA commented on pull request #29096: [WIP][TESTS] Enable test-dependencies.sh and Unidoc test in Jenkins jobs

2020-07-14 Thread GitBox


SparkQA commented on pull request #29096:
URL: https://github.com/apache/spark/pull/29096#issuecomment-658031266


   **[Test build #125813 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125813/testReport)**
 for PR 29096 at commit 
[`f86a96f`](https://github.com/apache/spark/commit/f86a96fb483ffa08c0c84859b1b77c710c776e27).






[GitHub] [spark] SparkQA commented on pull request #28917: [SPARK-31847][CORE][TESTS] DAGSchedulerSuite: Rewrite the test framework to support apply specified spark configurations.

2020-07-14 Thread GitBox


SparkQA commented on pull request #28917:
URL: https://github.com/apache/spark/pull/28917#issuecomment-658031305


   **[Test build #125814 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125814/testReport)**
 for PR 28917 at commit 
[`ec0d8d0`](https://github.com/apache/spark/commit/ec0d8d00b64662343dc6b3945dc5999343b699a7).






[GitHub] [spark] AmplabJenkins commented on pull request #29096: [WIP][TESTS] Enable test-dependencies.sh and Unidoc test in Jenkins jobs

2020-07-14 Thread GitBox


AmplabJenkins commented on pull request #29096:
URL: https://github.com/apache/spark/pull/29096#issuecomment-658031929










[GitHub] [spark] dongjoon-hyun closed pull request #29093: [SPARK-32220][SQL][3.0][FOLLOW-UP]SHUFFLE_REPLICATE_NL Hint should not change Non-Cartesian Product join result

2020-07-14 Thread GitBox


dongjoon-hyun closed pull request #29093:
URL: https://github.com/apache/spark/pull/29093


   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29096: [WIP][TESTS] Enable test-dependencies.sh and Unidoc test in Jenkins jobs

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29096:
URL: https://github.com/apache/spark/pull/29096#issuecomment-658031929










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28960: [SPARK-32140][ML][PySpark] Add training summary to FMClassificationModel

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28960:
URL: https://github.com/apache/spark/pull/28960#issuecomment-658009398










[GitHub] [spark] SparkQA removed a comment on pull request #28960: [SPARK-32140][ML][PySpark] Add training summary to FMClassificationModel

2020-07-14 Thread GitBox


SparkQA removed a comment on pull request #28960:
URL: https://github.com/apache/spark/pull/28960#issuecomment-658009354


   **[Test build #125810 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125810/testReport)** for PR 28960 at commit [`9a58603`](https://github.com/apache/spark/commit/9a58603ce88b2c3116f6ce77a5144151cffab4ad).






[GitHub] [spark] SparkQA commented on pull request #28960: [SPARK-32140][ML][PySpark] Add training summary to FMClassificationModel

2020-07-14 Thread GitBox


SparkQA commented on pull request #28960:
URL: https://github.com/apache/spark/pull/28960#issuecomment-658035984


   **[Test build #125810 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125810/testReport)** for PR 28960 at commit [`9a58603`](https://github.com/apache/spark/commit/9a58603ce88b2c3116f6ce77a5144151cffab4ad).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #28960: [SPARK-32140][ML][PySpark] Add training summary to FMClassificationModel

2020-07-14 Thread GitBox


AmplabJenkins commented on pull request #28960:
URL: https://github.com/apache/spark/pull/28960#issuecomment-658036270










[GitHub] [spark] mayurdb opened a new pull request #29097: Spark 32299

2020-07-14 Thread GitBox


mayurdb opened a new pull request #29097:
URL: https://github.com/apache/spark/pull/29097


   
   
   ### What changes were proposed in this pull request?
   Change the SortMergeJoin orientation at runtime using adaptive query execution.
   
   
   ### Why are the changes needed?
   For a SortMergeJoin (SMJ) of type EquiJoin, the left and right sides of the join 
are decided by the order in which the user wrote them. In SMJ, the left side of the join 
is streamed and the right side is buffered (matching values). Because of this, 
B SMJ A would perform better than A SMJ B if sizeOf(B) > sizeOf(A).
   
   With adaptive query execution, once both ShuffleQueryStages corresponding to 
the join have completed, and if neither of them is smaller than the 
broadcast threshold (so the join will not be converted to BroadcastHashJoin), the join 
orientation can be changed at runtime.
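
The orientation rule described above can be sketched as follows. This is a minimal illustration in plain Python, not the PR's implementation: `choose_smj_orientation`, its parameters, and the returned labels are hypothetical names. The only assumptions taken from the description are that the left side is streamed, the right side is buffered, and a side smaller than the broadcast threshold leads to a BroadcastHashJoin instead.

```python
def choose_smj_orientation(left_size, right_size, broadcast_threshold):
    """Decide how to run an equi-join once both shuffle stage sizes are known.

    Hypothetical helper for illustration; all sizes are in bytes.
    """
    # If either side is below the threshold, the join would be converted to a
    # broadcast hash join, so the sort-merge orientation does not matter.
    if min(left_size, right_size) < broadcast_threshold:
        return "broadcast"
    # SMJ streams the left side and buffers the right side, so the larger
    # input should end up on the left.
    if right_size > left_size:
        return "swap"
    return "keep"
```

With a 10 MB threshold, a 1 GB left side and a 5 GB right side would yield `"swap"`, while the reverse sizes would yield `"keep"`.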
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   - Added unit tests
   - Ran AdaptiveQueryExecSuite
   
   






[GitHub] [spark] AmplabJenkins commented on pull request #29097: Spark 32299

2020-07-14 Thread GitBox


AmplabJenkins commented on pull request #29097:
URL: https://github.com/apache/spark/pull/29097#issuecomment-658038551


   Can one of the admins verify this patch?






[GitHub] [spark] cloud-fan commented on a change in pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command

2020-07-14 Thread GitBox


cloud-fan commented on a change in pull request #28840:
URL: https://github.com/apache/spark/pull/28840#discussion_r454182771



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala
##
@@ -236,6 +236,45 @@ case class ShowFunctionsCommand(
   }
 }
 
+
+/**
+ * A command for users to refresh the persistent function.
+ * The syntax of using this command in SQL is:
+ * {{{
+ *REFRESH FUNCTION functionName
+ * }}}
+ */
+case class RefreshFunctionCommand(
+databaseName: Option[String],
+functionName: String)
+  extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val catalog = sparkSession.sessionState.catalog
+if 
(FunctionRegistry.builtin.functionExists(FunctionIdentifier(functionName))) {
+  throw new AnalysisException(s"Cannot refresh builtin function 
$functionName")
+}
+if (catalog.isTemporaryFunction(FunctionIdentifier(functionName, 
databaseName))) {
+  throw new AnalysisException(s"Cannot refresh temporary function 
$functionName")
+}
+
+val identifier = FunctionIdentifier(
+  functionName, Some(databaseName.getOrElse(catalog.getCurrentDatabase)))
+// we only refresh the permanent function.
+if (catalog.isPersistentFunction(identifier)) {
+  // register overwrite function.
+  val func = catalog.getFunctionMetadata(identifier)
+  catalog.registerFunction(func, true)
+} else {
+  // function is not exists, clear cached function.
+  catalog.unregisterFunction(identifier, true)
+  throw new NoSuchFunctionException(identifier.database.get, functionName)

Review comment:
   It depends on how you define "function exists". If users can still use 
this function in SQL queries, why do we throw NoSuchFunctionException in 
REFRESH FUNCTION?








[GitHub] [spark] gaborgsomogyi commented on a change in pull request #29024: [WIP][SPARK-32001][SQL]Create JDBC authentication provider developer API

2020-07-14 Thread GitBox


gaborgsomogyi commented on a change in pull request #29024:
URL: https://github.com/apache/spark/pull/29024#discussion_r454183194



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala
##
@@ -23,12 +23,15 @@ import java.util.{Locale, Properties}
 import org.apache.commons.io.FilenameUtils
 
 import org.apache.spark.SparkFiles
+import org.apache.spark.annotation.DeveloperApi
 import org.apache.spark.internal.Logging
 import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap
 
 /**
+ * ::DeveloperApi::
  * Options for the JDBC data source.
  */
+@DeveloperApi

Review comment:
   We could pass the 2 params, but then we limit further implementation 
possibilities, so I would vote for the map.
   At the moment there is no need for params other than `keytab` and 
`principal`, but later providers may need further things. It's not a strong 
opinion; I just don't want to close off later possibilities. If we agree on the way, 
I'll do the changes.
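
The trade-off in the comment above can be illustrated with a small sketch. It is plain Python rather than the Scala API under review, and every name in it (`KeytabPrincipalProvider`, `TokenProvider`, `can_handle`, `select_provider`, the `token` option) is hypothetical. The point is only that a provider receiving the whole option map can look for keys that a fixed two-parameter signature would not carry.

```python
from typing import Mapping, Optional, Sequence


class KeytabPrincipalProvider:
    """Hypothetical provider that needs only `keytab` and `principal`."""

    def can_handle(self, options: Mapping[str, str]) -> bool:
        return "keytab" in options and "principal" in options


class TokenProvider:
    """Hypothetical later provider that needs a different option."""

    def can_handle(self, options: Mapping[str, str]) -> bool:
        return "token" in options


def select_provider(providers: Sequence, options: Mapping[str, str]) -> Optional[object]:
    # Return the first provider that recognises the given options, if any.
    for provider in providers:
        if provider.can_handle(options):
            return provider
    return None
```

With a `(keytab, principal)` signature, `TokenProvider` could not be added without changing the interface; with a map it only needs a new key.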








[GitHub] [spark] AmplabJenkins commented on pull request #29097: Spark 32299

2020-07-14 Thread GitBox


AmplabJenkins commented on pull request #29097:
URL: https://github.com/apache/spark/pull/29097#issuecomment-658041588


   Can one of the admins verify this patch?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29097: Spark 32299

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29097:
URL: https://github.com/apache/spark/pull/29097#issuecomment-658038551


   Can one of the admins verify this patch?






[GitHub] [spark] cloud-fan commented on a change in pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-14 Thread GitBox


cloud-fan commented on a change in pull request #28676:
URL: https://github.com/apache/spark/pull/28676#discussion_r454196515



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala
##
@@ -60,6 +62,67 @@ case class BroadcastHashJoinExec(
 }
   }
 
+  override lazy val outputPartitioning: Partitioning = {
+joinType match {
+  case _: InnerLike =>
+streamedPlan.outputPartitioning match {
+  case h: HashPartitioning => expandOutputPartitioning(h)
+  case c: PartitioningCollection => expandOutputPartitioning(c)
+  case other => other
+}
+  case _ => streamedPlan.outputPartitioning
+}
+  }
+
+  // An one-to-many mapping from a streamed key to build keys.
+  private lazy val streamedKeyToBuildKeyMapping = {
+val mapping = mutable.Map.empty[Expression, Seq[Expression]]
+streamedKeys.zip(buildKeys).foreach {
+  case (streamedKey, buildKey) =>
+val key = streamedKey.canonicalized
+mapping.get(key) match {
+  case Some(v) => mapping.put(key, v :+ buildKey)
+  case None => mapping.put(key, Seq(buildKey))
+}
+}
+mapping.toMap
+  }
+
+  // Expands the given partitioning collection recursively.
+  private def expandOutputPartitioning(
+  partitioning: PartitioningCollection): PartitioningCollection = {
+PartitioningCollection(partitioning.partitionings.flatMap {
+  case h: HashPartitioning => expandOutputPartitioning(h).partitionings
+  case c: PartitioningCollection => Seq(expandOutputPartitioning(c))
+  case other => Seq(other)
+})
+  }
+
+  // Expands the given hash partitioning by substituting streamed keys with 
build keys.
+  // For example, if the expressions for the given partitioning are Seq("a", 
"b", "c")
+  // where the streamed keys are Seq("b", "c") and the build keys are Seq("x", 
"y"),
+  // the expanded partitioning will have the following expressions:
+  // Seq("a", "b", "c"), Seq("a", "b", "y"), Seq("a", "x", "c"), Seq("a", "x", 
"y").
+  // The expanded expressions are returned as PartitioningCollection.
+  private def expandOutputPartitioning(partitioning: HashPartitioning): 
PartitioningCollection = {
+def generateExprCombinations(
+current: Seq[Expression],
+accumulated: Seq[Expression]): Seq[Seq[Expression]] = {
+  if (current.isEmpty) {
+Seq(accumulated)
+  } else {
+val buildKeys = 
streamedKeyToBuildKeyMapping.get(current.head.canonicalized)
+generateExprCombinations(current.tail, accumulated :+ current.head) ++
+  buildKeys.map { _.flatMap(b => 
generateExprCombinations(current.tail, accumulated :+ b))

Review comment:
   Shall we add an upper bound to avoid creating too large a 
`PartitioningCollection`?
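
One way to picture the suggested upper bound is the sketch below. It is plain Python over strings, not the Scala code in the diff, and `expand_keys`, `expand_keys_bounded`, and `limit` are hypothetical names. It reproduces the substitution order of `generateExprCombinations` from the quoted diff (keep the streamed key first, then try each build key) as a generator, so the expansion can stop after a fixed number of combinations.

```python
from itertools import islice


def expand_keys(exprs, streamed_to_build):
    """Lazily yield every combination obtained by replacing streamed keys
    with their build keys. `streamed_to_build` maps a streamed key to a
    list of build keys (a stand-in for the canonicalized-expression map)."""

    def go(i, acc):
        if i == len(exprs):
            yield tuple(acc)
            return
        head = exprs[i]
        # Keep the original expression first, then each build-side substitute.
        yield from go(i + 1, acc + [head])
        for build_key in streamed_to_build.get(head, ()):
            yield from go(i + 1, acc + [build_key])

    return go(0, [])


def expand_keys_bounded(exprs, streamed_to_build, limit):
    # Stop after `limit` combinations instead of materialising all of them.
    return list(islice(expand_keys(exprs, streamed_to_build), limit))
```

For the example in the diff's comment, `expand_keys(["a", "b", "c"], {"b": ["x"], "c": ["y"]})` yields the four combinations listed there, and a `limit` of 2 keeps only the first two. Without a bound, the number of combinations grows multiplicatively with the number of substitutable keys.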








[GitHub] [spark] cloud-fan commented on a change in pull request #28676: [SPARK-31869][SQL] BroadcastHashJoinExec can utilize the build side for its output partitioning

2020-07-14 Thread GitBox


cloud-fan commented on a change in pull request #28676:
URL: https://github.com/apache/spark/pull/28676#discussion_r454196797



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoinExec.scala
##
@@ -60,6 +62,67 @@ case class BroadcastHashJoinExec(
 }
   }
 
+  override lazy val outputPartitioning: Partitioning = {
+joinType match {
+  case _: InnerLike =>
+streamedPlan.outputPartitioning match {
+  case h: HashPartitioning => expandOutputPartitioning(h)
+  case c: PartitioningCollection => expandOutputPartitioning(c)
+  case other => other
+}
+  case _ => streamedPlan.outputPartitioning
+}
+  }
+
+  // An one-to-many mapping from a streamed key to build keys.
+  private lazy val streamedKeyToBuildKeyMapping = {
+val mapping = mutable.Map.empty[Expression, Seq[Expression]]
+streamedKeys.zip(buildKeys).foreach {
+  case (streamedKey, buildKey) =>
+val key = streamedKey.canonicalized
+mapping.get(key) match {
+  case Some(v) => mapping.put(key, v :+ buildKey)
+  case None => mapping.put(key, Seq(buildKey))
+}
+}
+mapping.toMap
+  }
+
+  // Expands the given partitioning collection recursively.
+  private def expandOutputPartitioning(
+  partitioning: PartitioningCollection): PartitioningCollection = {
+PartitioningCollection(partitioning.partitionings.flatMap {
+  case h: HashPartitioning => expandOutputPartitioning(h).partitionings
+  case c: PartitioningCollection => Seq(expandOutputPartitioning(c))
+  case other => Seq(other)
+})
+  }
+
+  // Expands the given hash partitioning by substituting streamed keys with 
build keys.
+  // For example, if the expressions for the given partitioning are Seq("a", 
"b", "c")
+  // where the streamed keys are Seq("b", "c") and the build keys are Seq("x", 
"y"),
+  // the expanded partitioning will have the following expressions:
+  // Seq("a", "b", "c"), Seq("a", "b", "y"), Seq("a", "x", "c"), Seq("a", "x", 
"y").
+  // The expanded expressions are returned as PartitioningCollection.
+  private def expandOutputPartitioning(partitioning: HashPartitioning): 
PartitioningCollection = {
+def generateExprCombinations(
+current: Seq[Expression],
+accumulated: Seq[Expression]): Seq[Seq[Expression]] = {
+  if (current.isEmpty) {
+Seq(accumulated)
+  } else {
+val buildKeys = 
streamedKeyToBuildKeyMapping.get(current.head.canonicalized)
+generateExprCombinations(current.tail, accumulated :+ current.head) ++
+  buildKeys.map { _.flatMap(b => 
generateExprCombinations(current.tail, accumulated :+ b))

Review comment:
   Or we can create a special `HashPartitioning` which works like a lazy 
`PartitioningCollection`








[GitHub] [spark] HyukjinKwon opened a new pull request #29098: [SPARK-32300][PYTHON][2.4] toPandas should work from a Spark DataFrame with no partitions

2020-07-14 Thread GitBox


HyukjinKwon opened a new pull request #29098:
URL: https://github.com/apache/spark/pull/29098


   ### What changes were proposed in this pull request?
   
   This PR proposes to simply bypass the case where the array size is 
negative when collecting data from a Spark DataFrame with no partitions 
for `toPandas`.
   
   ```python
   spark.sparkContext.emptyRDD().toDF("col1 int").toPandas()
   ```
   
   In master and branch-3.0, this was fixed together at 
https://github.com/apache/spark/commit/ecaa495b1fe532c36e952ccac42f4715809476af, 
but that commit was legitimately not ported back.
   
   ### Why are the changes needed?
   
   To make an empty Spark DataFrame convertible to a pandas DataFrame.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes,
   
   ```python
   spark.sparkContext.emptyRDD().toDF("col1 int").toPandas()
   ```
   
   **Before:**
   
   ```
   ...
   Caused by: java.lang.NegativeArraySizeException
        at org.apache.spark.sql.Dataset$$anonfun$collectAsArrowToPython$1$$anonfun$apply$17.apply(Dataset.scala:3293)
        at org.apache.spark.sql.Dataset$$anonfun$collectAsArrowToPython$1$$anonfun$apply$17.apply(Dataset.scala:3287)
        at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3370)
        at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
   ...
   ```
   
   **After:**
   
   ```
   Empty DataFrame
   Columns: [col1]
   Index: []
   ```
   
   ### How was this patch tested?
   
   Manually tested, and unit tests were added.
   






[GitHub] [spark] SparkQA commented on pull request #29098: [SPARK-32300][PYTHON][2.4] toPandas should work from a Spark DataFrame with no partitions

2020-07-14 Thread GitBox


SparkQA commented on pull request #29098:
URL: https://github.com/apache/spark/pull/29098#issuecomment-658053443


   **[Test build #125815 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125815/testReport)** for PR 29098 at commit [`c3a7f7e`](https://github.com/apache/spark/commit/c3a7f7ea780799541bba869f65fd0fa275b84974).






[GitHub] [spark] AmplabJenkins commented on pull request #29098: [SPARK-32300][PYTHON][2.4] toPandas should work from a Spark DataFrame with no partitions

2020-07-14 Thread GitBox


AmplabJenkins commented on pull request #29098:
URL: https://github.com/apache/spark/pull/29098#issuecomment-658054104










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29098: [SPARK-32300][PYTHON][2.4] toPandas should work from a Spark DataFrame with no partitions

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29098:
URL: https://github.com/apache/spark/pull/29098#issuecomment-658054104










[GitHub] [spark] cloud-fan commented on a change in pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-14 Thread GitBox


cloud-fan commented on a change in pull request #29045:
URL: https://github.com/apache/spark/pull/29045#discussion_r454201305



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala
##
@@ -116,47 +116,53 @@ object OrcUtils extends Logging {
   }
 
   /**
-   * Returns the requested column ids from the given ORC file. Column id can 
be -1, which means the
-   * requested column doesn't exist in the ORC file. Returns None if the given 
ORC file is empty.
+   * @return Returns the requested column ids from the given ORC file and 
Boolean flag to use actual
+   * schema or result schema. Column id can be -1, which means the requested 
column doesn't
+   * exist in the ORC file. Returns None if the given ORC file is empty.
*/
   def requestedColumnIds(
   isCaseSensitive: Boolean,
   dataSchema: StructType,
   requiredSchema: StructType,
   reader: Reader,
-  conf: Configuration): Option[Array[Int]] = {
+  conf: Configuration): (Option[Array[Int]], Boolean) = {
+var sendActualSchema = false
 val orcFieldNames = reader.getSchema.getFieldNames.asScala

Review comment:
   Please correct me if I'm wrong:
   1. the physical orc file schema is `_col0`, ...
   2. the table schema in metastore is `d_date_sk`, ...
   3. the query requires only `d_year`
   
   I don't know why the query fails. The `requestedColumnIds` will be `[6]` and 
the orc reader will read the `_col6` column. Everything should be fine.
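
The reasoning in the comment above can be traced with a small sketch. It is plain Python, not the Scala `OrcUtils` code, and `requested_column_ids` here is a simplified stand-in with a hypothetical signature. It assumes that when every physical field name starts with `_col`, columns are matched by position in the table schema. The seven-column table schema in the usage note is also an assumption, chosen so that `d_year` sits at index 6 as the comment states.

```python
def requested_column_ids(orc_field_names, table_field_names, required_field_names):
    """Return the physical column index for each required field, -1 if the
    field is absent from the file, or None if the file has no columns."""
    if not orc_field_names:
        return None
    if all(name.startswith("_col") for name in orc_field_names):
        # Physical names carry no information, so match by position in the
        # table schema from the metastore.
        ids = []
        for name in required_field_names:
            index = table_field_names.index(name)
            ids.append(index if index < len(orc_field_names) else -1)
        return ids
    # Otherwise match by name (case sensitivity is ignored in this sketch).
    return [
        orc_field_names.index(name) if name in orc_field_names else -1
        for name in required_field_names
    ]
```

With physical names `_col0` to `_col6`, an assumed table schema `d_date_sk, d_date_id, d_date, d_month_seq, d_week_seq, d_quarter_seq, d_year`, and a query requiring only `d_year`, the function returns `[6]`, so the reader would fetch `_col6`.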








[GitHub] [spark] cloud-fan commented on pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-14 Thread GitBox


cloud-fan commented on pull request #29085:
URL: https://github.com/apache/spark/pull/29085#issuecomment-658057950


   Can we use `Cast` to turn the Catalyst value into a string and pass it to the script?






[GitHub] [spark] SparkQA commented on pull request #29098: [SPARK-32300][PYTHON][2.4] toPandas should work from a Spark DataFrame with no partitions

2020-07-14 Thread GitBox


SparkQA commented on pull request #29098:
URL: https://github.com/apache/spark/pull/29098#issuecomment-658057520


   **[Test build #125816 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125816/testReport)** for PR 29098 at commit [`8074075`](https://github.com/apache/spark/commit/80740755c822715e8e8956517ee4ecb73c962348).






[GitHub] [spark] AmplabJenkins commented on pull request #29098: [SPARK-32300][PYTHON][2.4] toPandas should work from a Spark DataFrame with no partitions

2020-07-14 Thread GitBox


AmplabJenkins commented on pull request #29098:
URL: https://github.com/apache/spark/pull/29098#issuecomment-658058126










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29098: [SPARK-32300][PYTHON][2.4] toPandas should work from a Spark DataFrame with no partitions

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29098:
URL: https://github.com/apache/spark/pull/29098#issuecomment-658058126










[GitHub] [spark] ulysses-you commented on a change in pull request #28840: [SPARK-31999][SQL] Add REFRESH FUNCTION command

2020-07-14 Thread GitBox


ulysses-you commented on a change in pull request #28840:
URL: https://github.com/apache/spark/pull/28840#discussion_r454204568



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala
##
@@ -236,6 +236,45 @@ case class ShowFunctionsCommand(
   }
 }
 
+
+/**
+ * A command for users to refresh the persistent function.
+ * The syntax of using this command in SQL is:
+ * {{{
+ *REFRESH FUNCTION functionName
+ * }}}
+ */
+case class RefreshFunctionCommand(
+databaseName: Option[String],
+functionName: String)
+  extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val catalog = sparkSession.sessionState.catalog
+if 
(FunctionRegistry.builtin.functionExists(FunctionIdentifier(functionName))) {
+  throw new AnalysisException(s"Cannot refresh builtin function 
$functionName")
+}
+if (catalog.isTemporaryFunction(FunctionIdentifier(functionName, 
databaseName))) {
+  throw new AnalysisException(s"Cannot refresh temporary function 
$functionName")
+}
+
+val identifier = FunctionIdentifier(
+  functionName, Some(databaseName.getOrElse(catalog.getCurrentDatabase)))
+// we only refresh the permanent function.
+if (catalog.isPersistentFunction(identifier)) {
+  // register overwrite function.
+  val func = catalog.getFunctionMetadata(identifier)
+  catalog.registerFunction(func, true)
+} else {
+  // function is not exists, clear cached function.
+  catalog.unregisterFunction(identifier, true)
+  throw new NoSuchFunctionException(identifier.database.get, functionName)

Review comment:
   How about this:
   ```
   if (catalog.isPersistentFunction(identifier)) {
     // register overwrite function.
     val func = catalog.getFunctionMetadata(identifier)
     catalog.registerFunction(func, true)
   } else if (catalog.isRegisteredFunction(identifier)) {
     // clear cached function.
     catalog.unregisterFunction(identifier, true)
   } else {
     throw new NoSuchFunctionException(identifier.database.get, functionName)
   }
   ```
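
The three branches proposed above can be exercised with a toy model. This is a plain-Python sketch, not Spark's `SessionCatalog`: `ToyCatalog`, `refresh_function`, `NoSuchFunctionError`, and the returned labels are hypothetical, with one dictionary standing in for the metastore and another for the session's function registry.

```python
class NoSuchFunctionError(Exception):
    """Raised when a function is neither persistent nor cached."""


class ToyCatalog:
    def __init__(self, persistent, registered):
        self.persistent = dict(persistent)  # stand-in for the metastore
        self.registered = dict(registered)  # stand-in for the session cache


def refresh_function(catalog, name):
    if name in catalog.persistent:
        # Persistent function: re-register it, overwriting any cached copy.
        catalog.registered[name] = catalog.persistent[name]
        return "refreshed"
    if name in catalog.registered:
        # Dropped from the metastore but still cached: clear the stale entry
        # without raising.
        del catalog.registered[name]
        return "cleared"
    # Unknown everywhere: this is the only case that raises.
    raise NoSuchFunctionError(name)
```

Under this model, a function that was dropped elsewhere but is still cached is silently cleared, and only a function unknown to both stores raises.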








[GitHub] [spark] HyukjinKwon opened a new pull request #29099: [SPARK-32301][PYTHON][TESTS] Add a test case for toPandas to work with empty partitioned Spark DataFrame

2020-07-14 Thread GitBox


HyukjinKwon opened a new pull request #29099:
URL: https://github.com/apache/spark/pull/29099


   ### What changes were proposed in this pull request?
   
   This PR proposes to port the test case from 
https://github.com/apache/spark/pull/29098 to branch-3.0 and master. In 
master and branch-3.0, this was fixed together at 
https://github.com/apache/spark/commit/ecaa495b1fe532c36e952ccac42f4715809476af, 
but the no-partition case is not being tested.
   
   ### Why are the changes needed?
   
   To improve test coverage.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, test-only.
   
   ### How was this patch tested?
   
   Unit test was forward-ported.






[GitHub] [spark] SparkQA commented on pull request #29099: [SPARK-32301][PYTHON][TESTS] Add a test case for toPandas to work with empty partitioned Spark DataFrame

2020-07-14 Thread GitBox


SparkQA commented on pull request #29099:
URL: https://github.com/apache/spark/pull/29099#issuecomment-658061150


   **[Test build #125817 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125817/testReport)** for PR 29099 at commit [`e986c65`](https://github.com/apache/spark/commit/e986c65f4e968bf58d16569055eda13414f5ec33).






[GitHub] [spark] AmplabJenkins commented on pull request #29099: [SPARK-32301][PYTHON][TESTS] Add a test case for toPandas to work with empty partitioned Spark DataFrame

2020-07-14 Thread GitBox


AmplabJenkins commented on pull request #29099:
URL: https://github.com/apache/spark/pull/29099#issuecomment-658061766










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29098: [SPARK-32300][PYTHON][2.4] toPandas should work from a Spark DataFrame with no partitions

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29098:
URL: https://github.com/apache/spark/pull/29098#issuecomment-658061866










[GitHub] [spark] AmplabJenkins commented on pull request #29098: [SPARK-32300][PYTHON][2.4] toPandas should work from a Spark DataFrame with no partitions

2020-07-14 Thread GitBox


AmplabJenkins commented on pull request #29098:
URL: https://github.com/apache/spark/pull/29098#issuecomment-658061866










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29099: [SPARK-32301][PYTHON][TESTS] Add a test case for toPandas to work with empty partitioned Spark DataFrame

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29099:
URL: https://github.com/apache/spark/pull/29099#issuecomment-658061766










[GitHub] [spark] cloud-fan commented on pull request #29064: [SPARK-32272][SQL] Add SQL standard command SET TIME ZONE

2020-07-14 Thread GitBox


cloud-fan commented on pull request #29064:
URL: https://github.com/apache/spark/pull/29064#issuecomment-658062375


   We should also add a documentation page for it in the SQL reference.






[GitHub] [spark] adjordan commented on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox


adjordan commented on pull request #29080:
URL: https://github.com/apache/spark/pull/29080#issuecomment-658063483


   This is ready for review!






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29078: [SPARK-29292][STREAMING][SQL][BUILD] Get streaming, catalyst, sql compiling for Scala 2.13

2020-07-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #29078:
URL: https://github.com/apache/spark/pull/29078#discussion_r454210666



##
File path: 
sql/core/src/test/scala/org/apache/spark/sql/DatasetPrimitiveSuite.scala
##
@@ -223,16 +223,6 @@ class DatasetPrimitiveSuite extends QueryTest with SharedSparkSession {
 checkDataset(Seq(Queue(true)).toDS(), Queue(true))
 checkDataset(Seq(Queue("test")).toDS(), Queue("test"))
 checkDataset(Seq(Queue(Tuple1(1))).toDS(), Queue(Tuple1(1)))
-
-checkDataset(Seq(ArrayBuffer(1)).toDS(), ArrayBuffer(1))

Review comment:
   Although this removes test coverage in Scala 2.12, I'm +1 for now. We can
add it back later, after we finish everything in Scala 2.13.








[GitHub] [spark] gaborgsomogyi commented on a change in pull request #29024: [WIP][SPARK-32001][SQL]Create JDBC authentication provider developer API

2020-07-14 Thread GitBox


gaborgsomogyi commented on a change in pull request #29024:
URL: https://github.com/apache/spark/pull/29024#discussion_r454212309



##
File path: 
core/src/main/scala/org/apache/spark/security/SecurityConfigurationLock.scala
##
@@ -0,0 +1,28 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.security
+
+import org.apache.spark.annotation.DeveloperApi
+
+/**
+ * ::DeveloperApi::
+ * There are cases when global JVM security configuration must be modified.
+ * In order to avoid race the modification must be synchronized with this.
+ */
+@DeveloperApi
+object SecurityConfigurationLock

Review comment:
   `Considering this I would not add synchronization into the framework` =
I do not see the gain in adding synchronization to a central place (like
`ConnectionProvider.create`) so that 3rd-party developers need not care about
it, because in some cases such a change would synchronize unnecessarily. That
said, in practice it's not an issue, but it could be misleading.
   An example:
   ```
   def create(driver: Driver, options: JDBCOptions): Connection = {
     val filteredProviders = providers.filter(_.canHandle(driver, options))
     logDebug(s"Filtered providers: $filteredProviders")
     require(filteredProviders.size == 1,
       "JDBC connection initiated but not exactly one connection provider " +
         "found which can handle it")
     // This would synchronize but for nothing in some cases
     SecurityConfigurationLock.synchronized {
       filteredProviders.head.getConnection(driver, options)
     }
   }
   ```
   An imaginary provider implemented by 3rd-party:
   ```
   class OracleConnectionProviderTGT {
     override def canHandle(driver: Driver, options: JDBCOptions): Boolean = {
       // Example content of tgtCache: "/tmp/krb5cc_5088"
       options.tgtCache != null ...
     }

     override def getConnection(driver: Driver, options: JDBCOptions): Connection = {
       ...
       // No need to modify global JVM configuration
       prop.setProperty(
         OracleConnection.CONNECTION_PROPERTY_THIN_NET_AUTHENTICATION_KRB5_CC_NAME,
         options.tgtCache)
       ...
       driver.connect(url, prop)
     }
   }
   ```
   Overall, if we would like to add such a change, then I would mention that
`getConnection` is synchronized under all circumstances, which may or may not
be needed. I do not suggest this, but I have no strong opinion.
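
The alternative argued for above, where each provider takes the lock only when it changes JVM-wide state, might look roughly like the following. This is a minimal sketch, not code from the PR: `ConnectionProvider`, both provider classes, the `String` return type, and the property name `example.security.config` are hypothetical stand-ins so the example runs without Spark or a JDBC driver.

```scala
// Hypothetical stand-in for the lock object added in the PR.
object SecurityConfigurationLock

// Hypothetical, simplified provider interface (a String stands in for a Connection).
trait ConnectionProvider {
  def getConnection(url: String): String
}

// Changes a JVM-wide system property, so it takes the shared lock itself.
class GlobalConfigProvider extends ConnectionProvider {
  override def getConnection(url: String): String =
    SecurityConfigurationLock.synchronized {
      val key = "example.security.config" // hypothetical property name
      val previous = Option(System.getProperty(key))
      System.setProperty(key, "/tmp/example.conf")
      try {
        s"connected:$url"
      } finally {
        // Restore the previous global state before releasing the lock.
        previous match {
          case Some(value) => System.setProperty(key, value)
          case None => System.clearProperty(key)
        }
      }
    }
}

// Sets only per-connection properties, so it does not take the lock.
class LocalConfigProvider extends ConnectionProvider {
  override def getConnection(url: String): String = {
    val props = new java.util.Properties()
    props.setProperty("ccName", "/tmp/krb5cc_5088")
    s"connected:$url"
  }
}
```

With this split, only callers of `GlobalConfigProvider` contend on the lock; the cost is that every 3rd-party provider that does touch global configuration has to remember to synchronize.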
   








[GitHub] [spark] SparkQA commented on pull request #29098: [SPARK-32300][PYTHON][2.4] toPandas should work from a Spark DataFrame with no partitions

2020-07-14 Thread GitBox


SparkQA commented on pull request #29098:
URL: https://github.com/apache/spark/pull/29098#issuecomment-658065020


   **[Test build #125818 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125818/testReport)** for PR 29098 at commit [`070ea46`](https://github.com/apache/spark/commit/070ea46dcfb6521d43f107e509fbb5dd520ec9c8).






[GitHub] [spark] zhengruifeng commented on pull request #29018: [SPARK-32202][ML][WIP] tree models auto infer compact integer type

2020-07-14 Thread GitBox


zhengruifeng commented on pull request #29018:
URL: https://github.com/apache/spark/pull/29018#issuecomment-658066066


   @viirya Thanks for reviewing!
   
   > This win only happens when maxBins is less
   
   Yes, but in most cases, maxBins (default=32) < 128
   
   > the perf regression happens for all cases
   
   Yes, I think so.
   
   > I'm also not sure how often memory is an issue when training the model
   
   It will make sense if there is not enough memory for the original
treePoint (Array[Int]). I personally think it may be worthwhile if the
regression is small enough, but I am not sure whether the current performance
results are OK.
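
To make the `maxBins < 128` condition concrete: a bin index stored as a `Byte` takes one byte where an `Int` takes four, and `Byte.MaxValue` is 127. The sketch below picks the narrowest integer type from `maxBins`. It is illustrative only: `CompactBins`, `BinStorage`, and `chooseStorage` are hypothetical names, and the thresholds follow the comment above, not necessarily the PR's code.

```scala
object CompactBins {
  // Hypothetical tags for the element type used to store binned feature values.
  sealed trait BinStorage
  case object ByteBins extends BinStorage
  case object ShortBins extends BinStorage
  case object IntBins extends BinStorage

  // Picks the narrowest integer type whose maximum value is at least maxBins.
  def chooseStorage(maxBins: Int): BinStorage = {
    require(maxBins > 0, "maxBins must be positive")
    if (maxBins <= Byte.MaxValue) ByteBins        // maxBins < 128
    else if (maxBins <= Short.MaxValue) ShortBins // maxBins < 32768
    else IntBins
  }
}
```

Under these assumptions the default `maxBins = 32` selects `ByteBins`, while `maxBins = 128` already falls back to `ShortBins`.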






[GitHub] [spark] HyukjinKwon opened a new pull request #29100: [MINOR][R] Match collectAsArrowToR with non-streaming collectAsArrowToPython

2020-07-14 Thread GitBox


HyukjinKwon opened a new pull request #29100:
URL: https://github.com/apache/spark/pull/29100


   ### What changes were proposed in this pull request?
   
   This PR proposes to forward-port #29098 to `collectAsArrowToR`.
`collectAsArrowToR` follows `collectAsArrowToPython` in branch-2.4 due to the
limitation of ARROW-4512. SparkR vectorization currently cannot use the
streaming format.
   
   Note that, if I am not mistaken, you cannot create a Spark DataFrame with
no partitions in SparkR, so there are no behaviour changes for end users.
   
   ### Why are the changes needed?
   
   For simplicity and consistency.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   The same code is being tested in `collectAsArrowToPython` of branch-2.4.
   






[GitHub] [spark] gaborgsomogyi commented on pull request #29024: [WIP][SPARK-32001][SQL]Create JDBC authentication provider developer API

2020-07-14 Thread GitBox


gaborgsomogyi commented on pull request #29024:
URL: https://github.com/apache/spark/pull/29024#issuecomment-658066311


   retest this please





