[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21537#discussion_r202743580

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -1024,26 +1033,29 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
       private[this] def castToIntervalCode(from: DataType): CastFunction = from match {
         case StringType =>
           (c, evPrim, evNull) =>
    -        s"""$evPrim = CalendarInterval.fromString($c.toString());
    +        code"""$evPrim = CalendarInterval.fromString($c.toString());
                if(${evPrim} == null) {
                  ${evNull} = true;
                }
              """.stripMargin
       }

    -  private[this] def decimalToTimestampCode(d: String): String =
    -    s"($d.toBigDecimal().bigDecimal().multiply(new java.math.BigDecimal(100L))).longValue()"
    -  private[this] def longToTimeStampCode(l: String): String = s"$l * 100L"
    -  private[this] def timestampToIntegerCode(ts: String): String =
    -    s"java.lang.Math.floor((double) $ts / 100L)"
    -  private[this] def timestampToDoubleCode(ts: String): String = s"$ts / 100.0"
    +  private[this] def decimalToTimestampCode(d: ExprValue): Block = {
    +    val block = code"new java.math.BigDecimal(100L)"
    --- End diff --

    nit: why isn't this a literal?

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #19449: [SPARK-22219][SQL] Refactor code to get a value for "spa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19449 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1015/ Test PASSed.
[GitHub] spark issue #21786: [SPARK-23901][SQL] Removing masking functions
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21786 Can one of the admins verify this patch?
[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20636 retest this please
[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20636 Merged build finished. Test PASSed.
[GitHub] spark issue #21765: [MINOR][CORE] Add test cases for RDD.cartesian
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21765 **[Test build #4215 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4215/testReport)** for PR 21765 at commit [`9df4c3b`](https://github.com/apache/spark/commit/9df4c3b4a71082181aa979c3bddf2c3d99db256e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #21468: [SPARK-22151] : PYTHONPATH not picked up from the...
Github user pgandhi999 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21468#discussion_r202782358

    --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
    @@ -813,8 +813,20 @@ private[spark] class Client(
         if (pythonPath.nonEmpty) {
           val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
             .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
    -      env("PYTHONPATH") = pythonPathStr
    -      sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
    +      val newValue =
    +        if (env.contains("PYTHONPATH")) {
    +          env("PYTHONPATH") + ApplicationConstants.CLASS_PATH_SEPARATOR + pythonPathStr
    +        } else {
    +          pythonPathStr
    +        }
    +      env("PYTHONPATH") = newValue
    +      if (!sparkConf.getExecutorEnv.toMap.contains("PYTHONPATH")) {
    +        sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
    +      } else {
    +        val pythonPathExecutorEnv = sparkConf.getExecutorEnv.toMap.get("PYTHONPATH").get +
    +          ApplicationConstants.CLASS_PATH_SEPARATOR + pythonPathStr
    +        sparkConf.setExecutorEnv("PYTHONPATH", pythonPathExecutorEnv)
    --- End diff --

    Done
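For readers following the diff, the append-or-set logic it applies to `PYTHONPATH` can be sketched in isolation. This is only an illustration, not the code in `Client.scala`: the object name `PythonPathMerge`, the helper `appendEnv`, and the plain `":"` separator are assumptions, with the separator standing in for `ApplicationConstants.CLASS_PATH_SEPARATOR`.

```scala
object PythonPathMerge {
  // Assumed stand-in for ApplicationConstants.CLASS_PATH_SEPARATOR.
  val Separator = ":"

  /** Appends `addition` to any value already stored under `key`; otherwise sets it. */
  def appendEnv(env: Map[String, String], key: String, addition: String): Map[String, String] = {
    val merged = env.get(key) match {
      // Unlike the diff, this sketch treats an empty existing value as absent.
      case Some(existing) if existing.nonEmpty => existing + Separator + addition
      case _ => addition
    }
    env.updated(key, merged)
  }
}
```

Using one helper for both the application master environment and the executor environment would avoid writing the same branch twice, though whether that fits `Client.scala` is a judgment for the PR author.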
[GitHub] spark issue #21769: [SPARK-24805][SQL] Do not ignore avro files without exte...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21769 @gengliangwang @gatorsmile Please, have a look at the PR.
[GitHub] spark pull request #21764: [SPARK-24802] Optimization Rule Exclusion
Github user maryannxue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21764#discussion_r202759884

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -46,7 +47,23 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)

       protected def fixedPoint = FixedPoint(SQLConf.get.optimizerMaxIterations)

    -  def batches: Seq[Batch] = {
    +  protected def postAnalysisBatches: Seq[Batch] = {
    +    Batch("Eliminate Distinct", Once, EliminateDistinct) ::
    +    // Technically some of the rules in Finish Analysis are not optimizer rules and belong more
    +    // in the analyzer, because they are needed for correctness (e.g. ComputeCurrentTime).
    +    // However, because we also use the analyzer to canonicalized queries (for view definition),
    +    // we do not eliminate subqueries or compute current time in the analyzer.
    +    Batch("Finish Analysis", Once,
    +      EliminateSubqueryAliases,
    +      EliminateView,
    +      ReplaceExpressions,
    +      ComputeCurrentTime,
    +      GetCurrentDatabase(sessionCatalog),
    +      RewriteDistinctAggregates,
    +      ReplaceDeduplicateWithAggregate) :: Nil
    +  }
    +
    +  protected def optimizationBatches: Seq[Batch] = {
    --- End diff --

    So can I do black list of batches?
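The question of excluding rules versus whole batches may be easier to weigh with a small model. The sketch below is hypothetical and does not use Catalyst: `Rule`, `Batch`, `parseExcluded`, and `filterBatches` are simplified stand-ins, and it assumes the exclusion list arrives as a comma-separated string of rule names.

```scala
object RuleExclusionSketch {
  // Simplified stand-ins for Catalyst's Rule and Batch: names only, no plans.
  final case class Rule(name: String)
  final case class Batch(name: String, rules: Seq[Rule])

  /** Parses a comma-separated exclusion list such as "RuleA, RuleB". */
  def parseExcluded(conf: String): Set[String] =
    conf.split(",").map(_.trim).filter(_.nonEmpty).toSet

  /** Drops excluded rules from every batch and drops batches left with no rules. */
  def filterBatches(batches: Seq[Batch], excluded: Set[String]): Seq[Batch] =
    batches.flatMap { b =>
      val kept = b.rules.filterNot(r => excluded.contains(r.name))
      if (kept.isEmpty) None else Some(b.copy(rules = kept))
    }
}
```

In this model a batch disappears only when every rule in it is excluded, so a batch-level blacklist is expressible as a list of rule names. Applying the filter only to the batches returned by `optimizationBatches` would leave the correctness-critical `postAnalysisBatches` untouched, which appears to be the motivation for the split in the diff.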
[GitHub] spark issue #20838: [SPARK-23698] Resolve undefined names in Python 3
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20838 **[Test build #93125 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93125/testReport)** for PR 20838 at commit [`2c4f15c`](https://github.com/apache/spark/commit/2c4f15c13efa8b181c8c53bd9a90f4f578a40169).
[GitHub] spark pull request #21748: [SPARK-23146][K8S] Support client mode.
Github user liyinan926 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21748#discussion_r202781388

    --- Diff: docs/running-on-kubernetes.md ---
    @@ -603,8 +603,8 @@ specific to Spark on Kubernetes.
         Name of the driver pod. In cluster mode, if this is not set, the driver pod name is set to "spark.app.name" suffixed by the current timestamp to avoid name conflicts. In client mode, if your application is running inside a pod, it is highly recommended to set this to the name of the pod your driver is running in. Setting this
    -    value in client mode allows the driver to inform the cluster that your application's executor pods should be
    -    deleted when the driver pod is deleted.
    +    value in client mode allows the driver to become the owner of its executor pods, which in turn allows the executor
    +    pods to be garbage collecfted by the cluster.
    --- End diff --

    Typo `collecfted`.
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1017/ Test FAILed.
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/1017/
[GitHub] spark pull request #21748: [SPARK-23146][K8S] Support client mode.
Github user liyinan926 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21748#discussion_r202782024

    --- Diff: docs/running-on-kubernetes.md ---
    @@ -120,8 +120,8 @@ This URI is the location of the example jar that is already in the Docker image.
     ## Client Mode

     Starting with Spark 2.4.0, it is possible to run Spark applications on Kubernetes in client mode. When running a Spark
    -application in client mode, a separate pod is not deployed to run the driver. When running an application in
    -client mode, it is recommended to account for the following factors:
    +application in client mode, a separate pod is not deployed to run the driver. Your Spark driver does not need to run in
    +a Kubernetes pod. When running an application in client mode, it is recommended to account for the following factors:
    --- End diff --

    How about changing `a separate pod is not deployed to run the driver. Your Spark driver does not need to run in a Kubernetes pod. When running an application in client mode` into `you may run your Spark driver outside the Kubernetes cluster or in a pod inside the cluster`?
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Merged build finished. Test FAILed.
[GitHub] spark issue #21786: [SPARK-23901][SQL] Removing masking functions
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21786 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93120/ Test FAILed.
[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21102 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1018/ Test PASSed.
[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21102 Merged build finished. Test PASSed.
[GitHub] spark issue #21786: [SPARK-23901][SQL] Removing masking functions
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21786 Merged build finished. Test FAILed.
[GitHub] spark issue #21442: [SPARK-24402] [SQL] Optimize `In` expression when only o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21442 **[Test build #93130 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93130/testReport)** for PR 21442 at commit [`5079833`](https://github.com/apache/spark/commit/5079833cc25949c806575f365f62f423a3205282).
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21537#discussion_r202800594

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -1024,26 +1033,29 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
       private[this] def castToIntervalCode(from: DataType): CastFunction = from match {
         case StringType =>
           (c, evPrim, evNull) =>
    -        s"""$evPrim = CalendarInterval.fromString($c.toString());
    +        code"""$evPrim = CalendarInterval.fromString($c.toString());
                if(${evPrim} == null) {
                  ${evNull} = true;
                }
              """.stripMargin
       }

    -  private[this] def decimalToTimestampCode(d: String): String =
    -    s"($d.toBigDecimal().bigDecimal().multiply(new java.math.BigDecimal(100L))).longValue()"
    -  private[this] def longToTimeStampCode(l: String): String = s"$l * 100L"
    -  private[this] def timestampToIntegerCode(ts: String): String =
    -    s"java.lang.Math.floor((double) $ts / 100L)"
    -  private[this] def timestampToDoubleCode(ts: String): String = s"$ts / 100.0"
    +  private[this] def decimalToTimestampCode(d: ExprValue): Block = {
    +    val block = code"new java.math.BigDecimal(100L)"
    --- End diff --

    It looks like a statement that creates a `BigDecimal` object rather than just a `BigDecimal` literal?
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21537 **[Test build #93117 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93117/testReport)** for PR 21537 at commit [`f1f2180`](https://github.com/apache/spark/commit/f1f218068bc1e4c147c14dbc56c874c4c7d7cc4b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21352: [SPARK-24305][SQL][FOLLOWUP] Avoid serialization of priv...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21352 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93116/ Test PASSed.
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21537#discussion_r202800647

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/javaCode.scala ---
    @@ -196,7 +221,7 @@ object Block {
             EmptyBlock
           } else {
             args.foreach {
    -          case _: ExprValue =>
    +          case _: ExprValue | _: Inline =>
    --- End diff --

    Moved.
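For context on what a `code"..."` interpolator does that `s"..."` does not, here is a minimal sketch of the idea. It is not Spark's implementation: `CodeBlockSketch`, `CodeHelper`, and the three case classes are illustrative stand-ins that only borrow the names seen in the diff, and the set of accepted argument types here is an assumption that may differ from what Spark accepts.

```scala
object CodeBlockSketch {
  final case class ExprValue(name: String) // a named Java variable or expression
  final case class Inline(text: String)    // raw text spliced in verbatim
  final case class Block(text: String, exprValues: Seq[ExprValue])

  implicit class CodeHelper(private val sc: StringContext) extends AnyVal {
    /** Builds a Block, rejecting argument types this sketch does not know how to track. */
    def code(args: Any*): Block = {
      val rendered: Seq[String] = args.map {
        case e: ExprValue => e.name
        case i: Inline => i.text
        case b: Block => b.text
        case p @ (_: Boolean | _: Int | _: Long | _: Double) => p.toString
        case other =>
          throw new IllegalArgumentException(s"Unsupported interpolated value: $other")
      }
      // Remember which expression values the generated text refers to.
      val refs: Seq[ExprValue] = args.flatMap {
        case e: ExprValue => Seq(e)
        case b: Block => b.exprValues
        case _ => Seq.empty[ExprValue]
      }
      Block(sc.s(rendered: _*), refs)
    }
  }
}
```

With `import CodeBlockSketch._` in scope, `code"long ts = $v * ${100L};"` yields a `Block` whose `exprValues` contains `v`, which is the information a plain string would lose.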
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21537 **[Test build #93132 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93132/testReport)** for PR 21537 at commit [`807d8d4`](https://github.com/apache/spark/commit/807d8d44f950b8a588065b15bb7fa6a5db753075).
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21537 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93117/ Test PASSed.
[GitHub] spark pull request #21719: [SPARK-24747][ML] Make Instrumentation class more...
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21719#discussion_r202805710

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/Instrumentation.scala ---
    @@ -19,45 +19,60 @@ package org.apache.spark.ml.util

     import java.util.UUID

    -import scala.reflect.ClassTag
    +import scala.util.{Failure, Success, Try}
    +import scala.util.control.NonFatal

     import org.json4s._
     import org.json4s.JsonDSL._
     import org.json4s.jackson.JsonMethods._

     import org.apache.spark.internal.Logging
    -import org.apache.spark.ml.{Estimator, Model}
    -import org.apache.spark.ml.param.Param
    +import org.apache.spark.ml.{Estimator, Model, PipelineStage}
    +import org.apache.spark.ml.param.{Param, Params}
     import org.apache.spark.rdd.RDD
     import org.apache.spark.sql.Dataset
     import org.apache.spark.util.Utils

     /**
      * A small wrapper that defines a training session for an estimator, and some methods to log
      * useful information during this session.
    - *
    - * A new instance is expected to be created within fit().
    - *
    - * @param estimator the estimator that is being fit
    - * @param dataset the training dataset
    - * @tparam E the type of the estimator
      */
    -private[spark] class Instrumentation[E <: Estimator[_]] private (
    -    val estimator: E,
    -    val dataset: RDD[_]) extends Logging {
    +private[spark] class Instrumentation extends Logging {

       private val id = UUID.randomUUID()
    -  private val prefix = {
    +  private val shortId = id.toString.take(8)
    +  private val prefix = s"[$shortId] "
    +
    +  // TODO: update spark.ml to use new Instrumentation APIs and remove this constructor
    +  var stage: Params = _
    --- End diff --

    I'd recommend we either plan to remove "stage" or change "logPipelineStage" so it only allows setting "stage" once. If we go with the former, how about leaving a note to remove "stage" once spark.ml code is migrated to use the new logParams() method?
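The "only allow setting `stage` once" option from the comment above could look roughly like the following. This is a sketch under assumptions, not a proposal for the actual `Instrumentation` class: `SetOnceSketch` and `SetOnce` are hypothetical names, and a `String` stands in for `Params` in the usage.

```scala
object SetOnceSketch {
  /** Holds a value that may be assigned exactly once. */
  final class SetOnce[A](label: String) {
    private var value: Option[A] = None

    /** Stores the value; a second call fails with IllegalArgumentException. */
    def set(a: A): Unit = {
      require(value.isEmpty, s"$label has already been set")
      value = Some(a)
    }

    def get: A = value.getOrElse(throw new IllegalStateException(s"$label is not set"))

    def isSet: Boolean = value.isDefined
  }
}
```

A `logPipelineStage`-style method would then call `set` instead of assigning to a public `var`, so an accidental second call fails loudly rather than silently replacing the stage.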
[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20636 **[Test build #93123 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93123/testReport)** for PR 20636 at commit [`a134091`](https://github.com/apache/spark/commit/a134091aad0c3f8e3674f6cd751c2b8d5d83e39e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request #21719: [SPARK-24747][ML] Make Instrumentation class more...
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21719#discussion_r202800969

    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
    @@ -488,9 +488,10 @@ class LogisticRegression @Since("1.2.0") (
         train(dataset, handlePersistence)
       }

    +  import Instrumentation.instrumented
    --- End diff --

    Put import at top of file with the other imports (just to make imports easier to track).
[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/21589 jenkins, retest this, please
[GitHub] spark issue #19449: [SPARK-22219][SQL] Refactor code to get a value for "spa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19449 Merged build finished. Test PASSed.
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user mccheah commented on the issue: https://github.com/apache/spark/pull/21748 Comments are addressed.
[GitHub] spark issue #21442: [SPARK-24402] [SQL] Optimize `In` expression when only o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21442 Merged build finished. Test FAILed.
[GitHub] spark issue #21442: [SPARK-24402] [SQL] Optimize `In` expression when only o...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21442 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/1019/ Test FAILed.
[GitHub] spark issue #21785: [SPARK-24529][BUILD][FOLLOW-UP] Set spotbugs-maven-plugi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21785 **[Test build #93114 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93114/testReport)** for PR 21785 at commit [`9d87160`](https://github.com/apache/spark/commit/9d87160bc2c01321280d43f655c256c30d9fbc14).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21352: [SPARK-24305][SQL][FOLLOWUP] Avoid serialization of priv...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21352 **[Test build #93116 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93116/testReport)** for PR 21352 at commit [`294ac69`](https://github.com/apache/spark/commit/294ac69e618bb8d8b2f988540338d27534b560e9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21352: [SPARK-24305][SQL][FOLLOWUP] Avoid serialization of priv...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21352 Merged build finished. Test PASSed.
[GitHub] spark pull request #21748: [SPARK-23146][K8S] Support client mode.
Github user liyinan926 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21748#discussion_r202802089

    --- Diff: docs/running-on-kubernetes.md ---
    @@ -120,8 +120,8 @@ This URI is the location of the example jar that is already in the Docker image.
     ## Client Mode

     Starting with Spark 2.4.0, it is possible to run Spark applications on Kubernetes in client mode. When running a Spark
    -application in client mode, a separate pod is not deployed to run the driver. When running an application in
    -client mode, it is recommended to account for the following factors:
    +application in client mode, a separate pod is not deployed to run the driver. Your Spark driver does not need to run in
    +a Kubernetes pod. When running an application in client mode, it is recommended to account for the following factors:
    --- End diff --

    I think you can even remove `the driver process is run locally`.
[GitHub] spark issue #21772: [SPARK-24809] [SQL] Serializing LongHashedRelation in ex...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21772 **[Test build #93112 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93112/testReport)** for PR 21772 at commit [`a72fe61`](https://github.com/apache/spark/commit/a72fe61863e119c0e902cef3054d9140b6d04f77).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21403 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93106/ Test PASSed.
[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21403 Merged build finished. Test PASSed.
[GitHub] spark issue #21729: [SPARK-24755][Core] Executor loss can cause task to not ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21729 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93122/ Test FAILed.
[GitHub] spark pull request #21102: [SPARK-23913][SQL] Add array_intersect function
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21102#discussion_r202782954

    --- Diff: core/src/test/scala/org/apache/spark/util/collection/OpenHashSetSuite.scala ---
    @@ -73,6 +73,46 @@ class OpenHashSetSuite extends SparkFunSuite with Matchers {
         assert(set.contains(50))
         assert(set.contains(999))
         assert(!set.contains(1))
    +
    +    set.add(1132)  // Cause hash contention with 999
    +    assert(set.size === 4)
    +    assert(set.contains(10))
    +    assert(set.contains(50))
    +    assert(set.contains(999))
    +    assert(set.contains(1132))
    +    assert(!set.contains(1))
    +
    +    set.remove(1132)
    +    assert(set.size === 3)
    +    assert(set.contains(10))
    +    assert(set.contains(50))
    +    assert(set.contains(999))
    +    assert(!set.contains(1132))
    +    assert(!set.contains(1))
    +
    +    set.remove(999)
    --- End diff --

    good catch, I addressed this.
[GitHub] spark pull request #21764: [SPARK-24802] Optimization Rule Exclusion
Github user maryannxue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21764#discussion_r202786530

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -127,6 +127,14 @@ object SQLConf {
         }
       }

    +  val OPTIMIZER_EXCLUDED_RULES = buildConf("spark.sql.optimizer.excludedRules")
    --- End diff --

    Are you talking about SQL cache? I don't think optimizer has anything to do with SQL cache though, since the logical plans used to match cache entries are "analyzed" plans not "optimized" plans.
[GitHub] spark issue #21769: [SPARK-24805][SQL] Do not ignore avro files without exte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21769 **[Test build #93126 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93126/testReport)** for PR 21769 at commit [`85cdf87`](https://github.com/apache/spark/commit/85cdf871ab31d9fada6280917a66557c98938f3c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #21352: [SPARK-24305][SQL][FOLLOWUP] Avoid serialization of priv...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21352 **[Test build #93115 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93115/testReport)** for PR 21352 at commit [`62c55ad`](https://github.com/apache/spark/commit/62c55ada0e23eb47eb9d3b717f9a9fbc8155a05f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20636 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93123/ Test FAILed.
[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20636 Merged build finished. Test FAILed.
[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21589 **[Test build #93119 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93119/testReport)** for PR 21589 at commit [`128f6f0`](https://github.com/apache/spark/commit/128f6f0c3fc3b89b32554bdd40dddf784d274079).
[GitHub] spark issue #21786: [SPARK-23901][SQL] Removing masking functions
Github user mn-mikke commented on the issue: https://github.com/apache/spark/pull/21786 @rxin @cloud-fan @gatorsmile @mgaido91
[GitHub] spark issue #21403: [SPARK-24341][SQL] Support only IN subqueries with the s...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21403 **[Test build #93106 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93106/testReport)** for PR 21403 at commit [`a5771b8`](https://github.com/apache/spark/commit/a5771b8a0a4f00d95bb6f882f40aa6dd17d0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21786: [SPARK-23901][SQL] Removing masking functions
GitHub user mn-mikke opened a pull request:

    https://github.com/apache/spark/pull/21786

    [SPARK-23901][SQL] Removing masking functions

    The PR reverts #21246.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mn-mikke/spark SPARK-23901

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21786.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21786

commit eb34d46ad8450a777a5405f2e1f91149962e4f23
Author: Marek Novotny
Date:   2018-07-16T16:53:38Z

    [SPARK-23901][SQL] Removing masking functions
[GitHub] spark issue #21786: [SPARK-23901][SQL] Removing masking functions
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21786 **[Test build #93120 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93120/testReport)** for PR 21786 at commit [`eb34d46`](https://github.com/apache/spark/commit/eb34d46ad8450a777a5405f2e1f91149962e4f23).
[GitHub] spark issue #18714: [SPARK-20236][SQL] dynamic partition overwrite
Github user koertkuipers commented on the issue: https://github.com/apache/spark/pull/18714 @cloud-fan OK, that works just as well
[GitHub] spark issue #20949: [SPARK-19018][SQL] Add support for custom encoding on cs...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20949 **[Test build #93113 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93113/testReport)** for PR 20949 at commit [`91f4750`](https://github.com/apache/spark/commit/91f4750ff2f4781cea2fb23b1339659abf65009a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21102 **[Test build #93129 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93129/testReport)** for PR 21102 at commit [`28e0c45`](https://github.com/apache/spark/commit/28e0c45441348c89c627770059d03f7228d0f94b).
[GitHub] spark issue #21468: [SPARK-22151] : PYTHONPATH not picked up from the spark....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21468 **[Test build #93128 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93128/testReport)** for PR 21468 at commit [`5423bef`](https://github.com/apache/spark/commit/5423befa2c27affc0a5a54f02144a34a77af34c4).
[GitHub] spark issue #21729: [SPARK-24755][Core] Executor loss can cause task to not ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21729 **[Test build #93127 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93127/testReport)** for PR 21729 at commit [`a67bebc`](https://github.com/apache/spark/commit/a67bebcf304a7f0129f44586152490c7192efbe3).
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21748 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93124/ Test PASSed.
[GitHub] spark issue #21352: [SPARK-24305][SQL][FOLLOWUP] Avoid serialization of priv...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21352 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93115/ Test PASSed.
[GitHub] spark issue #21352: [SPARK-24305][SQL][FOLLOWUP] Avoid serialization of priv...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21352 Merged build finished. Test PASSed.
[GitHub] spark pull request #21748: [SPARK-23146][K8S] Support client mode.
Github user mccheah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21748#discussion_r202801155

    --- Diff: docs/running-on-kubernetes.md ---
    @@ -120,8 +120,8 @@ This URI is the location of the example jar that is already in the Docker image.
     ## Client Mode
     
     Starting with Spark 2.4.0, it is possible to run Spark applications on Kubernetes in client mode. When running a Spark
    -application in client mode, a separate pod is not deployed to run the driver. When running an application in
    -client mode, it is recommended to account for the following factors:
    +application in client mode, a separate pod is not deployed to run the driver. Your Spark driver does not need to run in
    +a Kubernetes pod. When running an application in client mode, it is recommended to account for the following factors:
    --- End diff --

    Think that's a bit wordy - perhaps, "When your application runs in client mode, the driver process is run locally. The driver can run inside a pod or on a physical host."
[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21537 Merged build finished. Test PASSed.
[GitHub] spark issue #21514: [SPARK-22860] [SPARK-24621] [Core] [WebUI] - hide key pa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21514 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93104/ Test FAILed.
[GitHub] spark issue #21772: [SPARK-24809] [SQL] Serializing LongHashedRelation in ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21772 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93112/ Test PASSed.
[GitHub] spark issue #21772: [SPARK-24809] [SQL] Serializing LongHashedRelation in ex...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21772 Merged build finished. Test PASSed.
[GitHub] spark issue #21514: [SPARK-22860] [SPARK-24621] [Core] [WebUI] - hide key pa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21514 Merged build finished. Test FAILed.
[GitHub] spark issue #21222: [SPARK-24161][SS] Enable debug package feature on struct...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21222 Merged build finished. Test PASSed.
[GitHub] spark issue #21222: [SPARK-24161][SS] Enable debug package feature on struct...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21222 **[Test build #93110 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93110/testReport)** for PR 21222 at commit [`b9bad1a`](https://github.com/apache/spark/commit/b9bad1ae6618606e91df35f854025bc32c8178de). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21222: [SPARK-24161][SS] Enable debug package feature on struct...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21222 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93110/ Test PASSed.
[GitHub] spark issue #21357: [SPARK-24311][SS] Refactor HDFSBackedStateStoreProvider ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21357 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93109/ Test PASSed.
[GitHub] spark issue #21782: [SPARK-24816][SQL] SQL interface support repartitionByRa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21782 **[Test build #93105 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93105/testReport)** for PR 21782 at commit [`3e3bbad`](https://github.com/apache/spark/commit/3e3bbada6e9b02f4fd5d8db216bdc2ce4a397d12). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21357: [SPARK-24311][SS] Refactor HDFSBackedStateStoreProvider ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21357 **[Test build #93109 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93109/testReport)** for PR 21357 at commit [`55256b5`](https://github.com/apache/spark/commit/55256b59f7803b55d791299c3f801a73944893aa). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #21764: [SPARK-24802] Optimization Rule Exclusion
Github user maryannxue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21764#discussion_r202762054

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
    @@ -127,6 +127,14 @@ object SQLConf {
         }
       }
     
    +  val OPTIMIZER_EXCLUDED_RULES = buildConf("spark.sql.optimizer.excludedRules")
    +    .doc("Configures a list of rules to be disabled in the optimizer, in which the rules are " +
    +      "specified by their rule names and separated by comma. It is not guaranteed that all the " +
    +      "rules in this configuration will eventually be excluded, as some rules are necessary " +
    --- End diff --

    Nice suggestion! @gatorsmile's other suggestion was to introduce a blacklist, which would make it possible to enumerate the rules that cannot be excluded. I can add a warning as well.
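For readers following the thread, here is a minimal sketch of the blacklist-plus-warning idea mentioned in the comment above. It is not the PR's implementation: `ExcludedRulesSketch`, `nonExcludable`, and the rule names are hypothetical, and the warning goes to stderr rather than Spark's logging. Only the comma-splitting and trimming step mirrors what the diff under review does.

```scala
// Illustrative sketch only; these names are assumptions, not taken from Spark's source.
object ExcludedRulesSketch {
  // Hypothetical set of rule names that must never be excluded.
  val nonExcludable: Set[String] = Set("RequiredRuleA", "RequiredRuleB")

  /** Split a comma-separated config value into trimmed, non-empty rule names. */
  def parse(confValue: Option[String]): Seq[String] =
    confValue.toSeq.flatMap(_.split(",").map(_.trim).filter(_.nonEmpty))

  /** Returns (rules that will be excluded, rules whose exclusion is ignored). */
  def resolve(confValue: Option[String]): (Seq[String], Seq[String]) = {
    val (ignored, honored) = parse(confValue).partition(nonExcludable.contains)
    // Warn about every requested exclusion that cannot be honored.
    ignored.foreach { name =>
      Console.err.println(s"Warning: rule '$name' cannot be excluded; ignoring.")
    }
    (honored, ignored)
  }
}
```

With such a split, the configuration documentation could list the contents of the non-excludable set instead of saying only that some rules may not be excluded.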
[GitHub] spark pull request #21764: [SPARK-24802] Optimization Rule Exclusion
Github user maryannxue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21764#discussion_r202760924

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -175,6 +179,35 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
        * Override to provide additional rules for the operator optimization batch.
        */
       def extendedOperatorOptimizationRules: Seq[Rule[LogicalPlan]] = Nil
    +
    +  override def batches: Seq[Batch] = {
    +    val excludedRules =
    +      SQLConf.get.optimizerExcludedRules.toSeq.flatMap(_.split(",").map(_.trim).filter(!_.isEmpty))
    +    val filteredOptimizationBatches = if (excludedRules.isEmpty) {
    +      optimizationBatches
    +    } else {
    +      optimizationBatches.flatMap { batch =>
    +        val filteredRules =
    +          batch.rules.filter { rule =>
    +            val exclude = excludedRules.contains(rule.ruleName)
    +            if (exclude) {
    +              logInfo(s"Optimization rule '${rule.ruleName}' is excluded from the optimizer.")
    +            }
    +            !exclude
    +          }
    +        if (batch.rules == filteredRules) {
    --- End diff --

    It is to: 1) avoid unnecessary object creation if all rules have been preserved. 2) avoid empty batches if all rules in the batch have been removed.
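The two goals named in the reply above can be sketched in isolation. This is a simplified illustration, not the code under review: `Rule` and `Batch` here are stand-in case classes (assumptions), and the sketch compares sizes where the diff compares the rule sequences directly.

```scala
// Illustrative sketch only; Rule and Batch are simplified stand-ins, not Spark's classes.
object BatchFilterSketch {
  final case class Rule(ruleName: String)
  final case class Batch(name: String, rules: Seq[Rule])

  def filterBatches(batches: Seq[Batch], excluded: Set[String]): Seq[Batch] =
    if (excluded.isEmpty) batches
    else batches.flatMap { batch =>
      val kept = batch.rules.filterNot(r => excluded.contains(r.ruleName))
      if (kept.size == batch.rules.size) Some(batch) // 1) nothing removed: reuse the original object
      else if (kept.nonEmpty) Some(batch.copy(rules = kept)) // some removed: build a smaller batch
      else None // 2) everything removed: drop the now-empty batch
    }
}
```

Returning an `Option` from `flatMap` is what lets one pass both keep untouched batches as the same instances and omit emptied ones.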
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21537#discussion_r202778928

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -825,43 +832,43 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
       private[this] def castToStringCode(from: DataType, ctx: CodegenContext): CastFunction = {
         from match {
           case BinaryType =>
    -        (c, evPrim, evNull) => s"$evPrim = UTF8String.fromBytes($c);"
    +        (c, evPrim, evNull) => code"$evPrim = UTF8String.fromBytes($c);"
           case DateType =>
    -        (c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString(
    +        (c, evPrim, evNull) => code"""$evPrim = UTF8String.fromString(
               org.apache.spark.sql.catalyst.util.DateTimeUtils.dateToString($c));"""
           case TimestampType =>
    -        val tz = ctx.addReferenceObj("timeZone", timeZone)
    -        (c, evPrim, evNull) => s"""$evPrim = UTF8String.fromString(
    +        val tz = JavaCode.global(ctx.addReferenceObj("timeZone", timeZone), timeZone.getClass)
    --- End diff --

    It should be the same case as `ctx.addNewFunction`.
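As background for the `s"..."` to `code"..."` change in this diff, the following is a much-simplified sketch of how a custom string interpolator can return a structured block that remembers which values it references, instead of a plain string. It is not Spark's implementation: `Value`, `Block`, and `CodeInterpolator` are hypothetical names chosen for illustration.

```scala
// Illustrative sketch only: a simplified stand-in for a `code"..."` interpolator.
object CodeBlockSketch {
  /** A named value in generated Java code (hypothetical, simplified). */
  final case class Value(name: String) { override def toString: String = name }

  /** Generated code together with the values it refers to. */
  final case class Block(code: String, refs: Seq[Value]) {
    override def toString: String = code
  }

  implicit class CodeInterpolator(private val sc: StringContext) extends AnyVal {
    def code(args: Any*): Block = {
      // Collect referenced values, including those of nested blocks.
      val refs = args.flatMap {
        case v: Value => Seq(v)
        case b: Block => b.refs
        case _        => Seq.empty[Value]
      }
      // Render the text exactly as the standard s-interpolator would.
      Block(sc.s(args: _*), refs.distinct)
    }
  }
}
```

After `import CodeBlockSketch._`, writing `code"$evPrim = UTF8String.fromBytes($c);"` with `evPrim` and `c` as `Value`s yields a `Block` whose text matches the `s"..."` result while the inputs stay inspectable. A value obtained as a plain string is not tracked this way, which may be why the review discusses wrapping such results explicitly.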
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 **[Test build #93124 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93124/testReport)** for PR 21748 at commit [`bd102b3`](https://github.com/apache/spark/commit/bd102b359bc462409b9c17b28dc8e67c238ac5a6).
[GitHub] spark pull request #21537: [SPARK-24505][SQL] Convert strings in codegen to ...
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21537#discussion_r202778833

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -740,31 +739,37 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
       private def writeMapToStringBuilder(
           kt: DataType,
           vt: DataType,
    -      map: String,
    -      buffer: String,
    -      ctx: CodegenContext): String = {
    +      map: ExprValue,
    +      buffer: ExprValue,
    +      ctx: CodegenContext): Block = {
     
         def dataToStringFunc(func: String, dataType: DataType) = {
           val funcName = ctx.freshName(func)
           val dataToStringCode = castToStringCode(dataType, ctx)
    -      ctx.addNewFunction(funcName,
    +      val data = JavaCode.variable("data", dataType)
    +      val dataStr = JavaCode.variable("dataStr", StringType)
    +      val functionCall = ctx.addNewFunction(funcName,
    --- End diff --

    We need to do it later. `ctx.addNewFunction` is used in too many places, and we would need to change its return type in all of them.
[GitHub] spark issue #21769: [SPARK-24805][SQL] Do not ignore avro files without exte...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21769 **[Test build #93126 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93126/testReport)** for PR 21769 at commit [`85cdf87`](https://github.com/apache/spark/commit/85cdf871ab31d9fada6280917a66557c98938f3c).
[GitHub] spark pull request #20949: [SPARK-19018][SQL] Add support for custom encodin...
Github user MaxGekk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20949#discussion_r202781709

    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala ---
    @@ -512,6 +513,44 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with Te
         }
       }
     
    +  test("Save csv with custom charset") {
    --- End diff --

    Could you prepend `SPARK-19018` to the test title?
[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21748 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/1017/
[GitHub] spark issue #21786: [SPARK-23901][SQL] Removing masking functions
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21786 **[Test build #93120 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93120/testReport)** for PR 21786 at commit [`eb34d46`](https://github.com/apache/spark/commit/eb34d46ad8450a777a5405f2e1f91149962e4f23). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21769: [SPARK-24805][SQL] Do not ignore avro files without exte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21769 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93126/ Test PASSed.
[GitHub] spark issue #21769: [SPARK-24805][SQL] Do not ignore avro files without exte...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21769 Merged build finished. Test PASSed.
[GitHub] spark issue #21442: [SPARK-24402] [SQL] Optimize `In` expression when only o...
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/21442 @HyukjinKwon thanks for bringing this to my attention. @gatorsmile I thought the bug was found by this PR, not introduced by it. This PR is blocked until SPARK-24443 is addressed. I'll unblock this PR by turning `In` into `EqualTo` if the `list` is not a `ListQuery`, as suggested by @cloud-fan. Thanks all.
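A minimal sketch of the rewrite described in the comment above, on a toy expression tree. The case classes below are stand-ins with the same names as the Catalyst expressions mentioned in the thread, not the real ones, and the single-element condition is an assumption inferred from the truncated PR title rather than something the comment states.

```scala
// Illustrative sketch only; a tiny stand-in expression tree, not Catalyst's classes.
object InRewriteSketch {
  sealed trait Expr
  final case class Attr(name: String) extends Expr
  final case class Lit(value: Any) extends Expr
  final case class ListQuery(sql: String) extends Expr // stands in for a subquery
  final case class In(value: Expr, list: Seq[Expr]) extends Expr
  final case class EqualTo(left: Expr, right: Expr) extends Expr

  /** Rewrite a single-element `In` to `EqualTo`, unless that element is a subquery. */
  def optimize(e: Expr): Expr = e match {
    case In(_, Seq(_: ListQuery)) => e                    // leave IN-subqueries untouched
    case In(v, Seq(single))       => EqualTo(v, single)   // a IN (x)  becomes  a = x
    case other                    => other                // everything else is unchanged
  }
}
```

The order of the cases matters: the `ListQuery` guard must come first so that a subquery list is never turned into an equality.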
[GitHub] spark issue #21468: [SPARK-22151] : PYTHONPATH not picked up from the spark....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21468 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93128/ Test PASSed.
[GitHub] spark issue #21765: [MINOR][CORE] Add test cases for RDD.cartesian
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21765 **[Test build #4216 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4216/testReport)** for PR 21765 at commit [`9df4c3b`](https://github.com/apache/spark/commit/9df4c3b4a71082181aa979c3bddf2c3d99db256e).
[GitHub] spark issue #21468: [SPARK-22151] : PYTHONPATH not picked up from the spark....
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21468 Merged build finished. Test PASSed.
[GitHub] spark issue #21468: [SPARK-22151] : PYTHONPATH not picked up from the spark....
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21468 **[Test build #93128 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93128/testReport)** for PR 21468 at commit [`5423bef`](https://github.com/apache/spark/commit/5423befa2c27affc0a5a54f02144a34a77af34c4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21786: [SPARK-23901][SQL] Removing masking functions
Github user mn-mikke commented on the issue: https://github.com/apache/spark/pull/21786 retest this please
[GitHub] spark issue #21514: [SPARK-22860] [SPARK-24621] [Core] [WebUI] - hide key pa...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21514 **[Test build #93104 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/93104/testReport)** for PR 21514 at commit [`050b226`](https://github.com/apache/spark/commit/050b226df1340c0c1478bf9f75cdc1fac4142731). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21766: [SPARK-24803][SQL] add support for numeric
Github user rxin commented on the issue: https://github.com/apache/spark/pull/21766 Why did you need this change? Given that it's very difficult to revert the change (or to introduce a proper numeric type if one is ever needed in the future), I would not merge this pull request unless there is sufficient justification.
[GitHub] spark issue #21357: [SPARK-24311][SS] Refactor HDFSBackedStateStoreProvider ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21357 Merged build finished. Test PASSed.
[GitHub] spark pull request #21748: [SPARK-23146][K8S] Support client mode.
Github user liyinan926 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21748#discussion_r202752192

    --- Diff: docs/running-on-kubernetes.md ---
    @@ -117,6 +117,37 @@ If the local proxy is running at localhost:8001, `--master k8s://http://127.0.0.
     spark-submit. Finally, notice that in the above example we specify a jar with a specific URI with a scheme of `local://`.
     This URI is the location of the example jar that is already in the Docker image.
     
    +## Client Mode
    +
    +Starting with Spark 2.4.0, it is possible to run Spark applications on Kubernetes in client mode. When running a Spark
    +application in client mode, a separate pod is not deployed to run the driver. When running an application in
    +client mode, it is recommended to account for the following factors:
    +
    +### Client Mode Networking
    +
    +Spark executors must be able to connect to the Spark driver over a hostname and a port that is routable from the Spark
    +executors. The specific network configuration that will be required for Spark to work in client mode will vary per
    +setup. If you run your driver inside a Kubernetes pod, you can use a
    +[headless service](https://kubernetes.io/docs/concepts/services-networking/service/#headless-services) to allow your
    +driver pod to be routable from the executors by a stable hostname. Specify the driver's hostname via `spark.driver.host`
    +and your spark driver's port to `spark.driver.port`.
    +
    +### Client Mode Garbage Collection
    --- End diff --

    Can this be renamed to `Executor Pod Garbage Collection in Client Mode`?
[GitHub] spark issue #21782: [SPARK-24816][SQL] SQL interface support repartitionByRa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21782 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/93105/ Test PASSed.
[GitHub] spark pull request #21748: [SPARK-23146][K8S] Support client mode.
Github user liyinan926 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21748#discussion_r202754518

    --- Diff: docs/running-on-kubernetes.md ---
    @@ -529,8 +600,11 @@ specific to Spark on Kubernetes.
       spark.kubernetes.driver.pod.name
       (none)
     
    -    Name of the driver pod. If not set, the driver pod name is set to "spark.app.name" suffixed by the current timestamp
    -    to avoid name conflicts.
    +    Name of the driver pod. In cluster mode, if this is not set, the driver pod name is set to "spark.app.name"
    +    suffixed by the current timestamp to avoid name conflicts. In client mode, if your application is running
    +    inside a pod, it is highly recommended to set this to the name of the pod your driver is running in. Setting this
    +    value in client mode allows the driver to inform the cluster that your application's executor pods should be
    --- End diff --

    Instead of saying `inform the cluster that your application's executor pods should be deleted ...`, it's better to just say `to allow the driver pod to become the owner of your application's executor pods for garbage collection to work`.
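To make the client-mode settings discussed in this review thread concrete, here is a small sketch that assembles them as plain key/value pairs and renders them as `--conf` arguments. Only the three configuration keys come from the documentation under review; the object name, the helper functions, and any host or pod names passed in are hypothetical placeholders.

```scala
// Illustrative sketch only; helper names and any sample values are hypothetical.
object ClientModeConfSketch {
  /** Collects the client-mode settings named in the reviewed docs. */
  def clientModeConf(
      driverHost: String,
      driverPort: Int,
      driverPodName: Option[String]): Map[String, String] = {
    require(driverPort > 0 && driverPort < 65536, s"invalid port: $driverPort")
    val base = Map(
      "spark.driver.host" -> driverHost,         // hostname routable from the executors
      "spark.driver.port" -> driverPort.toString)
    // Only set the pod name when the driver actually runs inside a pod.
    base ++ driverPodName.map(name => "spark.kubernetes.driver.pod.name" -> name).toSeq
  }

  /** Renders settings as `--conf key=value` pairs, sorted by key for stable output. */
  def toSubmitArgs(conf: Map[String, String]): Seq[String] =
    conf.toSeq.sortBy(_._1).flatMap { case (k, v) => Seq("--conf", s"$k=$v") }
}
```

Making the pod name an `Option` reflects the point raised in the review that in client mode there might not be a driver pod at all.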
[GitHub] spark issue #21782: [SPARK-24816][SQL] SQL interface support repartitionByRa...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21782 Merged build finished. Test PASSed.
[GitHub] spark pull request #21748: [SPARK-23146][K8S] Support client mode.
Github user liyinan926 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21748#discussion_r202753060

    --- Diff: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterManager.scala ---
    @@ -35,26 +35,39 @@ private[spark] class KubernetesClusterManager extends ExternalClusterManager wit
       override def canCreate(masterURL: String): Boolean = masterURL.startsWith("k8s")
     
       override def createTaskScheduler(sc: SparkContext, masterURL: String): TaskScheduler = {
    -    if (masterURL.startsWith("k8s") &&
    -      sc.deployMode == "client" &&
    -      !sc.conf.get(KUBERNETES_DRIVER_SUBMIT_CHECK).getOrElse(false)) {
    -      throw new SparkException("Client mode is currently not supported for Kubernetes.")
    -    }
    -
         new TaskSchedulerImpl(sc)
       }
     
       override def createSchedulerBackend(
           sc: SparkContext,
           masterURL: String,
           scheduler: TaskScheduler): SchedulerBackend = {
    +    val wasSparkSubmittedInClusterMode = sc.conf.get(KUBERNETES_DRIVER_SUBMIT_CHECK)
    +    val (authConfPrefix,
    +      apiServerUri,
    +      defaultServiceAccountToken,
    +      defaultServiceAccountCaCrt) = if (wasSparkSubmittedInClusterMode) {
    +      require(sc.conf.get(KUBERNETES_DRIVER_POD_NAME).isDefined,
    +        "If the application is deployed using spark-submit in cluster mode, the driver pod name" +
    +          " must be provided.")
    --- End diff --

    The space should go at the end of the previous line.
[GitHub] spark pull request #21748: [SPARK-23146][K8S] Support client mode.
Github user liyinan926 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21748#discussion_r202754673

    --- Diff: docs/running-on-kubernetes.md ---
    @@ -486,7 +516,48 @@ specific to Spark on Kubernetes.
         Service account that is used when running the driver pod. The driver pod uses this service account when requesting
         executor pods from the API server. Note that this cannot be specified alongside a CA cert file, client key file,
    -    client cert file, and/or OAuth token.
    +    client cert file, and/or OAuth token. In client mode, use spark.kubernetes.authenticate.serviceAccountName instead.
    +
    +
    +
    +  spark.kubernetes.authenticate.caCertFile
    +  (none)
    +
    +    In client mode, path to the CA cert file for connecting to the Kubernetes API server over TLS when
    +    requesting executors. Specify this as a path as opposed to a URI (i.e. do not provide a scheme).
    +
    +
    +
    +  spark.kubernetes.authenticate.clientKeyFile
    +  (none)
    +
    +    In client mode, path to the client key file for authenticating against the Kubernetes API server
    +    when requesting executors. Specify this as a path as opposed to a URI (i.e. do not provide a scheme).
    +
    +
    +
    +  spark.kubernetes.authenticate.clientCertFile
    +  (none)
    +
    +    In client mode, path to the client cert file for authenticating against the Kubernetes API server
    +    when requesting executors. Specify this as a path as opposed to a URI (i.e. do not provide a scheme).
    +
    +
    +
    +  spark.kubernetes.authenticate.oauthToken
    +  (none)
    +
    +    In client mode, the OAuth token to use when authenticating against the Kubernetes API server when
    +    requesting executors. Note that unlike the other authentication options, this must be the exact string value of
    +    the token to use for the authentication.
    +
    +
    +
    +  spark.kubernetes.authenticate.oauthTokenFile
    +  (none)
    +
    +    In client mode, path to the file containing the OAuth token to use when authenticating against the Kubernetes API
    +    server from the driver pod when requesting executors.
    --- End diff --

    Delete `from the driver pod` as there might not be a driver pod.
[GitHub] spark issue #21786: [SPARK-23901][SQL] Removing masking functions
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21786 Can one of the admins verify this patch?