[GitHub] spark issue #20034: [SPARK-22846][SQL] Fix table owner is null when creating...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20034 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #20034: [SPARK-22846][SQL] Fix table owner is null when c...
GitHub user BruceXu1991 opened a pull request:

https://github.com/apache/spark/pull/20034

[SPARK-22846][SQL] Fix table owner is null when creating table through spark sql or thriftserver

## What changes were proposed in this pull request?

Fix the table owner being null when a new table is created through Spark SQL.

## How was this patch tested?

Manual test:
1. Create a table.
2. Query the table properties in the MySQL database backing the Hive metastore.

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/BruceXu1991/spark SPARK-22846

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20034.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #20034

commit e8c3035028e6242005806476f5ce7cbdad5af889
Author: xu.wenchun
Date: 2017-12-20T13:05:13Z

fix SPARK-22846
[GitHub] spark pull request #20020: [SPARK-22834][SQL] Make insertion commands have r...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20020#discussion_r158019546

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/commands.scala ---
@@ -87,6 +87,51 @@ case class ExecutedCommandExec(cmd: RunnableCommand) extends LeafExecNode {
   }
 }
 
+/**
+ * A physical operator that executes the run method of a `RunnableCommand` and
+ * saves the result to prevent multiple executions.
+ *
+ * @param cmd the `RunnableCommand` this operator will run.
+ * @param children the children physical plans ran by the `RunnableCommand`.
+ */
+case class DataWritingCommandExec(cmd: DataWritingCommand, children: Seq[SparkPlan])
+  extends SparkPlan {
+
+  override lazy val metrics: Map[String, SQLMetric] = cmd.metrics
+
+  /**
+   * A concrete command should override this lazy field to wrap up any side effects caused by the
+   * command or any other computation that should be evaluated exactly once. The value of this field
+   * can be used as the contents of the corresponding RDD generated from the physical plan of this
+   * command.
+   *
+   * The `execute()` method of all the physical command classes should reference `sideEffectResult`
+   * so that the command can be executed eagerly right after the command query is created.
+   */
+  protected[sql] lazy val sideEffectResult: Seq[InternalRow] = {
+    val converter = CatalystTypeConverters.createToCatalystConverter(schema)
+    val rows = cmd.run(sqlContext.sparkSession, children)
+
+    rows.map(converter(_).asInstanceOf[InternalRow])
+  }
+
+  override def innerChildren: Seq[QueryPlan[_]] = cmd.children
--- End diff --

why do we need this?
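Editorial note on the diff above: the Scaladoc for `sideEffectResult` says the command should be evaluated exactly once, however often the operator is executed. A minimal sketch of that idea, written as a Python analogue of the Scala `lazy val` rather than as Spark code. `FakeWriteCommand` and `CommandExec` are hypothetical names invented for this illustration and are not Spark APIs.

```python
from functools import cached_property


class FakeWriteCommand:
    """Hypothetical stand-in for a command whose run() has side effects."""

    def __init__(self):
        self.run_count = 0

    def run(self):
        # Count invocations so repeated execution would be visible.
        self.run_count += 1
        return ["row-%d" % self.run_count]


class CommandExec:
    """Hypothetical operator that caches the result of its command."""

    def __init__(self, cmd):
        self.cmd = cmd

    @cached_property
    def side_effect_result(self):
        # Computed on first access only; later accesses reuse the stored value.
        return self.cmd.run()

    def execute(self):
        return self.side_effect_result


cmd = FakeWriteCommand()
op = CommandExec(cmd)
first = op.execute()
second = op.execute()
```

Here both `execute()` calls return the same list object and `cmd.run_count` stays at 1, which is the "evaluated exactly once" property the comment describes.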
[GitHub] spark issue #20033: [SPARK-22847] [CORE] Remove duplicate code in AppStatusL...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20033 Can one of the admins verify this patch?
[GitHub] spark pull request #20020: [SPARK-22834][SQL] Make insertion commands have r...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20020#discussion_r158018917

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/DataWritingCommand.scala ---
@@ -57,4 +45,8 @@ trait DataWritingCommand extends RunnableCommand {
     val serializableHadoopConf = new SerializableConfiguration(hadoopConf)
     new BasicWriteJobStatsTracker(serializableHadoopConf, metrics)
   }
+
+  def run(sparkSession: SparkSession, children: Seq[SparkPlan]): Seq[Row] = {
+    throw new NotImplementedError
--- End diff --

shall we force all sub-classes to implement it?
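Editorial note on the review question above: the trade-off is between a default body that throws when called and an abstract method that every subclass must override. A rough Python analogue of the two options, assuming hypothetical class names that do not exist in Spark. In Scala an unimplemented abstract method is rejected at compile time; the closest Python behaviour is a failure at instantiation.

```python
from abc import ABC, abstractmethod


class DefaultRaises:
    """Variant in the diff: a default body that fails only when called."""

    def run(self, children):
        raise NotImplementedError


class MustImplement(ABC):
    """Variant the reviewer asks about: subclasses must override run()."""

    @abstractmethod
    def run(self, children):
        ...


class ForgotA(DefaultRaises):
    pass


class ForgotB(MustImplement):
    pass


def failure_point(cls):
    """Report where a subclass that forgot run() first fails."""
    try:
        obj = cls()
    except TypeError:
        return "instantiation"
    try:
        obj.run([])
    except NotImplementedError:
        return "call"
    return "never"
```

With the default-throwing variant the mistake only surfaces when `run` is eventually called; with the abstract variant it surfaces as soon as the incomplete subclass is constructed.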
[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for WindowFrameCoercion ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20008 **[Test build #85187 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85187/testReport)** for PR 20008 at commit [`3291339`](https://github.com/apache/spark/commit/3291339bfa643f12e9d5c3d7cb68c02617f22afa).
[GitHub] spark pull request #20033: [SPARK-22847] [CORE] Remove duplicate code in App...
GitHub user Ngone51 opened a pull request:

https://github.com/apache/spark/pull/20033

[SPARK-22847] [CORE] Remove duplicate code in AppStatusListener while assigning schedulingPool for stage

## What changes were proposed in this pull request?

In AppStatusListener's onStageSubmitted(event: SparkListenerStageSubmitted) method, there is duplicate code:

```
// schedulingPool was assigned twice with the same code
stage.schedulingPool = Option(event.properties).flatMap { p =>
  Option(p.getProperty("spark.scheduler.pool"))
}.getOrElse(SparkUI.DEFAULT_POOL_NAME)
```

However, doing this makes no sense, and there is no comment explaining it.

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Ngone51/spark dev-spark-22847

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20033.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #20033

commit ace04a5c75a0dc46e0575677be6be77ab6b58895
Author: wuyi
Date: 2017-12-20T13:03:49Z

remove duplicate code in AppStatusListener while assigning schedulingPool for stage
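Editorial note on the snippet quoted in the pull request above: the assignment is a null-safe lookup with a fallback, and it depends only on the event properties, so running it a second time cannot change the result. A small Python sketch of the same lookup, assuming a plain dict in place of the Java properties object; the value `"default"` for the fallback pool name is an assumption made for illustration.

```python
DEFAULT_POOL_NAME = "default"  # assumed fallback value, for illustration only


def scheduling_pool(properties):
    """Return the pool name, falling back when properties or the key is missing."""
    if properties is None:
        # Mirrors Option(event.properties) being empty.
        return DEFAULT_POOL_NAME
    pool = properties.get("spark.scheduler.pool")
    # Mirrors Option(p.getProperty(...)).getOrElse(default).
    return pool if pool is not None else DEFAULT_POOL_NAME


props = {"spark.scheduler.pool": "production"}
once = scheduling_pool(props)
twice = scheduling_pool(props)
```

Because `once` and `twice` are always equal, a second identical assignment is redundant, which is the point of the pull request.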
[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19218 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85173/ Test PASSed.
[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19218 Merged build finished. Test PASSed.
[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19218 **[Test build #85173 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85173/testReport)** for PR 19218 at commit [`0cb7b7a`](https://github.com/apache/spark/commit/0cb7b7af517f37c428faa394f652bc564ecb097f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #20008: [SPARK-22822][TEST] Basic tests for WindowFrameCo...
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/20008#discussion_r158016336

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala ---
@@ -252,7 +252,7 @@ case class SpecifiedWindowFrame(
       case e: Expression if !frameType.inputType.acceptsType(e.dataType) =>
         TypeCheckFailure(
           s"The data type of the $location bound '${e.dataType} does not match " +
-            s"the expected data type '${frameType.inputType}'.")
+            s"the expected data type '${frameType.inputType.simpleString}'.")
--- End diff --

Otherwise the result is:
```
cannot resolve 'RANGE BETWEEN CURRENT ROW AND CAST(1 AS STRING) FOLLOWING' due to data type mismatch: The data type of the upper bound 'StringType does not match the expected data type 'org.apache.spark.sql.types.TypeCollection@7ff36201'.; line 1 pos 21
```
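Editorial note on the comment above: the unhelpful `TypeCollection@7ff36201` text is what a default object-to-string conversion produces, and the fix is to interpolate a purpose-built description instead. A Python analogue of that difference, assuming a hypothetical `TypeCollection` stand-in; the `(a or b)` output format and the member names are chosen for illustration and are not taken from the source.

```python
class TypeCollection:
    """Hypothetical stand-in; it does not define its own string conversion."""

    def __init__(self, *types):
        self.types = types

    @property
    def simple_string(self):
        # Human-readable description, analogous in spirit to simpleString.
        return "(" + " or ".join(self.types) + ")"


expected = TypeCollection("numeric", "interval")

# Default conversion: class name plus a memory address.
unhelpful = "the expected data type '%s'." % expected

# Explicit description: readable in an error message.
helpful = "the expected data type '%s'." % expected.simple_string
```

The first message ends up containing an address such as `object at 0x...`, while the second names the accepted types directly.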
[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for WindowFrameCoercion ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20008 **[Test build #85186 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85186/testReport)** for PR 20008 at commit [`19bcca1`](https://github.com/apache/spark/commit/19bcca13ab03c9a5cb5399476e1afac26a30ec49).
[GitHub] spark issue #20021: [SPARK-22668][SQL] Ensure no global variables in argumen...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/20021 Oh, you are right. I misunderstood. After our optimizations, output is also a part of `arguments`. Let me check others again.
[GitHub] spark issue #19904: [SPARK-22707][ML] Optimize CrossValidator memory occupat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19904 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85183/ Test PASSed.
[GitHub] spark issue #19904: [SPARK-22707][ML] Optimize CrossValidator memory occupat...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19904 Merged build finished. Test PASSed.
[GitHub] spark issue #19904: [SPARK-22707][ML] Optimize CrossValidator memory occupat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19904 **[Test build #85183 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85183/testReport)** for PR 19904 at commit [`cad2104`](https://github.com/apache/spark/commit/cad210439b7a0bc3eb870f1d68fd96fbd0763aa8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20030: [SPARK-10496][CORE] Efficient RDD cumulative sum
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20030 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85172/ Test FAILed.
[GitHub] spark issue #20030: [SPARK-10496][CORE] Efficient RDD cumulative sum
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20030 Merged build finished. Test FAILed.
[GitHub] spark issue #20030: [SPARK-10496][CORE] Efficient RDD cumulative sum
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20030 **[Test build #85172 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85172/testReport)** for PR 20030 at commit [`4f1d5e2`](https://github.com/apache/spark/commit/4f1d5e269c5f84f6126fea97c201b6cd6fef461f). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19498 Merged build finished. Test PASSed.
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19498 **[Test build #85184 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85184/testReport)** for PR 19498 at commit [`174ec21`](https://github.com/apache/spark/commit/174ec2139a7e0af049e2954494525fd3fff145e2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19498 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85184/ Test PASSed.
[GitHub] spark issue #20032: [SPARK-22845] [Scheduler] Modify spark.kubernetes.alloca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20032 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85185/ Test PASSed.
[GitHub] spark issue #20032: [SPARK-22845] [Scheduler] Modify spark.kubernetes.alloca...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20032 Merged build finished. Test PASSed.
[GitHub] spark issue #20032: [SPARK-22845] [Scheduler] Modify spark.kubernetes.alloca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20032 **[Test build #85185 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85185/testReport)** for PR 20032 at commit [`48a3326`](https://github.com/apache/spark/commit/48a3326faaea69bf74d97d028bffdd0552777ffe). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - Document...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19946 Merged build finished. Test PASSed.
[GitHub] spark issue #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - Document...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19946 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85167/ Test PASSed.
[GitHub] spark issue #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - Document...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19946 **[Test build #85167 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85167/testReport)** for PR 19946 at commit [`74ac5c9`](https://github.com/apache/spark/commit/74ac5c9e5b495d0133e8e1378867a43f2bc1ff4a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...
Github user yashs360 commented on the issue: https://github.com/apache/spark/pull/18029 Hi @brkyvz, I've added the new changes with the Java classes. I had to make the classes serializable to pass them to the KinesisReceiver. Please have a look when you get time. Thanks.
[GitHub] spark pull request #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - D...
Github user foxish commented on a diff in the pull request: https://github.com/apache/spark/pull/19946#discussion_r158008588 --- Diff: docs/running-on-kubernetes.md --- @@ -0,0 +1,498 @@ +--- +layout: global +title: Running Spark on Kubernetes +--- +* This will become a table of contents (this text will be scraped). +{:toc} + +Spark can run on clusters managed by [Kubernetes](https://kubernetes.io). This feature makes use of the new experimental native +Kubernetes scheduler that has been added to Spark. + +# Prerequisites + +* A runnable distribution of Spark 2.3 or above. +* A running Kubernetes cluster at version >= 1.6 with access configured to it using +[kubectl](https://kubernetes.io/docs/user-guide/prereqs/). If you do not already have a working Kubernetes cluster, +you may setup a test cluster on your local machine using +[minikube](https://kubernetes.io/docs/getting-started-guides/minikube/). + * We recommend using the latest releases of minikube be updated to the most recent version with the DNS addon enabled. +* You must have appropriate permissions to list, create, edit and delete +[pods](https://kubernetes.io/docs/user-guide/pods/) in your cluster. You can verify that you can list these resources +by running `kubectl auth can-i pods`. + * The service account credentials used by the driver pods must be allowed to create pods, services and configmaps. +* You must have [Kubernetes DNS](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/) configured in your cluster. + +# How it works + + + + + +spark-submit can be directly used to submit a Spark application to a Kubernetes cluster. The mechanism by which spark-submit happens is as follows: + +* Spark creates a spark driver running within a [Kubernetes pod](https://kubernetes.io/docs/concepts/workloads/pods/pod/). +* The driver creates executors which are also running within Kubernetes pods and connects to them, and executes application code. 
+* When the application completes, the executor pods terminate and are cleaned up, but the driver pod persists +logs and remains in "completed" state in the Kubernetes API till it's eventually garbage collected or manually cleaned up. + +Note that in the completed state, the driver pod does *not* use any computational or memory resources. + +The driver and executor pod scheduling is handled by Kubernetes. It will be possible to affect Kubernetes scheduling +decisions for driver and executor pods using advanced primitives like +[node selectors](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector) +and [node/pod affinities](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity) +in a future release. + +# Submitting Applications to Kubernetes + +## Docker Images + +Kubernetes requires users to supply images that can be deployed into containers within pods. The images are built to +be run in a container runtime environment that Kubernetes supports. Docker is a container runtime environment that is +frequently used with Kubernetes. With Spark 2.3, there are Dockerfiles provided in the runnable distribution that can be customized +and built for your usage. + +You may build these docker images from sources. +There is a script, `sbin/build-push-docker-images.sh` that you can use to build and push +customized spark distribution images consisting of all the above components. + +Example usage is: + +./sbin/build-push-docker-images.sh -r -t my-tag build +./sbin/build-push-docker-images.sh -r -t my-tag push + +Docker files are under the `dockerfiles/` and can be customized further before +building using the supplied script, or manually. 
+ +## Cluster Mode + +To launch Spark Pi in cluster mode, + +{% highlight bash %} +$ bin/spark-submit \ +--deploy-mode cluster \ +--class org.apache.spark.examples.SparkPi \ +--master k8s://https://: \ +--conf spark.kubernetes.namespace=default \ +--conf spark.executor.instances=5 \ +--conf spark.app.name=spark-pi \ +--conf spark.kubernetes.driver.docker.image= \ +--conf spark.kubernetes.executor.docker.image= \ +local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar +{% endhighlight %} + +The Spark master, specified either via passing the `--master` command line argument to `spark-submit` or by setting +`spark.master` in the application's configuration, must be a URL with the format `k8s://`. Prefixing the +master string with `k8s://` will cause the Spark application to launch on the Kubernetes cluster, with the API server +being contacted at `api_server_url`. I
[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18029 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85181/ Test PASSed.
[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18029 Merged build finished. Test PASSed.
[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18029 **[Test build #85181 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85181/testReport)** for PR 18029 at commit [`3c16c47`](https://github.com/apache/spark/commit/3c16c478257c8aed61b1cef4d75360b8bb8b166d). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class KinesisInitialPositions ` * `public static class Latest implements KinesisInitialPosition, Serializable ` * `public static class TrimHorizon implements KinesisInitialPosition, Serializable ` * `public static class AtTimestamp implements KinesisInitialPosition, Serializable `
[GitHub] spark issue #20032: [SPARK-22845] [Scheduler] Modify spark.kubernetes.alloca...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20032 **[Test build #85185 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85185/testReport)** for PR 20032 at commit [`48a3326`](https://github.com/apache/spark/commit/48a3326faaea69bf74d97d028bffdd0552777ffe).
[GitHub] spark pull request #20032: [SPARK-22845] [Scheduler] Modify spark.kubernetes...
GitHub user foxish opened a pull request:

https://github.com/apache/spark/pull/20032

[SPARK-22845] [Scheduler] Modify spark.kubernetes.allocation.batch.delay to take time instead of int

## What changes were proposed in this pull request?

Fixing a configuration that was taking an int but should take a time value. Discussion in https://github.com/apache/spark/pull/19946#discussion_r156682354
Made the granularity milliseconds as opposed to seconds, since there is a use case for sub-second reactions to scale up rapidly, especially with dynamic allocation.

## How was this patch tested?

TODO: manual run of integration tests against this PR.

PTAL cc/ @mccheah @liyinan926 @kimoonkim @vanzin @mridulm @jiangxb1987 @ueshin

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache-spark-on-k8s/spark fix-time-conf

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20032.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #20032

commit 48a3326faaea69bf74d97d028bffdd0552777ffe
Author: foxish
Date: 2017-12-20T12:03:07Z

Change config to support millisecond based timeconf
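Editorial note on the pull request above: moving a setting from a bare integer to a time value with millisecond granularity lets sub-second delays be expressed at all. A minimal sketch of what such parsing could look like, assuming a hypothetical `parse_time_ms` helper and an assumed set of suffixes; this is not Spark's own parser and does not claim to match its accepted formats.

```python
import re

# Multipliers to milliseconds for the suffixes this sketch accepts (assumed set).
_UNIT_MS = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000}


def parse_time_ms(value, default_unit="ms"):
    """Parse strings such as '500ms' or '1s' into whole milliseconds."""
    match = re.fullmatch(r"\s*(\d+)\s*([a-z]*)\s*", value.lower())
    if match is None:
        raise ValueError("invalid time value: %r" % value)
    number, unit = match.groups()
    # A bare number falls back to the default unit.
    unit = unit or default_unit
    if unit not in _UNIT_MS:
        raise ValueError("unknown time unit: %r" % unit)
    return int(number) * _UNIT_MS[unit]


one_second = parse_time_ms("1s")
half_second = parse_time_ms("500ms")
```

An integer number of seconds could not represent the `500ms` case, which is the motivation given in the description.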
[GitHub] spark issue #20002: [SPARK-22465][Core][WIP] Add a safety-check to RDD defau...
Github user sujithjay commented on the issue: https://github.com/apache/spark/pull/20002 @tgravescs, could you please take a look when you have some time?
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19498 **[Test build #85184 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85184/testReport)** for PR 19498 at commit [`174ec21`](https://github.com/apache/spark/commit/174ec2139a7e0af049e2954494525fd3fff145e2).
[GitHub] spark issue #20023: [SPARK-22036][SQL] Decimal multiplication with high prec...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/20023 @cloud-fan yes, Hive changed and, most importantly, we are currently not compliant with the SQL standard. Spark is therefore returning results that differ from Hive and do not comply with the SQL standard. This is why I proposed this change.
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19498 Hi @brkyvz, could you take a look please?
[GitHub] spark issue #19498: [SPARK-17756][PYTHON][STREAMING] Workaround to avoid ret...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19498 retest this please
[GitHub] spark issue #20023: [SPARK-22036][SQL] Decimal multiplication with high prec...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20023 Ideally we should avoid changing behavior as much as we can, but since this behavior comes from Hive and Hive has also changed it, it might be OK to follow Hive and change it as well? cc @hvanhovell too
[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/19977#discussion_r158005532

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -566,6 +568,21 @@ object TypeCoercion {
     }
   }
 
+  /**
+   * When all inputs in [[Concat]] are binary, coerces an output type to binary
+   */
+  case class ConcatCoercion(conf: SQLConf) extends TypeCoercionRule {
--- End diff --

ok
[GitHub] spark pull request #19977: [SPARK-22771][SQL] Concatenate binary inputs into...
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19977#discussion_r158004864

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -566,6 +568,21 @@ object TypeCoercion {
     }
   }
 
+  /**
+   * When all inputs in [[Concat]] are binary, coerces an output type to binary
+   */
+  case class ConcatCoercion(conf: SQLConf) extends TypeCoercionRule {
--- End diff --

I think we should do it in this PR, because this is a new requirement for the new behavior introduced in this PR.
[GitHub] spark issue #20021: [SPARK-22668][SQL] Ensure no global variables in argumen...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20021

> I checked some call sites. Here is one example that `extraArguments` has `ev.value` instead of local variable.

Hey, `ev.value` is not from children; it is the output of the current expression, which we can make sure is a local variable, e.g. https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala#L296
[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19884

For https://github.com/apache/spark/pull/19884#issuecomment-352993779,

> I have seen few tests failed in Python 2 and PyPy with Pandas and PyArrow in my local if i am not mistaken and if I remember correctly but haven't got really enough time to check if it is an actual issue and file a JIRA. But, I am pretty sure some tests will fail after the upgrade (or installation) of Pandas and PyArrow.

I just double-checked this locally:

Python 3: pandas (0.19.2) / pyarrow (0.4.1) - all pass

```
./run-tests --python-executables=python3 --modules pyspark-sql
```

PyPy 5.8.0: pandas (0.21.1) / no pyarrow - all pass

```
./run-tests --python-executables=pypy --modules pyspark-sql
```

Python 2.7: pandas (0.20.2) / pyarrow (0.4.1) - several tests fail consistently (3 times).

```
./run-tests --python-executables=python2.7 --modules pyspark-sql
```

For example:

```
..E...
==
ERROR [2.557s]: test_createDataFrame_respect_session_timezone (pyspark.sql.tests.ArrowTests)
--
Traceback (most recent call last):
  File "/...spark/python/pyspark/sql/tests.py", line 3284, in test_createDataFrame_respect_session_timezone
    self.assertEqual(result_la, result_arrow_la)
AssertionError: Lists differ: [Row(1_str_t=u'a', 2_int_t=1, ... != [Row(1_str_t=u'a', 2_int_t=1, ...

First differing element 0:
Row(1_str_t=u'a', 2_int_t=1, 3_long_t=10, 4_float_t=0.2000298023224, 5_double_t=2.0, 6_date_t=datetime.date(1969, 1, 1), 7_timestamp_t=datetime.datetime(1969, 1, 1, 1, 1, 1))
Row(1_str_t=u'a', 2_int_t=1, 3_long_t=10, 4_float_t=0.2000298023224, 5_double_t=2.0, 6_date_t=datetime.date(1969, 1, 1), 7_timestamp_t=datetime.datetime(1968, 12, 31, 8, 1, 1))

Diff is 2160 characters long. Set self.maxDiff to None to see it.

==
ERROR [0.209s]: test_createDataFrame_toggle (pyspark.sql.tests.ArrowTests)
--
Traceback (most recent call last):
  File "/...spark/python/pyspark/sql/tests.py", line 3270, in test_createDataFrame_toggle
    self.assertEquals(df_no_arrow.collect(), df_arrow.collect())
AssertionError: Lists differ: [Row(1_str_t=u'a', 2_int_t=1, ... != [Row(1_str_t=u'a', 2_int_t=1, ...

First differing element 0:
Row(1_str_t=u'a', 2_int_t=1, 3_long_t=10, 4_float_t=0.2000298023224, 5_double_t=2.0, 6_date_t=datetime.date(1969, 1, 1), 7_timestamp_t=datetime.datetime(1969, 1, 1, 18, 1, 1))
Row(1_str_t=u'a', 2_int_t=1, 3_long_t=10, 4_float_t=0.2000298023224, 5_double_t=2.0, 6_date_t=datetime.date(1969, 1, 1), 7_timestamp_t=datetime.datetime(1969, 1, 1, 1, 1, 1))

Diff is 2160 characters long. Set self.maxDiff to None to see it.

==
ERROR [0.166s]: test_toPandas_arrow_toggle (pyspark.sql.tests.ArrowTests)
--
Traceback (most recent call last):
  File "/...spark/python/pyspark/sql/tests.py", line 3216, in test_toPandas_arrow_toggle
    self.assertFramesEqual(pdf_arrow, pdf)
  File "/...spark/python/pyspark/sql/tests.py", line 3178, in assertFramesEqual
    self.assertTrue(df_without.equals(df_with_arrow), msg=msg)
AssertionError: DataFrame from Arrow is not equal
With Arrow:
1_str_t 2_int_t 3_long_t 4_float_t 5_double_t 6_date_t \
7_timestamp_t
dtype: object
Without:
1_str_t 2_int_t 3_long_t 4_float_t 5_double_t 6_date_t \
7_timestamp_t
dtype: object

==
ERROR [0.182s]: test_toPandas_respect_session_timezone (pyspark.sql.tests.ArrowTests)
--
Traceback (most recent call last):
  File "/...spark/python/pyspark/sql/tests.py", line 3227, in test_toPandas_respect_session_timezone
    self.assertFramesEqual(pdf_arrow_la, pdf_la)
  File "/...spark/python/pyspark/sql/tests.py", line 3178, in assertFramesEqual
    self.assertTrue(df_without.equals(df_with_arrow), msg=msg)
AssertionError: DataFrame from Arrow is not equal
With Arrow:
1_str_t 2_int_t 3_long_t 4_float_t 5_double_t 6_date_t \
7_timestamp_t
dtype: object
Without:
1_str_t 2_int_t 3_long_t 4_float_t 5_double_t 6_date_t \
7_timestamp_
```
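In both list-comparison failures above, the two `7_timestamp_t` values differ by exactly 17 hours, which is the pattern one would expect if a naive timestamp were read in one time zone and rendered in another. The following is a minimal plain-Python sketch of that effect only, not the PySpark or Arrow conversion code. The two fixed offsets (UTC+9 and UTC-8) and the helper name `reinterpret` are assumptions chosen for illustration; the comment does not state which zones were in effect.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets, for illustration only: a local zone at UTC+9 and a
# session zone at UTC-8. They are 17 hours apart.
LOCAL_TZ = timezone(timedelta(hours=9))
SESSION_TZ = timezone(timedelta(hours=-8))


def reinterpret(naive, source_tz, target_tz):
    """Treat a naive timestamp as wall-clock time in source_tz and return
    the naive wall-clock time of the same instant in target_tz."""
    aware = naive.replace(tzinfo=source_tz)
    return aware.astimezone(target_tz).replace(tzinfo=None)


expected = datetime(1969, 1, 1, 1, 1, 1)

# Read as local time, rendered in the session zone: 17 hours earlier.
local_to_session = reinterpret(expected, LOCAL_TZ, SESSION_TZ)

# Read as session time, rendered in the local zone: 17 hours later.
session_to_local = reinterpret(expected, SESSION_TZ, LOCAL_TZ)

print(local_to_session)  # 1968-12-31 08:01:01
print(session_to_local)  # 1969-01-01 18:01:01
```

Under these assumed offsets the two shifted values match the differing timestamps in the tracebacks, but that is only consistent with a time zone mismatch; it does not establish where in the conversion path the mismatch occurs.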
[GitHub] spark issue #19904: [SPARK-22707][ML] Optimize CrossValidator memory occupat...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19904 **[Test build #85183 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85183/testReport)** for PR 19904 at commit [`cad2104`](https://github.com/apache/spark/commit/cad210439b7a0bc3eb870f1d68fd96fbd0763aa8).
[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19977 **[Test build #85182 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85182/testReport)** for PR 19977 at commit [`fc14aeb`](https://github.com/apache/spark/commit/fc14aeb4e92e67aba1750fc1bc2b0fc9afaa5fac).
[GitHub] spark issue #19977: [SPARK-22771][SQL] Concatenate binary inputs into a bina...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/19977 retest this please
[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18029 **[Test build #85181 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85181/testReport)** for PR 18029 at commit [`3c16c47`](https://github.com/apache/spark/commit/3c16c478257c8aed61b1cef4d75360b8bb8b166d).
[GitHub] spark issue #19991: [SPARK-22801][ML][PYSPARK] Allow FeatureHasher to treat ...
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/19991 @holdenk @sethah any other comments?
[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19884 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85165/ Test FAILed.
[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19884 Merged build finished. Test FAILed.
[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19884 **[Test build #85165 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85165/testReport)** for PR 19884 at commit [`d92ae90`](https://github.com/apache/spark/commit/d92ae90e05f55955eaad8e7f55e6324bf333a6bc). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19954: [SPARK-22757][Kubernetes] Enable use of remote dependenc...
Github user foxish commented on the issue: https://github.com/apache/spark/pull/19954 > I don't think they are independent as architecturally they make sense together and represent a single concern: enabling use of remote dependencies through init-containers. Missing any one of the three makes the feature unusable. I would also argue that it won't necessarily make review easier as reviewers need to mentally connect them together to make sense of each change set. I agree with this. This is pretty much one cohesive unit, and splitting it up will probably make it harder to understand. From your comments, @vanzin, it seems we definitely do need a good refactor here, and the community can undertake that in Q1 2018. This approach and code have been functionally tested over the last 3 releases of our fork, and I'd be fairly confident about their efficacy. Broad changes at this point seem riskier to me from a 2.3 release perspective, given that we're still in the process of improving spark-k8s integration testing coverage against apache/spark. cc/ @mccheah
[GitHub] spark issue #20021: [SPARK-22668][SQL] Ensure no global variables in argumen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20021 Merged build finished. Test PASSed.
[GitHub] spark issue #20021: [SPARK-22668][SQL] Ensure no global variables in argumen...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20021 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85164/ Test PASSed.
[GitHub] spark issue #20021: [SPARK-22668][SQL] Ensure no global variables in argumen...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20021 **[Test build #85164 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85164/testReport)** for PR 20021 at commit [`3d44195`](https://github.com/apache/spark/commit/3d44195f48c1688d7dc5b87fd0c9f07c1535000b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20020: [SPARK-22834][SQL] Make insertion commands have real chi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20020 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85161/ Test PASSed.
[GitHub] spark issue #20020: [SPARK-22834][SQL] Make insertion commands have real chi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20020 Merged build finished. Test PASSed.
[GitHub] spark issue #20020: [SPARK-22834][SQL] Make insertion commands have real chi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20020 **[Test build #85161 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85161/testReport)** for PR 20020 at commit [`e25a9eb`](https://github.com/apache/spark/commit/e25a9eb285d56a771a56b77534413be59b9f111b). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `trait DataWritingCommand extends Command ` * `case class DataWritingCommandExec(cmd: DataWritingCommand, children: Seq[SparkPlan])`
[GitHub] spark issue #19804: [WIP][SPARK-22573][SQL] Shouldn't inferFilters if it con...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19804 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85163/ Test PASSed.
[GitHub] spark issue #19804: [WIP][SPARK-22573][SQL] Shouldn't inferFilters if it con...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19804 Merged build finished. Test PASSed.
[GitHub] spark issue #19804: [WIP][SPARK-22573][SQL] Shouldn't inferFilters if it con...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19804 **[Test build #85163 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85163/testReport)** for PR 19804 at commit [`edd0434`](https://github.com/apache/spark/commit/edd0434b710a764c7be2ea94242dd7ea5ce6ace7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20031: [SPARK-22844][R] Adds date_trunc in R API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20031 Merged build finished. Test PASSed.
[GitHub] spark issue #20031: [SPARK-22844][R] Adds date_trunc in R API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20031 **[Test build #85177 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85177/testReport)** for PR 20031 at commit [`1c3e956`](https://github.com/apache/spark/commit/1c3e956313b78da492f917c003c38e981cce7877). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20031: [SPARK-22844][R] Adds date_trunc in R API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20031 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85177/ Test PASSed.
[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19884 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85160/ Test FAILed.
[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19884 Merged build finished. Test FAILed.
[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19884 **[Test build #85160 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85160/testReport)** for PR 19884 at commit [`0047f7a`](https://github.com/apache/spark/commit/0047f7a6560bfbb46d7ee28df0c2781f7538b907). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19096: [SPARK-21869][SS] A cached Kafka producer should not be ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19096 Merged build finished. Test PASSed.
[GitHub] spark issue #19096: [SPARK-21869][SS] A cached Kafka producer should not be ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19096 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85180/ Test PASSed.
[GitHub] spark issue #19096: [SPARK-21869][SS] A cached Kafka producer should not be ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19096 **[Test build #85180 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85180/testReport)** for PR 19096 at commit [`edd5bc3`](https://github.com/apache/spark/commit/edd5bc3f4511c5825f1334e5d237eaa6de21d3d5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20031: [SPARK-22844][R] Adds date_trunc in R API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20031 Merged build finished. Test FAILed.
[GitHub] spark issue #20031: [SPARK-22844][R] Adds date_trunc in R API
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20031 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85175/ Test FAILed.
[GitHub] spark issue #20031: [SPARK-22844][R] Adds date_trunc in R API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20031 **[Test build #85175 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85175/testReport)** for PR 20031 at commit [`c54abe9`](https://github.com/apache/spark/commit/c54abe9dfad3dd7447209807826e79b1682f028c). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19096: [SPARK-21869][SS] A cached Kafka producer should not be ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19096 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85176/ Test PASSed.
[GitHub] spark issue #19096: [SPARK-21869][SS] A cached Kafka producer should not be ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19096 Merged build finished. Test PASSed.
[GitHub] spark issue #19096: [SPARK-21869][SS] A cached Kafka producer should not be ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19096 **[Test build #85176 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85176/testReport)** for PR 19096 at commit [`dbabbf9`](https://github.com/apache/spark/commit/dbabbf96c26a6b32f1618c13f7cb142bf64634d3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19096: [SPARK-21869][SS] A cached Kafka producer should not be ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19096 **[Test build #85180 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85180/testReport)** for PR 19096 at commit [`edd5bc3`](https://github.com/apache/spark/commit/edd5bc3f4511c5825f1334e5d237eaa6de21d3d5).
[GitHub] spark issue #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - Document...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19946 **[Test build #85179 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85179/testReport)** for PR 19946 at commit [`702162b`](https://github.com/apache/spark/commit/702162b4ca9eab83adb0b362d5b4d9479b6b3d0a).
[GitHub] spark issue #19096: [SPARK-21869][SS] A cached Kafka producer should not be ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19096 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85174/ Test PASSed.
[GitHub] spark issue #19096: [SPARK-21869][SS] A cached Kafka producer should not be ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19096 Merged build finished. Test PASSed.
[GitHub] spark issue #19096: [SPARK-21869][SS] A cached Kafka producer should not be ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19096 **[Test build #85174 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85174/testReport)** for PR 19096 at commit [`488c70f`](https://github.com/apache/spark/commit/488c70fb3de0e3fde72d8d94e46e6af16ae41a65). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for WindowFrameCoercion ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20008 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85162/ Test FAILed.
[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for WindowFrameCoercion ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20008 Merged build finished. Test FAILed.
[GitHub] spark issue #20008: [SPARK-22822][TEST] Basic tests for WindowFrameCoercion ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20008 **[Test build #85162 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85162/testReport)** for PR 20008 at commit [`b4c5339`](https://github.com/apache/spark/commit/b4c5339cc44eeed7e75aac30ba6b6aaf06316305). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #19946: [SPARK-22648] [Scheduler] Spark on Kubernetes - Document...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19946 **[Test build #85178 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85178/testReport)** for PR 19946 at commit [`d235847`](https://github.com/apache/spark/commit/d2358470e86ab44522371b8fd733a97527d95ec5).
[GitHub] spark issue #20031: [SPARK-22844][R] Adds date_trunc in R API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20031 **[Test build #85177 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85177/testReport)** for PR 20031 at commit [`1c3e956`](https://github.com/apache/spark/commit/1c3e956313b78da492f917c003c38e981cce7877).
[GitHub] spark issue #20031: [SPARK-22844][R] Adds date_trunc in R API
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20031 cc @felixcheung, could you take a look when you have some time?
[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18029 Merged build finished. Test FAILed.
[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18029 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85171/ Test FAILed.
[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18029 **[Test build #85171 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85171/testReport)** for PR 18029 at commit [`e18fdaa`](https://github.com/apache/spark/commit/e18fdaa2b9c70f58b57fc564c137a2dce51d2b25). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `public class KinesisInitialPositions ` * `public static class Latest implements KinesisInitialPosition ` * `public static class TrimHorizon implements KinesisInitialPosition ` * `public static class AtTimestamp implements KinesisInitialPosition `
[GitHub] spark issue #19096: [SPARK-21869][SS] A cached Kafka producer should not be ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19096 **[Test build #85176 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85176/testReport)** for PR 19096 at commit [`dbabbf9`](https://github.com/apache/spark/commit/dbabbf96c26a6b32f1618c13f7cb142bf64634d3).
[GitHub] spark issue #19096: [SPARK-21869][SS] A cached Kafka producer should not be ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19096 Merged build finished. Test PASSed.
[GitHub] spark issue #19096: [SPARK-21869][SS] A cached Kafka producer should not be ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19096 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85170/ Test PASSed.
[GitHub] spark issue #19096: [SPARK-21869][SS] A cached Kafka producer should not be ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19096 **[Test build #85170 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85170/testReport)** for PR 19096 at commit [`f609d7e`](https://github.com/apache/spark/commit/f609d7e23bf0675525f22a151f7c80eb0fa15a73). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20031: [SPARK-22844][R] Adds date_trunc in R API
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20031 **[Test build #85175 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85175/testReport)** for PR 20031 at commit [`c54abe9`](https://github.com/apache/spark/commit/c54abe9dfad3dd7447209807826e79b1682f028c).
[GitHub] spark pull request #20031: [SPARK-22844][R] Adds date_trunc in R API
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/20031

[SPARK-22844][R] Adds date_trunc in R API

## What changes were proposed in this pull request?

This PR adds `date_trunc` to the R API, as shown below:

```r
> df <- createDataFrame(list(list(a = as.POSIXlt("2012-12-13 12:34:00"))))
> head(select(df, date_trunc("hour", df$a)))
  date_trunc(hour, a)
1 2012-12-13 12:00:00
```

## How was this patch tested?

Unit tests added in `R/pkg/tests/fulltests/test_sparkSQL.R`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark r-datetrunc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20031.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20031

commit c54abe9dfad3dd7447209807826e79b1682f028c
Author: hyukjinkwon
Date: 2017-12-20T09:53:53Z

    Adds date_trunc in R API
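For readers unfamiliar with the operation, the R session in the pull request description truncates a timestamp to the start of its hour. A rough plain-Python sketch of that behaviour follows. It is not Spark's implementation; the helper name `truncate` and the handful of units it accepts are assumptions made for illustration, with the unit-first argument order mirroring the `date_trunc("hour", df$a)` call in the description.

```python
from datetime import datetime

# Fields reset for each (hypothetical) unit name; everything finer than the
# unit is set to its minimum value.
_RESET = {
    "year": dict(month=1, day=1, hour=0, minute=0, second=0, microsecond=0),
    "month": dict(day=1, hour=0, minute=0, second=0, microsecond=0),
    "day": dict(hour=0, minute=0, second=0, microsecond=0),
    "hour": dict(minute=0, second=0, microsecond=0),
    "minute": dict(second=0, microsecond=0),
}


def truncate(unit, ts):
    """Return ts rounded down to the start of the given unit."""
    return ts.replace(**_RESET[unit])


ts = datetime(2012, 12, 13, 12, 34, 0)
print(truncate("hour", ts))   # 2012-12-13 12:00:00
print(truncate("month", ts))  # 2012-12-01 00:00:00
```

The first printed value corresponds to the `2012-12-13 12:00:00` result shown in the R session for the same input.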
[GitHub] spark issue #19096: [SPARK-21869][SS] A cached Kafka producer should not be ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19096 Merged build finished. Test PASSed.
[GitHub] spark issue #19096: [SPARK-21869][SS] A cached Kafka producer should not be ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19096 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85169/ Test PASSed.