[GitHub] [spark] gaborgsomogyi commented on issue #26153: [SPARK-29500][SQL][SS] Support partition column when writing to Kafka
gaborgsomogyi commented on issue #26153: [SPARK-29500][SQL][SS] Support partition column when writing to Kafka URL: https://github.com/apache/spark/pull/26153#issuecomment-543718974 > What would you suggest? Aren't batch, streaming, and continuous streaming using the same KafkaRowWriter class under the hood? I've had a deeper look and I tend to agree that we can depend on the micro-batch tests. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #26164: [SPARK-21492][SQL] Fix memory leak in SortMergeJoin
SparkQA commented on issue #26164: [SPARK-21492][SQL] Fix memory leak in SortMergeJoin URL: https://github.com/apache/spark/pull/26164#issuecomment-543738613 **[Test build #112276 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112276/testReport)** for PR 26164 at commit [`f9567d5`](https://github.com/apache/spark/commit/f9567d58105fb92d73e8118d96c8c58abcd2414f).
[GitHub] [spark] SparkQA commented on issue #26080: [SPARK-29425][SQL] The ownership of a database should be respected
SparkQA commented on issue #26080: [SPARK-29425][SQL] The ownership of a database should be respected URL: https://github.com/apache/spark/pull/26080#issuecomment-543738656 **[Test build #112277 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112277/testReport)** for PR 26080 at commit [`3077f5b`](https://github.com/apache/spark/commit/3077f5b7f5d8ca4c37d2308203187042bce381cc).
[GitHub] [spark] AmplabJenkins commented on issue #26164: [SPARK-21492][SQL] Fix memory leak in SortMergeJoin
AmplabJenkins commented on issue #26164: [SPARK-21492][SQL] Fix memory leak in SortMergeJoin URL: https://github.com/apache/spark/pull/26164#issuecomment-543739224 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26164: [SPARK-21492][SQL] Fix memory leak in SortMergeJoin
AmplabJenkins commented on issue #26164: [SPARK-21492][SQL] Fix memory leak in SortMergeJoin URL: https://github.com/apache/spark/pull/26164#issuecomment-543739239 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/17259/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #26164: [SPARK-21492][SQL] Fix memory leak in SortMergeJoin
SparkQA removed a comment on issue #26164: [SPARK-21492][SQL] Fix memory leak in SortMergeJoin URL: https://github.com/apache/spark/pull/26164#issuecomment-543738613 **[Test build #112276 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112276/testReport)** for PR 26164 at commit [`f9567d5`](https://github.com/apache/spark/commit/f9567d58105fb92d73e8118d96c8c58abcd2414f).
[GitHub] [spark] AmplabJenkins removed a comment on issue #26164: [SPARK-21492][SQL] Fix memory leak in SortMergeJoin
AmplabJenkins removed a comment on issue #26164: [SPARK-21492][SQL] Fix memory leak in SortMergeJoin URL: https://github.com/apache/spark/pull/26164#issuecomment-543744501 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #26164: [SPARK-21492][SQL] Fix memory leak in SortMergeJoin
AmplabJenkins commented on issue #26164: [SPARK-21492][SQL] Fix memory leak in SortMergeJoin URL: https://github.com/apache/spark/pull/26164#issuecomment-543744519 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/112276/ Test FAILed.
[GitHub] [spark] redsk commented on a change in pull request #26153: [SPARK-29500][SQL][SS] Support partition column when writing to Kafka
redsk commented on a change in pull request #26153: [SPARK-29500][SQL][SS] Support partition column when writing to Kafka URL: https://github.com/apache/spark/pull/26153#discussion_r336506341
## File path: docs/structured-streaming-kafka-integration.md
## @@ -622,6 +626,10 @@
 a ```null``` valued key column will be automatically added (see Kafka semantics how ```null``` valued key values are handled). If a topic column exists then its value is used as the topic when writing the given row to Kafka, unless the "topic" configuration option is set i.e., the "topic" configuration option overrides the topic column.
+If a partition column is not specified then the partition is calculated by the Kafka producer
+(using ```org.apache.kafka.clients.producer.internals.DefaultPartitioner```).
+This can be overridden in Spark by setting the ```kafka.partitioner.class``` option.
Review comment: Do we still need test 1 (`kafka.partitioner.class` overrides default partitioner)?
[GitHub] [spark] skonto commented on issue #26161: [SPARK-27900][K8s] Add jvm oom flag in cluster mode
skonto commented on issue #26161: [SPARK-27900][K8s] Add jvm oom flag in cluster mode URL: https://github.com/apache/spark/pull/26161#issuecomment-543645687 @holdenk @erikerlandson pls review.
[GitHub] [spark] SparkQA removed a comment on issue #26129: [SPARK-29482][SQL] ANALYZE TABLE should look up catalog/table like v2 commands
SparkQA removed a comment on issue #26129: [SPARK-29482][SQL] ANALYZE TABLE should look up catalog/table like v2 commands URL: https://github.com/apache/spark/pull/26129#issuecomment-543559210 **[Test build #112259 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112259/testReport)** for PR 26129 at commit [`78e1495`](https://github.com/apache/spark/commit/78e1495fa3a5e85af4795fafc9c3a72ae40a8038).
[GitHub] [spark] SparkQA commented on issue #26053: [SPARK-29379][SQL]SHOW FUNCTIONS show '!=', '<>' , 'between', 'case'
SparkQA commented on issue #26053: [SPARK-29379][SQL]SHOW FUNCTIONS show '!=', '<>' , 'between', 'case' URL: https://github.com/apache/spark/pull/26053#issuecomment-543683653 **[Test build #112263 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112263/testReport)** for PR 26053 at commit [`7ac4d16`](https://github.com/apache/spark/commit/7ac4d1612a99c3b2aaec93accab92a5c7dfea8aa). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26157: [SPARK-28560][SQL][followup] resolve the remaining comments for PR#25295
AmplabJenkins removed a comment on issue #26157: [SPARK-28560][SQL][followup] resolve the remaining comments for PR#25295 URL: https://github.com/apache/spark/pull/26157#issuecomment-543690465 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/112264/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26160: [SPARK-29498][SQL] CatalogTable to HiveTable should not change the table's ownership
SparkQA commented on issue #26160: [SPARK-29498][SQL] CatalogTable to HiveTable should not change the table's ownership URL: https://github.com/apache/spark/pull/26160#issuecomment-543690800 **[Test build #112271 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112271/testReport)** for PR 26160 at commit [`86d8325`](https://github.com/apache/spark/commit/86d8325ed10444074ccfbcb145f511492db5cf44). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26157: [SPARK-28560][SQL][followup] resolve the remaining comments for PR#25295
AmplabJenkins removed a comment on issue #26157: [SPARK-28560][SQL][followup] resolve the remaining comments for PR#25295 URL: https://github.com/apache/spark/pull/26157#issuecomment-543690455 Merged build finished. Test PASSed.
[GitHub] [spark] gaborgsomogyi commented on issue #26158: [MINOR][SQL][SS] Deduplicate codes from Kafka data source
gaborgsomogyi commented on issue #26158: [MINOR][SQL][SS] Deduplicate codes from Kafka data source URL: https://github.com/apache/spark/pull/26158#issuecomment-543696357 I think the direction is good, but I'm not sure it's a minor change.
[GitHub] [spark] skonto commented on a change in pull request #25870: [SPARK-27936][K8S] support python deps
skonto commented on a change in pull request #25870: [SPARK-27936][K8S] support python deps URL: https://github.com/apache/spark/pull/25870#discussion_r336457149
## File path: resource-managers/kubernetes/integration-tests/src/test/scala/org/apache/spark/deploy/k8s/integrationtest/DepsTestsSuite.scala
## @@ -183,6 +187,62 @@ private[spark] trait DepsTestsSuite { k8sSuite: KubernetesSuite =>
     }
   }
 
+  test("Launcher python client dependencies using py", k8sTestTag, MinikubeTag) {
+    val depsFile = Utils.getTestFileAbsolutePath("py_container_checks.py", sparkHomeDir)
+    testPythonDeps(depsFile)
+  }
+
+  test("Launcher python client dependencies using a zip file", k8sTestTag, MinikubeTag) {
+    val inDepsFile = Utils.getTestFileAbsolutePath("py_container_checks.py", sparkHomeDir)
+    val outDepsFile = s"${inDepsFile.substring(0, inDepsFile.lastIndexOf("."))}.zip"
+    Utils.createZipFile(inDepsFile, outDepsFile)
+    testPythonDeps(outDepsFile)
+  }
+
+  private def testPythonDeps(depsFile: String): Unit = {
+    try {
+      setupCephStorage()
+      val cephUrlStr = getServiceUrl(svcName)
+      val cephUrl = new URL(cephUrlStr)
+      val cephHost = cephUrl.getHost
+      val cephPort = cephUrl.getPort
+      val examplesJar = Utils.getTestFileAbsolutePath(Utils.getExamplesJarName(), sparkHomeDir)
+
+      val (accessKey, secretKey) = getCephCredentials()
+      sparkAppConf
+        .set("spark.kubernetes.container.image", pyImage)
+        .set("spark.kubernetes.pyspark.pythonVersion", "2")
Review comment: I got used to 2 for years will miss it :) Sure will change.
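The zip-file test quoted above derives the archive name by swapping the file extension and then calls `Utils.createZipFile`. The body of that helper is not shown in the review, so the following is only a minimal sketch of what such a step could look like with `java.util.zip`. The object name `ZipSketch` and both method names are hypothetical stand-ins, not the PR's implementation.

```scala
import java.io.{File, FileOutputStream}
import java.nio.file.Files
import java.util.zip.{ZipEntry, ZipOutputStream}

// Hypothetical sketch; not the Utils helper from the pull request.
object ZipSketch {
  // Swap the extension for ".zip", using the same substring logic as the quoted test.
  def zipNameFor(path: String): String =
    s"${path.substring(0, path.lastIndexOf("."))}.zip"

  // Write a single file into a new archive, stored under its bare file name.
  def createZipFile(inFile: String, outFile: String): Unit = {
    val source = new File(inFile)
    val zos = new ZipOutputStream(new FileOutputStream(outFile))
    try {
      zos.putNextEntry(new ZipEntry(source.getName))
      Files.copy(source.toPath, zos) // streams the file contents into the entry
      zos.closeEntry()
    } finally {
      zos.close()
    }
  }
}
```

Storing the entry under the bare file name is an assumption made here so that the module stays importable by that name once the archive is on the Python path. The real helper may lay out the archive differently.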
[GitHub] [spark] redsk commented on a change in pull request #26153: [SPARK-29500][SQL][SS] Support partition column when writing to Kafka
redsk commented on a change in pull request #26153: [SPARK-29500][SQL][SS] Support partition column when writing to Kafka URL: https://github.com/apache/spark/pull/26153#discussion_r336457378
## File path: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSinkSuite.scala
## @@ -404,20 +420,85 @@ abstract class KafkaSinkBatchSuiteBase extends KafkaSinkSuiteBase {
       .save()
     checkAnswer(
       createKafkaReader(topic, includeHeaders = true).selectExpr(
-        "CAST(value as STRING) value", "headers"
+        "CAST(value as STRING) value", "headers", "partition"
       ),
-      Row("1", Seq(Row("a", "b".getBytes(UTF_8)))) ::
-        Row("2", Seq(Row("c", "d".getBytes(UTF_8)), Row("e", "f".getBytes(UTF_8)))) ::
-        Row("3", Seq(Row("g", "h".getBytes(UTF_8)), Row("g", "i".getBytes(UTF_8)))) ::
-        Row("4", null) ::
+      Row("1", Seq(Row("a", "b".getBytes(UTF_8))), 0) ::
+        Row("2", Seq(Row("c", "d".getBytes(UTF_8)), Row("e", "f".getBytes(UTF_8))), 1) ::
+        Row("3", Seq(Row("g", "h".getBytes(UTF_8)), Row("g", "i".getBytes(UTF_8))), 2) ::
+        Row("4", null, 3) ::
         Row("5", Seq(
           Row("j", "k".getBytes(UTF_8)),
           Row("j", "l".getBytes(UTF_8)),
-          Row("m", "n".getBytes(UTF_8)))) ::
+          Row("m", "n".getBytes(UTF_8))), 0) ::
         Nil
     )
   }
 
+  test("batch - partition column vs default Kafka partitioner") {
+    val fixedKey = "fixed_key"
+    val nrPartitions = 100
+
+    // default Kafka partitioner calculate partition deterministically based on the key
+    val keyTopic = newTopic()
+    testUtils.createTopic(keyTopic, nrPartitions)
+
+    Seq((keyTopic, fixedKey, "value"))
+      .toDF("topic", "key", "value")
+      .write
+      .format("kafka")
+      .option("kafka.bootstrap.servers", testUtils.brokerAddress)
+      .option("topic", keyTopic)
+      .mode("append")
+      .save()
+
+    // getting the partition corresponding to the fixed key
+    val keyPartition = createKafkaReader(keyTopic).select("partition")
+      .map(_.getInt(0)).collect().toList.head
+
+    val topic = newTopic()
+    testUtils.createTopic(topic, nrPartitions)
+
+    // even values use default kafka partitioner, odd use 'n'
+    val df = (0 until 100)
+      .map(n => (topic, fixedKey, s"$n", if (n % 2 == 0) None else Some(n)))
+      .toDF("topic", "key", "value", "partition")
+
+    df.write
+      .format("kafka")
+      .option("kafka.bootstrap.servers", testUtils.brokerAddress)
+      .option("topic", topic)
+      .mode("append")
+      .save()
+
+    checkAnswer(
+      createKafkaReader(topic).selectExpr(
+        "CAST(key as STRING) key", "CAST(value as STRING) value", "partition"
+      ),
+      (0 until 100)
+        .map(n => (fixedKey, s"$n", if (n % 2 == 0) keyPartition else n))
+        .toDF("key", "value", "partition")
+    )
+  }
+
+  test("batch - non-existing partitions trigger standard Kafka exception") {
Review comment: Ok, I'll remove it.
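The expectation encoded in the quoted test (rows with a partition value land on that partition, rows without one land wherever the key sends them) can be sketched without Spark or Kafka. This is an illustration only: `PartitionChoiceSketch` and its methods are hypothetical names, and `keyBasedPartition` is a toy stand-in that does not reproduce the hashing of Kafka's default partitioner. All that matters for the sketch is that it is deterministic for a fixed key.

```scala
// Hypothetical sketch; these are not Spark or Kafka APIs.
object PartitionChoiceSketch {
  // Toy key-based partitioner: deterministic for a given key and partition count.
  def keyBasedPartition(key: String, numPartitions: Int): Int =
    Math.floorMod(key.hashCode, numPartitions)

  // An explicit partition value wins; without one the key decides.
  def resolvePartition(key: String, partition: Option[Int], numPartitions: Int): Int =
    partition.getOrElse(keyBasedPartition(key, numPartitions))

  // Same row pattern as the quoted test: even rows carry no partition value,
  // odd rows request partition n.
  def expectedPartitions(key: String, numPartitions: Int): Seq[Int] =
    (0 until numPartitions).map { n =>
      resolvePartition(key, if (n % 2 == 0) None else Some(n), numPartitions)
    }
}
```

With a fixed key, every even row resolves to the same key-derived partition and every odd row resolves to its own index, which is the shape of the `checkAnswer` expectation in the test.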
[GitHub] [spark] AmplabJenkins removed a comment on issue #26160: [SPARK-29498][SQL] CatalogTable to HiveTable should not change the table's ownership
AmplabJenkins removed a comment on issue #26160: [SPARK-29498][SQL] CatalogTable to HiveTable should not change the table's ownership URL: https://github.com/apache/spark/pull/26160#issuecomment-543715650 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26160: [SPARK-29498][SQL] CatalogTable to HiveTable should not change the table's ownership
AmplabJenkins removed a comment on issue #26160: [SPARK-29498][SQL] CatalogTable to HiveTable should not change the table's ownership URL: https://github.com/apache/spark/pull/26160#issuecomment-543715656 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/17257/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26160: [SPARK-29498][SQL] CatalogTable to HiveTable should not change the table's ownership
AmplabJenkins commented on issue #26160: [SPARK-29498][SQL] CatalogTable to HiveTable should not change the table's ownership URL: https://github.com/apache/spark/pull/26160#issuecomment-543715656 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/17257/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26160: [SPARK-29498][SQL] CatalogTable to HiveTable should not change the table's ownership
AmplabJenkins commented on issue #26160: [SPARK-29498][SQL] CatalogTable to HiveTable should not change the table's ownership URL: https://github.com/apache/spark/pull/26160#issuecomment-543715650 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26161: [SPARK-27900][K8s] Add jvm oom flag in cluster mode
AmplabJenkins removed a comment on issue #26161: [SPARK-27900][K8s] Add jvm oom flag in cluster mode URL: https://github.com/apache/spark/pull/26161#issuecomment-543717546 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26161: [SPARK-27900][K8s] Add jvm oom flag in cluster mode
AmplabJenkins removed a comment on issue #26161: [SPARK-27900][K8s] Add jvm oom flag in cluster mode URL: https://github.com/apache/spark/pull/26161#issuecomment-543717558 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/112270/ Test PASSed.
[GitHub] [spark] HeartSaVioR commented on issue #26158: [SPARK-29509][SQL][SS] Deduplicate codes from Kafka data source
HeartSaVioR commented on issue #26158: [SPARK-29509][SQL][SS] Deduplicate codes from Kafka data source URL: https://github.com/apache/spark/pull/26158#issuecomment-543737818 Thanks for the suggestion. Filed an issue and changed the title.
[GitHub] [spark] SparkQA commented on issue #25870: [SPARK-27936][K8S] support python deps
SparkQA commented on issue #25870: [SPARK-27936][K8S] support python deps URL: https://github.com/apache/spark/pull/25870#issuecomment-543737420 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/17258/
[GitHub] [spark] xuanyuanking commented on issue #26164: [SPARK-21492][SQL] Fix memory leak in SortMergeJoin
xuanyuanking commented on issue #26164: [SPARK-21492][SQL] Fix memory leak in SortMergeJoin URL: https://github.com/apache/spark/pull/26164#issuecomment-543738534 cc @cloud-fan @gatorsmile
[GitHub] [spark] AmplabJenkins commented on issue #26120: [SPARK-29014][SQL] DataSourceV2: Fix current/default catalog usage
AmplabJenkins commented on issue #26120: [SPARK-29014][SQL] DataSourceV2: Fix current/default catalog usage URL: https://github.com/apache/spark/pull/26120#issuecomment-543743384 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/112269/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26120: [SPARK-29014][SQL] DataSourceV2: Fix current/default catalog usage
AmplabJenkins commented on issue #26120: [SPARK-29014][SQL] DataSourceV2: Fix current/default catalog usage URL: https://github.com/apache/spark/pull/26120#issuecomment-543743370 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26165: [SPARK-29508][SQL] Implicitly cast strings in datetime arithmetic operations
AmplabJenkins removed a comment on issue #26165: [SPARK-29508][SQL] Implicitly cast strings in datetime arithmetic operations URL: https://github.com/apache/spark/pull/26165#issuecomment-543752019 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/17262/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26165: [SPARK-29508][SQL] Implicitly cast strings in datetime arithmetic operations
AmplabJenkins removed a comment on issue #26165: [SPARK-29508][SQL] Implicitly cast strings in datetime arithmetic operations URL: https://github.com/apache/spark/pull/26165#issuecomment-543752007 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26080: [SPARK-29425][SQL] The ownership of a database should be respected
SparkQA commented on issue #26080: [SPARK-29425][SQL] The ownership of a database should be respected URL: https://github.com/apache/spark/pull/26080#issuecomment-543754862 **[Test build #112280 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112280/testReport)** for PR 26080 at commit [`4ccec2a`](https://github.com/apache/spark/commit/4ccec2a1143f3b91a2cf967c064e823e7441fed9).
[GitHub] [spark] SparkQA commented on issue #26165: [SPARK-29508][SQL] Implicitly cast strings in datetime arithmetic operations
SparkQA commented on issue #26165: [SPARK-29508][SQL] Implicitly cast strings in datetime arithmetic operations URL: https://github.com/apache/spark/pull/26165#issuecomment-543754816 **[Test build #112279 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112279/testReport)** for PR 26165 at commit [`9a2ce13`](https://github.com/apache/spark/commit/9a2ce139261b3bb28fc089cdd4e52733c30d8464).
[GitHub] [spark] AmplabJenkins commented on issue #26080: [SPARK-29425][SQL] The ownership of a database should be respected
AmplabJenkins commented on issue #26080: [SPARK-29425][SQL] The ownership of a database should be respected URL: https://github.com/apache/spark/pull/26080#issuecomment-543755327 Merged build finished. Test PASSed.
[GitHub] [spark] HeartSaVioR commented on issue #26162: [SPARK-29438][SS] Use partition ID of source for state store in stream-stream join
HeartSaVioR commented on issue #26162: [SPARK-29438][SS] Use partition ID of source for state store in stream-stream join URL: https://github.com/apache/spark/pull/26162#issuecomment-543668712 We may also need to describe the change in the migration guide (and/or release notes). Once we decide how to handle compatibility, I'll describe the change there as well.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26151: [SPARK-29502][SQL] typed interval expression should fail for invalid format
AmplabJenkins removed a comment on issue #26151: [SPARK-29502][SQL] typed interval expression should fail for invalid format URL: https://github.com/apache/spark/pull/26151#issuecomment-543672860 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26151: [SPARK-29502][SQL] typed interval expression should fail for invalid format
AmplabJenkins removed a comment on issue #26151: [SPARK-29502][SQL] typed interval expression should fail for invalid format URL: https://github.com/apache/spark/pull/26151#issuecomment-543672879 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/112258/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25981: [SPARK-28420][SQL] Support the `INTERVAL` type in `date_part()`
AmplabJenkins removed a comment on issue #25981: [SPARK-28420][SQL] Support the `INTERVAL` type in `date_part()` URL: https://github.com/apache/spark/pull/25981#issuecomment-543685093 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25981: [SPARK-28420][SQL] Support the `INTERVAL` type in `date_part()`
AmplabJenkins removed a comment on issue #25981: [SPARK-28420][SQL] Support the `INTERVAL` type in `date_part()` URL: https://github.com/apache/spark/pull/25981#issuecomment-543685105 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/112262/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26160: [SPARK-29498][SQL] CatalogTable to HiveTable should not change the table's ownership
AmplabJenkins commented on issue #26160: [SPARK-29498][SQL] CatalogTable to HiveTable should not change the table's ownership URL: https://github.com/apache/spark/pull/26160#issuecomment-543690963 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/112271/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #26160: [SPARK-29498][SQL] CatalogTable to HiveTable should not change the table's ownership
SparkQA removed a comment on issue #26160: [SPARK-29498][SQL] CatalogTable to HiveTable should not change the table's ownership URL: https://github.com/apache/spark/pull/26160#issuecomment-543646547 **[Test build #112271 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112271/testReport)** for PR 26160 at commit [`86d8325`](https://github.com/apache/spark/commit/86d8325ed10444074ccfbcb145f511492db5cf44).
[GitHub] [spark] AmplabJenkins commented on issue #26160: [SPARK-29498][SQL] CatalogTable to HiveTable should not change the table's ownership
AmplabJenkins commented on issue #26160: [SPARK-29498][SQL] CatalogTable to HiveTable should not change the table's ownership URL: https://github.com/apache/spark/pull/26160#issuecomment-543690956 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26160: [SPARK-29498][SQL] CatalogTable to HiveTable should not change the table's ownership
AmplabJenkins removed a comment on issue #26160: [SPARK-29498][SQL] CatalogTable to HiveTable should not change the table's ownership URL: https://github.com/apache/spark/pull/26160#issuecomment-543690956 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26160: [SPARK-29498][SQL] CatalogTable to HiveTable should not change the table's ownership
AmplabJenkins removed a comment on issue #26160: [SPARK-29498][SQL] CatalogTable to HiveTable should not change the table's ownership URL: https://github.com/apache/spark/pull/26160#issuecomment-543690963 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/112271/ Test FAILed.
[GitHub] [spark] Fokko commented on issue #26131: [SPARK-29483][BUILD] Bump Jackson to 2.10.0
Fokko commented on issue #26131: [SPARK-29483][BUILD] Bump Jackson to 2.10.0 URL: https://github.com/apache/spark/pull/26131#issuecomment-543701784 @krishna-pandey please refer to https://github.com/apache/spark/pull/21596 Jackson is notorious for changing its public API and aggressively deprecating methods. This will conflict with older versions of Hadoop. This might have huge implications for your application.
[GitHub] [spark] SparkQA commented on issue #26161: [SPARK-27900][K8s] Add jvm oom flag in cluster mode
SparkQA commented on issue #26161: [SPARK-27900][K8s] Add jvm oom flag in cluster mode URL: https://github.com/apache/spark/pull/26161#issuecomment-543716490 **[Test build #112270 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112270/testReport)** for PR 26161 at commit [`5505083`](https://github.com/apache/spark/commit/550508339d401750d9f1e74f2bcfcd9c83ed4427). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #26161: [SPARK-27900][K8s] Add jvm oom flag in cluster mode
SparkQA removed a comment on issue #26161: [SPARK-27900][K8s] Add jvm oom flag in cluster mode URL: https://github.com/apache/spark/pull/26161#issuecomment-543646573 **[Test build #112270 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112270/testReport)** for PR 26161 at commit [`5505083`](https://github.com/apache/spark/commit/550508339d401750d9f1e74f2bcfcd9c83ed4427).
[GitHub] [spark] xuanyuanking opened a new pull request #26164: [SPARK-21492][SQL] Fix memory leak in SortMergeJoin
xuanyuanking opened a new pull request #26164: [SPARK-21492][SQL] Fix memory leak in SortMergeJoin
URL: https://github.com/apache/spark/pull/26164

### What changes were proposed in this pull request?
We need a mechanism by which a downstream operator can notify its upstream (child) operators that they may release the resources behind their output data stream. This PR implements the mechanism as follows:
- Add a function named `cleanupResources` to SparkPlan, which by default calls `cleanupResources` on its children. An operator that needs resource cleanup should override it with its own cleanup and also call `super.cleanupResources`, as SortExec does in this PR.
- Add support on the trigger side, which in this PR is SortMergeJoinExec: it makes sure to call `cleanupResources` so that all of its upstream (child) operators are cleaned up.
- Add a conf `spark.sql.sortMergeJoinExec.eagerCleanupResources` to control this behavior for safety; the default value is true.

### Why are the changes needed?
Bugfix for the SortMergeJoin memory leak, and a general framework for SparkPlan resource cleanup.

### Does this PR introduce any user-facing change?
No.

### How was this patch tested?
UT: Add a new test suite, JoinWithResourceCleanSuite, to check both the standard and the code generation scenarios.

Integration test: Run with the default driver/executor memory set to 1g, in local mode with 10 threads. With `spark.sql.sortMergeJoinExec.eagerCleanupResources=false`, the test below (thanks @taosaildrone for providing this test [here](https://github.com/apache/spark/pull/23762#issuecomment-463303175)) fails with OOM, while with the conf enabled it passes.
```
from pyspark.sql.functions import rand, col

# `spark` is the active SparkSession (for example, the one provided by the pyspark shell).
# Force a sort-merge join by preferring it and disabling broadcast joins.
spark.conf.set("spark.sql.join.preferSortMergeJoin", "true")
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", -1)
# spark.conf.set("spark.sql.sortMergeJoinExec.eagerCleanupResources", "true")

r1 = spark.range(1, 1001).select(col("id").alias("timestamp1"))
r1 = r1.withColumn('value', rand())
r2 = spark.range(1000, 1001).select(col("id").alias("timestamp2"))
r2 = r2.withColumn('value2', rand())
joined = r1.join(r2, r1.timestamp1 == r2.timestamp2, "inner")
joined = joined.coalesce(1)
joined.explain()
joined.show()
```
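The cleanup propagation described in the PR can be pictured with a small, Spark-free sketch. Everything here is hypothetical and only for illustration: `PlanNode`, `SortNode`, `JoinNode`, and `cleanup_resources` are toy stand-ins, not Spark's `SparkPlan`, `SortExec`, or `SortMergeJoinExec`, and the join is a simplified key match rather than a real sort-merge join.

```python
class PlanNode:
    """Toy stand-in for a physical plan operator (hypothetical, not Spark's SparkPlan)."""

    def __init__(self, *children):
        self.children = list(children)

    def cleanup_resources(self):
        # Default behavior: just forward the request to every child.
        for child in self.children:
            child.cleanup_resources()


class SortNode(PlanNode):
    """Operator that holds a buffer, so it overrides the cleanup hook."""

    def __init__(self, *children):
        super().__init__(*children)
        self.buffer = None

    def execute(self, rows):
        self.buffer = sorted(rows)  # resource that would otherwise stay alive
        return iter(self.buffer)

    def cleanup_resources(self):
        self.buffer = None           # release this operator's own resource
        super().cleanup_resources()  # then keep propagating to the children


class JoinNode(PlanNode):
    """Trigger side: once it has produced its output, it asks its children to clean up."""

    def __init__(self, left, right, eager_cleanup=True):
        super().__init__(left, right)
        self.eager_cleanup = eager_cleanup  # plays the role of the safety conf

    def execute(self, left_rows, right_rows):
        left, right = self.children
        right_keys = set(right.execute(right_rows))
        result = [key for key in left.execute(left_rows) if key in right_keys]
        if self.eager_cleanup:
            self.cleanup_resources()
        return result
```

With `eager_cleanup=True` both sort buffers are released as soon as the join result is complete; with `eager_cleanup=False` they stay referenced, which is the shape of the leak the PR describes.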
[GitHub] [spark] AmplabJenkins commented on issue #26080: [SPARK-29425][SQL] The ownership of a database should be respected
AmplabJenkins commented on issue #26080: [SPARK-29425][SQL] The ownership of a database should be respected URL: https://github.com/apache/spark/pull/26080#issuecomment-543739386 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26080: [SPARK-29425][SQL] The ownership of a database should be respected
AmplabJenkins commented on issue #26080: [SPARK-29425][SQL] The ownership of a database should be respected URL: https://github.com/apache/spark/pull/26080#issuecomment-543739396 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/17260/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26164: [SPARK-21492][SQL] Fix memory leak in SortMergeJoin
AmplabJenkins removed a comment on issue #26164: [SPARK-21492][SQL] Fix memory leak in SortMergeJoin URL: https://github.com/apache/spark/pull/26164#issuecomment-543739224 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26164: [SPARK-21492][SQL] Fix memory leak in SortMergeJoin
AmplabJenkins removed a comment on issue #26164: [SPARK-21492][SQL] Fix memory leak in SortMergeJoin URL: https://github.com/apache/spark/pull/26164#issuecomment-543739239 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/17259/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25870: [SPARK-27936][K8S] support python deps
AmplabJenkins removed a comment on issue #25870: [SPARK-27936][K8S] support python deps URL: https://github.com/apache/spark/pull/25870#issuecomment-543754002 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25870: [SPARK-27936][K8S] support python deps
AmplabJenkins removed a comment on issue #25870: [SPARK-27936][K8S] support python deps URL: https://github.com/apache/spark/pull/25870#issuecomment-543754014 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/17258/ Test FAILed.
[GitHub] [spark] gaborgsomogyi commented on a change in pull request #26153: [SPARK-29500][SQL][SS] Support partition column when writing to Kafka
gaborgsomogyi commented on a change in pull request #26153: [SPARK-29500][SQL][SS] Support partition column when writing to Kafka
URL: https://github.com/apache/spark/pull/26153#discussion_r336503495

## File path: docs/structured-streaming-kafka-integration.md

@@ -622,6 +626,10 @@ a ```null``` valued key column will be automatically added (see Kafka semantics how ```null``` valued key values are handled). If a topic column exists then its value is used as the topic when writing the given row to Kafka, unless the "topic" configuration option is set i.e., the "topic" configuration option overrides the topic column.
+If a partition column is not specified then the partition is calculated by the Kafka producer
+(using ```org.apache.kafka.clients.producer.internals.DefaultPartitioner```).
+This can be overridden in Spark by setting the ```kafka.partitioner.class``` option.

Review comment:
> Any config that starts with kafka. will be passed down to the producer, actually.

There are exceptions, please see them either in the doc or in the code. If such a thing is not checked by a test, it can easily be broken. Just a personal thought: if something is not covered by a test, it's super hard to find out later whether it's a bug or a feature...

> Don't the other test I added prove this point already?

If we add the mentioned simplified test then you're right, it will cover the partition field part.
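The precedence rules quoted from the doc change above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Spark's writer code: rows and options are modeled as plain dicts, the helper names `resolve_topic` and `resolve_partition` are hypothetical, and raising an error when no topic is available is an assumption of this sketch.

```python
def resolve_topic(row, options):
    """Pick the target topic: the "topic" option overrides the topic column."""
    topic = options.get("topic", row.get("topic"))
    if topic is None:
        # Assumption of this sketch: a row without any topic cannot be written.
        raise ValueError("no topic: set the 'topic' option or provide a topic column")
    return topic


def resolve_partition(row):
    """Return the partition column value, or None when it is not specified.

    None stands for "leave the choice to the Kafka producer's partitioner".
    """
    return row.get("partition")
```

For example, `resolve_topic({"topic": "a"}, {"topic": "b"})` picks the option value, and `resolve_partition({"value": "x"})` yields `None`, meaning the producer-side partitioner decides.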
[GitHub] [spark] AmplabJenkins commented on issue #26165: [SPARK-29508][SQL] Implicitly cast strings in datetime arithmetic operations
AmplabJenkins commented on issue #26165: [SPARK-29508][SQL] Implicitly cast strings in datetime arithmetic operations URL: https://github.com/apache/spark/pull/26165#issuecomment-543765423 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #26165: [SPARK-29508][SQL] Implicitly cast strings in datetime arithmetic operations
SparkQA commented on issue #26165: [SPARK-29508][SQL] Implicitly cast strings in datetime arithmetic operations URL: https://github.com/apache/spark/pull/26165#issuecomment-543765377 **[Test build #112279 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112279/testReport)** for PR 26165 at commit [`9a2ce13`](https://github.com/apache/spark/commit/9a2ce139261b3bb28fc089cdd4e52733c30d8464). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #26165: [SPARK-29508][SQL] Implicitly cast strings in datetime arithmetic operations
AmplabJenkins commented on issue #26165: [SPARK-29508][SQL] Implicitly cast strings in datetime arithmetic operations URL: https://github.com/apache/spark/pull/26165#issuecomment-543765429 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/112279/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26160: [SPARK-29498][SQL] CatalogTable to HiveTable should not change the table's ownership
AmplabJenkins removed a comment on issue #26160: [SPARK-29498][SQL] CatalogTable to HiveTable should not change the table's ownership URL: https://github.com/apache/spark/pull/26160#issuecomment-543647406 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26160: [SPARK-29498][SQL] CatalogTable to HiveTable should not change the table's ownership
AmplabJenkins removed a comment on issue #26160: [SPARK-29498][SQL] CatalogTable to HiveTable should not change the table's ownership URL: https://github.com/apache/spark/pull/26160#issuecomment-543647417 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/17255/ Test PASSed.
[GitHub] [spark] yaooqinn commented on a change in pull request #26080: [SPARK-29425][SQL] The ownership of a database should be respected
yaooqinn commented on a change in pull request #26080: [SPARK-29425][SQL] The ownership of a database should be respected
URL: https://github.com/apache/spark/pull/26080#discussion_r336422180

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala

@@ -576,6 +576,8 @@ case class CatalogDatabase(
 name: String,
 description: String,
 locationUri: URI,
+ownerName: String,
+ownerType: String,

Review comment:
OK, I will change it that way.
[GitHub] [spark] SparkQA commented on issue #26162: [SPARK-29438][SS] Use partition ID of source for state store in stream-stream join
SparkQA commented on issue #26162: [SPARK-29438][SS] Use partition ID of source for state store in stream-stream join URL: https://github.com/apache/spark/pull/26162#issuecomment-543665325 **[Test build #112273 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112273/testReport)** for PR 26162 at commit [`2699944`](https://github.com/apache/spark/commit/269994480dbf2da838fecf8066d48b88806b0461).
[GitHub] [spark] AmplabJenkins removed a comment on issue #26163: [MINOR][DOCS] Fix incorrect EqualNullSafe symbol in sql-migration-guide.md
AmplabJenkins removed a comment on issue #26163: [MINOR][DOCS] Fix incorrect EqualNullSafe symbol in sql-migration-guide.md URL: https://github.com/apache/spark/pull/26163#issuecomment-543662175 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/17256/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26162: [SPARK-29438][SS] Use partition ID of source for state store in stream-stream join
AmplabJenkins removed a comment on issue #26162: [SPARK-29438][SS] Use partition ID of source for state store in stream-stream join URL: https://github.com/apache/spark/pull/26162#issuecomment-543661942 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on issue #26163: [MINOR][DOCS] Fix incorrect EqualNullSafe symbol in sql-migration-guide.md
AmplabJenkins removed a comment on issue #26163: [MINOR][DOCS] Fix incorrect EqualNullSafe symbol in sql-migration-guide.md URL: https://github.com/apache/spark/pull/26163#issuecomment-543662158 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26163: [MINOR][DOCS] Fix incorrect EqualNullSafe symbol in sql-migration-guide.md
SparkQA commented on issue #26163: [MINOR][DOCS] Fix incorrect EqualNullSafe symbol in sql-migration-guide.md URL: https://github.com/apache/spark/pull/26163#issuecomment-543665344 **[Test build #112272 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112272/testReport)** for PR 26163 at commit [`371c65a`](https://github.com/apache/spark/commit/371c65a373bebcf87fd66e98a6abb31b3b372737).
[GitHub] [spark] AmplabJenkins commented on issue #25981: [SPARK-28420][SQL] Support the `INTERVAL` type in `date_part()`
AmplabJenkins commented on issue #25981: [SPARK-28420][SQL] Support the `INTERVAL` type in `date_part()` URL: https://github.com/apache/spark/pull/25981#issuecomment-543671940 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/112260/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26129: [SPARK-29482][SQL] ANALYZE TABLE should look up catalog/table like v2 commands
AmplabJenkins removed a comment on issue #26129: [SPARK-29482][SQL] ANALYZE TABLE should look up catalog/table like v2 commands URL: https://github.com/apache/spark/pull/26129#issuecomment-543670787 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26129: [SPARK-29482][SQL] ANALYZE TABLE should look up catalog/table like v2 commands
AmplabJenkins commented on issue #26129: [SPARK-29482][SQL] ANALYZE TABLE should look up catalog/table like v2 commands URL: https://github.com/apache/spark/pull/26129#issuecomment-543670787 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #26163: [MINOR][DOCS] Fix incorrect EqualNullSafe symbol in sql-migration-guide.md
SparkQA removed a comment on issue #26163: [MINOR][DOCS] Fix incorrect EqualNullSafe symbol in sql-migration-guide.md URL: https://github.com/apache/spark/pull/26163#issuecomment-543665344 **[Test build #112272 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112272/testReport)** for PR 26163 at commit [`371c65a`](https://github.com/apache/spark/commit/371c65a373bebcf87fd66e98a6abb31b3b372737).
[GitHub] [spark] SparkQA commented on issue #26151: [SPARK-29502][SQL] typed interval expression should fail for invalid format
SparkQA commented on issue #26151: [SPARK-29502][SQL] typed interval expression should fail for invalid format URL: https://github.com/apache/spark/pull/26151#issuecomment-543671775 **[Test build #112258 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112258/testReport)** for PR 26151 at commit [`5a375eb`](https://github.com/apache/spark/commit/5a375eb520602e68f3b4c0faceb6c56a33461b13). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #25981: [SPARK-28420][SQL] Support the `INTERVAL` type in `date_part()`
AmplabJenkins commented on issue #25981: [SPARK-28420][SQL] Support the `INTERVAL` type in `date_part()` URL: https://github.com/apache/spark/pull/25981#issuecomment-543671926 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26129: [SPARK-29482][SQL] ANALYZE TABLE should look up catalog/table like v2 commands
AmplabJenkins removed a comment on issue #26129: [SPARK-29482][SQL] ANALYZE TABLE should look up catalog/table like v2 commands URL: https://github.com/apache/spark/pull/26129#issuecomment-543670799 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/112259/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26163: [MINOR][DOCS] Fix incorrect EqualNullSafe symbol in sql-migration-guide.md
AmplabJenkins commented on issue #26163: [MINOR][DOCS] Fix incorrect EqualNullSafe symbol in sql-migration-guide.md URL: https://github.com/apache/spark/pull/26163#issuecomment-543671702 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/112272/ Test PASSed.
[GitHub] [spark] gengliangwang commented on issue #26129: [SPARK-29482][SQL] ANALYZE TABLE should look up catalog/table like v2 commands
gengliangwang commented on issue #26129: [SPARK-29482][SQL] ANALYZE TABLE should look up catalog/table like v2 commands URL: https://github.com/apache/spark/pull/26129#issuecomment-543671481 Thanks, merging to master!
[GitHub] [spark] AmplabJenkins removed a comment on issue #25981: [SPARK-28420][SQL] Support the `INTERVAL` type in `date_part()`
AmplabJenkins removed a comment on issue #25981: [SPARK-28420][SQL] Support the `INTERVAL` type in `date_part()` URL: https://github.com/apache/spark/pull/25981#issuecomment-543671926 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26163: [MINOR][DOCS] Fix incorrect EqualNullSafe symbol in sql-migration-guide.md
AmplabJenkins removed a comment on issue #26163: [MINOR][DOCS] Fix incorrect EqualNullSafe symbol in sql-migration-guide.md URL: https://github.com/apache/spark/pull/26163#issuecomment-543671702 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/112272/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25981: [SPARK-28420][SQL] Support the `INTERVAL` type in `date_part()`
AmplabJenkins removed a comment on issue #25981: [SPARK-28420][SQL] Support the `INTERVAL` type in `date_part()` URL: https://github.com/apache/spark/pull/25981#issuecomment-543671940 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/112260/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #26151: [SPARK-29502][SQL] typed interval expression should fail for invalid format
SparkQA removed a comment on issue #26151: [SPARK-29502][SQL] typed interval expression should fail for invalid format URL: https://github.com/apache/spark/pull/26151#issuecomment-543559175 **[Test build #112258 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112258/testReport)** for PR 26151 at commit [`5a375eb`](https://github.com/apache/spark/commit/5a375eb520602e68f3b4c0faceb6c56a33461b13).
[GitHub] [spark] AmplabJenkins removed a comment on issue #26163: [MINOR][DOCS] Fix incorrect EqualNullSafe symbol in sql-migration-guide.md
AmplabJenkins removed a comment on issue #26163: [MINOR][DOCS] Fix incorrect EqualNullSafe symbol in sql-migration-guide.md URL: https://github.com/apache/spark/pull/26163#issuecomment-543671696 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26163: [MINOR][DOCS] Fix incorrect EqualNullSafe symbol in sql-migration-guide.md
SparkQA commented on issue #26163: [MINOR][DOCS] Fix incorrect EqualNullSafe symbol in sql-migration-guide.md URL: https://github.com/apache/spark/pull/26163#issuecomment-543671517 **[Test build #112272 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112272/testReport)** for PR 26163 at commit [`371c65a`](https://github.com/apache/spark/commit/371c65a373bebcf87fd66e98a6abb31b3b372737). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on issue #25981: [SPARK-28420][SQL] Support the `INTERVAL` type in `date_part()`
SparkQA commented on issue #25981: [SPARK-28420][SQL] Support the `INTERVAL` type in `date_part()` URL: https://github.com/apache/spark/pull/25981#issuecomment-543671131 **[Test build #112260 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112260/testReport)** for PR 25981 at commit [`d4375b5`](https://github.com/apache/spark/commit/d4375b5a938c6a93708059fcf66402d8782e8731). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `abstract class ExtractIntervalPart(` * `case class ExtractIntervalMillenniums(child: Expression)` * `case class ExtractIntervalCenturies(child: Expression)` * `case class ExtractIntervalDecades(child: Expression)` * `case class ExtractIntervalYears(child: Expression)` * `case class ExtractIntervalQuarters(child: Expression)` * `case class ExtractIntervalMonths(child: Expression)` * `case class ExtractIntervalDays(child: Expression)` * `case class ExtractIntervalHours(child: Expression)` * `case class ExtractIntervalMinutes(child: Expression)` * `case class ExtractIntervalSeconds(child: Expression)` * `case class ExtractIntervalMilliseconds(child: Expression)` * `case class ExtractIntervalMicroseconds(child: Expression)` * `case class ExtractIntervalEpoch(child: Expression)`
[GitHub] [spark] SparkQA removed a comment on issue #25981: [SPARK-28420][SQL] Support the `INTERVAL` type in `date_part()`
SparkQA removed a comment on issue #25981: [SPARK-28420][SQL] Support the `INTERVAL` type in `date_part()` URL: https://github.com/apache/spark/pull/25981#issuecomment-543559212 **[Test build #112260 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112260/testReport)** for PR 25981 at commit [`d4375b5`](https://github.com/apache/spark/commit/d4375b5a938c6a93708059fcf66402d8782e8731).
[GitHub] [spark] AmplabJenkins commented on issue #26163: [MINOR][DOCS] Fix incorrect EqualNullSafe symbol in sql-migration-guide.md
AmplabJenkins commented on issue #26163: [MINOR][DOCS] Fix incorrect EqualNullSafe symbol in sql-migration-guide.md URL: https://github.com/apache/spark/pull/26163#issuecomment-543671696 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26129: [SPARK-29482][SQL] ANALYZE TABLE should look up catalog/table like v2 commands
AmplabJenkins commented on issue #26129: [SPARK-29482][SQL] ANALYZE TABLE should look up catalog/table like v2 commands URL: https://github.com/apache/spark/pull/26129#issuecomment-543670799 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/112259/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26156: [SPARK-29438][SS] Failed to get state store when task number is not determinate
SparkQA commented on issue #26156: [SPARK-29438][SS] Failed to get state store when task number is not determinate URL: https://github.com/apache/spark/pull/26156#issuecomment-543679213 **[Test build #112261 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112261/testReport)** for PR 26156 at commit [`f25e4ac`](https://github.com/apache/spark/commit/f25e4acf91c517c33afce444c1e45bae71cf6d2a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] redsk commented on a change in pull request #26153: [SPARK-29500][SQL][SS] Support partition column when writing to Kafka
redsk commented on a change in pull request #26153: [SPARK-29500][SQL][SS] Support partition column when writing to Kafka URL: https://github.com/apache/spark/pull/26153#discussion_r336453652 ## File path: docs/structured-streaming-kafka-integration.md ## @@ -622,6 +626,10 @@ a ```null``` valued key column will be automatically added (see Kafka semantics how ```null``` valued key values are handled). If a topic column exists then its value is used as the topic when writing the given row to Kafka, unless the "topic" configuration option is set i.e., the "topic" configuration option overrides the topic column. +If a partition column is not specified then the partition is calculated by the Kafka producer +(using ```org.apache.kafka.clients.producer.internals.DefaultPartitioner```). +This can be overridden in Spark by setting the ```kafka.partitioner.class``` option. Review comment: Yes, exactly. But this is standard `KafkaProducer` behaviour: - it uses the `ProducerRecord` partition field. If that is `null`, it falls back to: - the partitioner provided via `kafka.partitioner.class`. If that is not set: - it uses the default partitioner. I don't believe we need a test for this (otherwise we would be testing the Kafka API), but maybe we should state it explicitly in the doc.
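The fallback order described in the comment above can be sketched in plain Scala. This is only a model for illustration: it does not call Kafka, and `PartitionResolution`, `Partitioner`, `defaultPartitioner` and `resolve` are hypothetical names, not Kafka or Spark APIs. The stand-in default partitioner uses `hashCode`, which is an assumption made to keep the sketch self-contained and is not how Kafka hashes keys.

```scala
// Hypothetical model of the fallback order; it does not use the Kafka client library.
object PartitionResolution {
  // Stand-in for a partitioner: maps (key, numPartitions) to a partition number.
  type Partitioner = (String, Int) => Int

  // Stand-in for the producer's default partitioner (assumption: a simple hash of the key).
  val defaultPartitioner: Partitioner =
    (key, numPartitions) => Math.floorMod(key.hashCode, numPartitions)

  def resolve(
      recordPartition: Option[Int],    // models the ProducerRecord partition field; None models null
      configured: Option[Partitioner], // models kafka.partitioner.class, if it was set
      key: String,
      numPartitions: Int): Int = {
    // Step 2 and 3: a configured partitioner wins over the default one.
    val partitioner = configured.getOrElse(defaultPartitioner)
    // Step 1: an explicit partition on the record wins over any partitioner.
    recordPartition.getOrElse(partitioner(key, numPartitions))
  }
}
```

For example, `PartitionResolution.resolve(Some(3), None, "k", 10)` yields the explicit partition, while passing `None` as the first argument delegates to whichever partitioner is in effect.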
[GitHub] [spark] redsk commented on a change in pull request #26153: [SPARK-29500][SQL][SS] Support partition column when writing to Kafka
redsk commented on a change in pull request #26153: [SPARK-29500][SQL][SS] Support partition column when writing to Kafka URL: https://github.com/apache/spark/pull/26153#discussion_r336456840 ## File path: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSinkSuite.scala ## @@ -404,20 +420,85 @@ abstract class KafkaSinkBatchSuiteBase extends KafkaSinkSuiteBase { .save() checkAnswer( createKafkaReader(topic, includeHeaders = true).selectExpr( -"CAST(value as STRING) value", "headers" +"CAST(value as STRING) value", "headers", "partition" ), - Row("1", Seq(Row("a", "b".getBytes(UTF_8 :: -Row("2", Seq(Row("c", "d".getBytes(UTF_8)), Row("e", "f".getBytes(UTF_8 :: -Row("3", Seq(Row("g", "h".getBytes(UTF_8)), Row("g", "i".getBytes(UTF_8 :: -Row("4", null) :: + Row("1", Seq(Row("a", "b".getBytes(UTF_8))), 0) :: +Row("2", Seq(Row("c", "d".getBytes(UTF_8)), Row("e", "f".getBytes(UTF_8))), 1) :: +Row("3", Seq(Row("g", "h".getBytes(UTF_8)), Row("g", "i".getBytes(UTF_8))), 2) :: +Row("4", null, 3) :: Row("5", Seq( Row("j", "k".getBytes(UTF_8)), Row("j", "l".getBytes(UTF_8)), - Row("m", "n".getBytes(UTF_8 :: + Row("m", "n".getBytes(UTF_8))), 0) :: Nil ) } + test("batch - partition column vs default Kafka partitioner") { Review comment: - "Not sure why 100 partitions necessary?": I just wanted to prevent the test from succeeding by chance. I admit it's overkill. I'll reduce it to 10. - "Don't we need the following tests?": See my other comment. - "`.collect().toList.head` maybe not enough because it would be good to make sure the data is in one partition.": The `.collect()` docstring says "Returns an array that contains all rows in this Dataset. Running collect requires moving all the data into the application's driver process, and doing so on a very large dataset can crash the driver process with OutOfMemoryError." Does `collect()` work on a per-partition basis? If so, maybe I could use `.coalesce(1)` before calling `collect()`.
- "I think this test can be formed more simple.": OK, that makes sense. I'll simplify it. - "There are a couple of copy-pastes": Do you mean the ```df.write .format("kafka") .option("kafka.bootstrap.servers", testUtils.brokerAddress) .option("topic", topic) .mode("append") .save()``` blocks? If so, they are also all over the file.
[GitHub] [spark] gaborgsomogyi commented on a change in pull request #26153: [SPARK-29500][SQL][SS] Support partition column when writing to Kafka
gaborgsomogyi commented on a change in pull request #26153: [SPARK-29500][SQL][SS] Support partition column when writing to Kafka URL: https://github.com/apache/spark/pull/26153#discussion_r336461734 ## File path: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSinkSuite.scala ## @@ -404,20 +420,85 @@ abstract class KafkaSinkBatchSuiteBase extends KafkaSinkSuiteBase { .save() checkAnswer( createKafkaReader(topic, includeHeaders = true).selectExpr( -"CAST(value as STRING) value", "headers" +"CAST(value as STRING) value", "headers", "partition" ), - Row("1", Seq(Row("a", "b".getBytes(UTF_8 :: -Row("2", Seq(Row("c", "d".getBytes(UTF_8)), Row("e", "f".getBytes(UTF_8 :: -Row("3", Seq(Row("g", "h".getBytes(UTF_8)), Row("g", "i".getBytes(UTF_8 :: -Row("4", null) :: + Row("1", Seq(Row("a", "b".getBytes(UTF_8))), 0) :: +Row("2", Seq(Row("c", "d".getBytes(UTF_8)), Row("e", "f".getBytes(UTF_8))), 1) :: +Row("3", Seq(Row("g", "h".getBytes(UTF_8)), Row("g", "i".getBytes(UTF_8))), 2) :: +Row("4", null, 3) :: Row("5", Seq( Row("j", "k".getBytes(UTF_8)), Row("j", "l".getBytes(UTF_8)), - Row("m", "n".getBytes(UTF_8 :: + Row("m", "n".getBytes(UTF_8))), 0) :: Nil ) } + test("batch - partition column vs default Kafka partitioner") { Review comment: > I can revert it to the previous state (with headers). Is that ok? That's fine since that code is already in.
[GitHub] [spark] gaborgsomogyi commented on a change in pull request #26153: [SPARK-29500][SQL][SS] Support partition column when writing to Kafka
gaborgsomogyi commented on a change in pull request #26153: [SPARK-29500][SQL][SS] Support partition column when writing to Kafka URL: https://github.com/apache/spark/pull/26153#discussion_r336472228 ## File path: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaWriteTask.scala ## @@ -92,16 +92,19 @@ private[kafka010] abstract class KafkaRowWriter( throw new NullPointerException(s"null topic present in the data. Use the " + s"${KafkaSourceProvider.TOPIC_OPTION_KEY} option for setting a default topic.") } +val partition: java.lang.Integer = Review comment: What I mean: `val partition: Integer =...`
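The shorter spelling suggested in the review works because Scala imports `java.lang._` into every compilation unit by default, so the bare name `Integer` refers to `java.lang.Integer`. A minimal sketch of that equivalence follows; the object and value names are hypothetical and are not taken from the pull request.

```scala
// Hypothetical names for illustration; this is not code from the pull request.
object IntegerAliasSketch {
  // Fully qualified form, as written in the diff. A boxed Integer can hold null.
  val qualified: java.lang.Integer = null

  // Shorter form from the review comment; it is the same type, so this assignment compiles.
  val unqualified: Integer = qualified

  // A Scala Int is boxed to java.lang.Integer by Predef's implicit int2Integer conversion.
  val boxed: Integer = 3
}
```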
[GitHub] [spark] AmplabJenkins removed a comment on issue #24939: [SPARK-18569][ML][R] Support RFormula arithmetic, I() and spark functions
AmplabJenkins removed a comment on issue #24939: [SPARK-18569][ML][R] Support RFormula arithmetic, I() and spark functions URL: https://github.com/apache/spark/pull/24939#issuecomment-543743557 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/17261/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24939: [SPARK-18569][ML][R] Support RFormula arithmetic, I() and spark functions
AmplabJenkins removed a comment on issue #24939: [SPARK-18569][ML][R] Support RFormula arithmetic, I() and spark functions URL: https://github.com/apache/spark/pull/24939#issuecomment-543743551 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26120: [SPARK-29014][SQL] DataSourceV2: Fix current/default catalog usage
AmplabJenkins removed a comment on issue #26120: [SPARK-29014][SQL] DataSourceV2: Fix current/default catalog usage URL: https://github.com/apache/spark/pull/26120#issuecomment-543743370 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26120: [SPARK-29014][SQL] DataSourceV2: Fix current/default catalog usage
AmplabJenkins removed a comment on issue #26120: [SPARK-29014][SQL] DataSourceV2: Fix current/default catalog usage URL: https://github.com/apache/spark/pull/26120#issuecomment-543743384 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/112269/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24939: [SPARK-18569][ML][R] Support RFormula arithmetic, I() and spark functions
AmplabJenkins commented on issue #24939: [SPARK-18569][ML][R] Support RFormula arithmetic, I() and spark functions URL: https://github.com/apache/spark/pull/24939#issuecomment-543743557 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/17261/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24939: [SPARK-18569][ML][R] Support RFormula arithmetic, I() and spark functions
AmplabJenkins commented on issue #24939: [SPARK-18569][ML][R] Support RFormula arithmetic, I() and spark functions URL: https://github.com/apache/spark/pull/24939#issuecomment-543743551 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26164: [SPARK-21492][SQL] Fix memory leak in SortMergeJoin
AmplabJenkins commented on issue #26164: [SPARK-21492][SQL] Fix memory leak in SortMergeJoin URL: https://github.com/apache/spark/pull/26164#issuecomment-543744501 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #26164: [SPARK-21492][SQL] Fix memory leak in SortMergeJoin
SparkQA commented on issue #26164: [SPARK-21492][SQL] Fix memory leak in SortMergeJoin URL: https://github.com/apache/spark/pull/26164#issuecomment-543744459 **[Test build #112276 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112276/testReport)** for PR 26164 at commit [`f9567d5`](https://github.com/apache/spark/commit/f9567d58105fb92d73e8118d96c8c58abcd2414f). * This patch **fails to generate documentation**. * This patch merges cleanly. * This patch adds no public classes.