[GitHub] spark pull request #16601: [SPARK-19182][DStream] Optimize the lock in Strea...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/16601#discussion_r96358523

Diff: streaming/src/main/scala/org/apache/spark/streaming/DStreamGraph.scala

```scala
@@ -112,12 +112,10 @@ final private[streaming] class DStreamGraph extends Serializable with Logging {
   def generateJobs(time: Time): Seq[Job] = {
     logDebug("Generating jobs for time " + time)
-    val jobs = this.synchronized {
-      outputStreams.flatMap { outputStream =>
-        val jobOption = outputStream.generateJob(time)
-        jobOption.foreach(_.setCallSite(outputStream.creationSite))
-        jobOption
-      }
+    val jobs = getOutputStreams().flatMap { outputStream =>
```

Yes, I oversimplified the question.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema ...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16606#discussion_r96358416

Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/PartitionProviderCompatibilitySuite.scala

```scala
@@ -481,4 +481,27 @@ class PartitionProviderCompatibilitySuite
     assert(spark.sql("show partitions test").count() == 5)
   }
 }
+
+  test("saveAsTable with inconsistent columns order" +
```

Could you move it to `PartitionedWriteSuite`?
[GitHub] spark issue #16610: [SPARK-19254][SQL] Support Seq, Map, and Struct in funct...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16610

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71473/
[GitHub] spark issue #16610: [SPARK-19254][SQL] Support Seq, Map, and Struct in funct...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16610

Merged build finished. Test PASSed.
[GitHub] spark pull request #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16606#discussion_r96357937

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala

```scala
@@ -183,9 +183,12 @@ case class CatalogTable(
   import CatalogTable._

-  /** schema of this table's partition columns */
-  def partitionSchema: StructType = StructType(schema.filter {
-    c => partitionColumnNames.contains(c.name)
+  /**
+   * schema of this table's partition columns
+   * keep the schema order with partitionColumnNames
```

Let's keep the previous doc comment; I think it's clear enough.
[GitHub] spark issue #16610: [SPARK-19254][SQL] Support Seq, Map, and Struct in funct...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16610

**[Test build #71473 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71473/testReport)** for PR 16610 at commit [`6a02490`](https://github.com/apache/spark/commit/6a02490745952bd2a5c5b0c84482b5cd874ae820).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #16599: [SPARK-19239][PySpark] Check the lowerBound and u...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16599#discussion_r96357852

Diff: python/pyspark/sql/readwriter.py

```python
@@ -431,6 +432,8 @@ def jdbc(self, url, table, column=None, lowerBound=None, upperBound=None, numPar
        if column is not None:
            if numPartitions is None:
                numPartitions = self._spark._sc.defaultParallelism
```

I think we should make the Scala API and Python API consistent. The existing Python API is not following [the document](http://spark.apache.org/docs/2.1.0/sql-programming-guide.html):

```
These options must all be specified if any of them is specified. They describe how to partition the table when reading in parallel from multiple workers. partitionColumn must be a numeric column from the table in question. Notice that lowerBound and upperBound are just used to decide the partition stride, not for filtering the rows in table. So all rows in the table will be partitioned and returned. This option applies only to reading.
```
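The quoted documentation says `lowerBound` and `upperBound` only decide the partition stride, not which rows are returned. A rough, self-contained Python sketch of that behavior (an illustration of the idea, not Spark's actual `columnPartition` code; the function name and predicate strings are hypothetical):

```python
def column_partition(column, lower_bound, upper_bound, num_partitions):
    """Derive per-partition WHERE predicates from the JDBC partitioning options.

    The bounds set the stride of each partition; rows below lower_bound or
    above upper_bound still land in the first or last partition, so no rows
    are ever filtered out.
    """
    if num_partitions <= 1:
        return []  # a single partition reads the whole table, no predicate needed
    stride = upper_bound // num_partitions - lower_bound // num_partitions
    predicates = []
    current = lower_bound + stride
    for i in range(num_partitions):
        if i == 0:
            # first partition is open-ended below, and also picks up NULLs
            predicates.append(f"{column} < {current} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # last partition is open-ended above
            predicates.append(f"{column} >= {current - stride}")
        else:
            predicates.append(f"{column} >= {current - stride} AND {column} < {current}")
        current += stride
    return predicates
```

For example, `column_partition("id", 0, 100, 4)` yields four predicates with stride 25, the first ending `< 25 OR ... IS NULL` and the last starting `>= 75`, so every row matches exactly one partition.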
[GitHub] spark pull request #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema ...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16606#discussion_r96357715

Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala

```scala
@@ -183,9 +183,12 @@ case class CatalogTable(
   import CatalogTable._

-  /** schema of this table's partition columns */
-  def partitionSchema: StructType = StructType(schema.filter {
-    c => partitionColumnNames.contains(c.name)
+  /**
+   * schema of this table's partition columns
+   * keep the schema order with partitionColumnNames
+   */
+  def partitionSchema: StructType = StructType(partitionColumnNames.flatMap {
+    p => schema.filter(_.name == p)
```

nit: code style

```scala
xxx.map { p =>
  xxx
}
```
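The substance of this diff is the iteration order: filtering the table schema keeps schema order, while mapping over `partitionColumnNames` keeps the declared partition order. A minimal Python sketch of the difference, with plain lists of names standing in for `StructType` (function names are illustrative):

```python
def partition_schema_by_filter(schema, partition_column_names):
    # Old behavior: iterate the table schema, so order follows the schema.
    return [c for c in schema if c in partition_column_names]

def partition_schema_by_names(schema, partition_column_names):
    # New behavior: iterate partitionColumnNames, so order follows the
    # declared partition columns.
    return [c for name in partition_column_names for c in schema if c == name]

schema = ["a", "b", "c", "d"]
partition_cols = ["d", "b"]  # declared order differs from schema order
print(partition_schema_by_filter(schema, partition_cols))  # ['b', 'd']
print(partition_schema_by_names(schema, partition_cols))   # ['d', 'b']
```

The two results differ exactly when the declared partition order disagrees with the schema order, which is the inconsistency SPARK-19246 fixes.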
[GitHub] spark issue #16573: [SPARK-19210][DStream] Add log level info into checkpoin...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16573

@zsxwing I see what you mean, and that does achieve the right result. IMHO, since we already provide `SparkContext.setLogLevel`, it is odd to call `org.apache.log4j.Logger.getRootLogger().setLevel(l)` rather than `SparkContext.setLogLevel()`. Besides, the new conf is only an internal one, and the actual change is far from complicated. Anyway, it is not a major issue and can be handled your way; if you do not like this PR, I will close it.
[GitHub] spark issue #16597: [SPARK-19240][SQL][TEST] add test for setting location f...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16597

Just FYI, this only tests the behavior of `InMemoryCatalog`. I will port it to `HiveDDLSuite` in https://github.com/apache/spark/pull/16592
[GitHub] spark pull request #16599: [SPARK-19239][PySpark] Check the lowerBound and u...
Github user djvulee commented on a diff in the pull request: https://github.com/apache/spark/pull/16599#discussion_r96357233

Diff: python/pyspark/sql/readwriter.py

```python
@@ -431,6 +432,8 @@ def jdbc(self, url, table, column=None, lowerBound=None, upperBound=None, numPar
        if column is not None:
            if numPartitions is None:
                numPartitions = self._spark._sc.defaultParallelism
```

I am a little worried that this change will break the API. If a user specifies only `column`, `lowerBound`, and `upperBound` under some Spark version, their program will fail after upgrading, even though very few people rely on the default parallelism. Personally, I prefer to make the change and keep the APIs consistent. If your preference is to add the assert on `numPartitions`, I will update the PR soon.
[GitHub] spark issue #14559: [SPARK-16968]Add additional options in jdbc when creatin...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/14559

This is a pretty general issue for JDBC users. Could we backport it to Spark 2.0?
[GitHub] spark issue #16564: [SPARK-19065][SQL]Don't inherit expression id in dropDup...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16564

**[Test build #71487 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71487/testReport)** for PR 16564 at commit [`26652a0`](https://github.com/apache/spark/commit/26652a09be891de4a26fe54e4d3755b1cd42094f).
[GitHub] spark pull request #16599: [SPARK-19239][PySpark] Check the lowerBound and u...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16599#discussion_r96355936

Diff: python/pyspark/sql/readwriter.py

```python
@@ -431,6 +432,8 @@ def jdbc(self, url, table, column=None, lowerBound=None, upperBound=None, numPar
        if column is not None:
            if numPartitions is None:
                numPartitions = self._spark._sc.defaultParallelism
```

This contradicts the Scala version. Could you also change it to the following code?

```python
assert numPartitions is not None, \
    "numPartitions can not be None when ``column`` is specified"
```
[GitHub] spark issue #16599: [SPARK-19239][PySpark] Check the lowerBound and upperBou...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16599

Have you manually tested your code changes?
[GitHub] spark pull request #16564: [SPARK-19065][SQL]Don't inherit expression id in ...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/16564#discussion_r9636

Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala

```scala
@@ -898,11 +899,15 @@ class DatasetSuite extends QueryTest with SharedSQLContext {
       (1, 2), (1, 1), (2, 1), (2, 2))
   }

-  test("dropDuplicates should not change child plan output") {
-    val ds = Seq(("a", 1), ("a", 2), ("b", 1), ("a", 1)).toDS()
-    checkDataset(
-      ds.dropDuplicates("_1").select(ds("_1").as[String], ds("_2").as[Int]),
-      ("a", 1), ("b", 1))
+  test("SPARK-19065 dropDuplicates should not create expressions using the same id") {
```

It seems weird to me to add a test verifying that we don't support some feature, so I just added my previous regression test back in order to have a test that catches this issue.
[GitHub] spark issue #16597: [SPARK-19240][SQL][TEST] add test for setting location f...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16597

**[Test build #71484 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71484/testReport)** for PR 16597 at commit [`a5687f8`](https://github.com/apache/spark/commit/a5687f8d99bb0cfdc075c6947898d4a5a65dd57f).
[GitHub] spark issue #16587: [SPARK-19229] [SQL] Disallow Creating Hive Source Tables...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16587

**[Test build #71485 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71485/testReport)** for PR 16587 at commit [`49e6e81`](https://github.com/apache/spark/commit/49e6e815639550a9c597b0752f8aa68ec9cfb496).
[GitHub] spark issue #16473: [SPARK-19069] [CORE] Expose task 'status' and 'duration'...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16473

**[Test build #71486 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71486/testReport)** for PR 16473 at commit [`b2ad3bc`](https://github.com/apache/spark/commit/b2ad3bc2ab02f99bce4498726e11728516ba1be0).
[GitHub] spark issue #16573: [SPARK-19210][DStream] Add log level info into checkpoin...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16573

You can just call `org.apache.log4j.Logger.getRootLogger().setLevel(l)` in your main method before `StreamingContext.getOrCreate`. I don't think it's a good idea to add a new Spark conf just for Streaming checkpoints. In addition, it seems weird to me that Streaming also checkpoints the log level.
[GitHub] spark issue #16606: [SPARK-19246][SQL]CataLogTable's partitionSchema order a...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16606

**[Test build #71483 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71483/testReport)** for PR 16606 at commit [`4260f84`](https://github.com/apache/spark/commit/4260f844530c17533d811f0c7f3deed14ed7a307).
[GitHub] spark pull request #16601: [SPARK-19182][DStream] Optimize the lock in Strea...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/16601#discussion_r96354491

Diff: streaming/src/main/scala/org/apache/spark/streaming/DStreamGraph.scala

```scala
@@ -112,12 +112,10 @@ final private[streaming] class DStreamGraph extends Serializable with Logging {
   def generateJobs(time: Time): Seq[Job] = {
     logDebug("Generating jobs for time " + time)
-    val jobs = this.synchronized {
-      outputStreams.flatMap { outputStream =>
-        val jobOption = outputStream.generateJob(time)
-        jobOption.foreach(_.setCallSite(outputStream.creationSite))
-        jobOption
-      }
+    val jobs = getOutputStreams().flatMap { outputStream =>
```

`synchronized` is there to make sure `writeObject` never writes out an intermediate state of `DStreamGraph`.
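The concern above is that checkpoint serialization (`writeObject`) must never observe the graph mid-update, so any optimization has to work on a consistent snapshot taken under the same lock. A toy Python sketch of that snapshot-under-lock pattern (class and method names are illustrative, not Spark's):

```python
import threading

class Graph:
    """Toy stand-in for DStreamGraph: mutations, reads, and serialization
    all coordinate through one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._streams = []

    def add_stream(self, s):
        with self._lock:
            self._streams.append(s)

    def get_streams(self):
        # Copy under the lock, then do the expensive work (job generation)
        # outside it, which is the essence of the proposed optimization.
        with self._lock:
            return list(self._streams)

    def serialize(self):
        # Analogue of writeObject: holding the lock guarantees it never
        # sees a half-applied update.
        with self._lock:
            return tuple(self._streams)

g = Graph()
g.add_stream("a")
g.add_stream("b")
snapshot = g.get_streams()
snapshot.append("c")  # mutating the snapshot does not touch the graph
```

Because `get_streams` returns a copy, later iteration over the snapshot cannot race with `serialize`, while the lock is held only for the cheap copy rather than the whole job-generation loop.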
[GitHub] spark issue #16591: [SPARK-19251] remove unused imports and outdated comment...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16591

**[Test build #71482 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71482/testReport)** for PR 16591 at commit [`f0e0576`](https://github.com/apache/spark/commit/f0e0576e0163bd72ff749a1eef885d9296302925).
[GitHub] spark issue #16591: [SPARK-19251] remove unused imports and outdated comment...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16591

Merged build finished. Test FAILed.
[GitHub] spark issue #16591: [SPARK-19251] remove unused imports and outdated comment...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16591

**[Test build #71481 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71481/testReport)** for PR 16591 at commit [`ab70d6b`](https://github.com/apache/spark/commit/ab70d6ba21aea42991a66dabafe8ab495d2413e7).
* This patch **fails to build**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16591: [SPARK-19251] remove unused imports and outdated comment...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16591

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71481/
[GitHub] spark issue #16573: [SPARK-19210][DStream] Add log level info into checkpoin...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16573

also cc @tdas
[GitHub] spark issue #16601: [SPARK-19182][DStream] Optimize the lock in StreamingJob...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16601

also cc @tdas
[GitHub] spark issue #16591: [SPARK-19251] remove unused imports and outdated comment...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16591

**[Test build #71481 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71481/testReport)** for PR 16591 at commit [`ab70d6b`](https://github.com/apache/spark/commit/ab70d6ba21aea42991a66dabafe8ab495d2413e7).
[GitHub] spark issue #16591: [SPARK-19251] remove unused imports and outdated comment...
Github user uncleGen commented on the issue: https://github.com/apache/spark/pull/16591

retest this please.
[GitHub] spark issue #16591: [SPARK-19251][CORE] remove unused imports and outdated c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16591

Build finished. Test FAILed.
[GitHub] spark issue #16591: [SPARK-19251][CORE] remove unused imports and outdated c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16591

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71480/
[GitHub] spark issue #16591: [SPARK-19251][CORE] remove unused imports and outdated c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16591

**[Test build #71480 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71480/testReport)** for PR 16591 at commit [`b5244ec`](https://github.com/apache/spark/commit/b5244ecffc2e957ebdf4f6c70e42b507cfda7595).
* This patch **fails Scala style tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
[GitHub] spark issue #16591: [SPARK-19251][CORE] remove unused imports and outdated c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16591

**[Test build #71480 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71480/testReport)** for PR 16591 at commit [`b5244ec`](https://github.com/apache/spark/commit/b5244ecffc2e957ebdf4f6c70e42b507cfda7595).
[GitHub] spark issue #16583: [SPARK-19129] [SQL] SessionCatalog: Disallow empty part ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16583

**[Test build #71479 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71479/testReport)** for PR 16583 at commit [`f1b6fe0`](https://github.com/apache/spark/commit/f1b6fe0d733ab160531ce261564340491a7840dd).
[GitHub] spark issue #16599: [SPARK-19239][PySpark] Check the lowerBound and upperBou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16599 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71476/ Test PASSed.
[GitHub] spark issue #16599: [SPARK-19239][PySpark] Check the lowerBound and upperBou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16599 Merged build finished. Test PASSed.
[GitHub] spark issue #16599: [SPARK-19239][PySpark] Check the lowerBound and upperBou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16599 **[Test build #71476 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71476/testReport)** for PR 16599 at commit [`43602b5`](https://github.com/apache/spark/commit/43602b56d6099213a103a0c0389ac37ebb2c326b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16583: [SPARK-19129] [SQL] SessionCatalog: Disallow empt...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16583#discussion_r96349730 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala --- @@ -568,7 +569,9 @@ private[hive] class HiveClientImpl( val hiveTable = toHiveTable(table) val parts = spec match { case None => shim.getAllPartitions(client, hiveTable).map(fromHivePartition) - case Some(s) => client.getPartitions(hiveTable, s.asJava).asScala.map(fromHivePartition) + case Some(s) => +assert(s.values.forall(_.nonEmpty), s"partition spec '$s' is invalid") --- End diff -- Yeah, it has the same issue.
[GitHub] spark pull request #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable wor...
Github user lins05 commented on a diff in the pull request: https://github.com/apache/spark/pull/16593#discussion_r96348515 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala --- @@ -1343,17 +1343,41 @@ class HiveDDLSuite sql("INSERT INTO t SELECT 2, 'b'") checkAnswer(spark.table("t"), Row(9, "x") :: Row(2, "b") :: Nil) - val e = intercept[AnalysisException] { -Seq(1 -> "a").toDF("i", "j").write.format("hive").partitionBy("i").saveAsTable("t2") - } - assert(e.message.contains("A Create Table As Select (CTAS) statement is not allowed " + -"to create a partitioned table using Hive")) - val e2 = intercept[AnalysisException] { Seq(1 -> "a").toDF("i", "j").write.format("hive").bucketBy(4, "i").saveAsTable("t2") } assert(e2.message.contains("Creating bucketed Hive serde table is not supported yet")) + try { +spark.sql("set hive.exec.dynamic.partition.mode=nonstrict") --- End diff -- I think we can use `withSQLConf` instead of `try .. finally ..`. ```scala withSQLConf("hive.exec.dynamic.partition.mode" -> "nonstrict") { ... } ```
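The `withSQLConf` helper suggested above is essentially a save-and-restore wrapper. A minimal standalone sketch of that pattern, using a plain mutable map in place of the real session conf (the actual helper lives in Spark's `SQLTestUtils`; `ConfDemo` and `withConf` are illustrative names, not Spark APIs):

```scala
// Sketch of the save-and-restore pattern behind `withSQLConf`.
// A mutable Map stands in for the SQL session configuration.
object ConfDemo {
  val conf = scala.collection.mutable.Map[String, String]()

  def withConf[T](pairs: (String, String)*)(body: => T): T = {
    // Remember the previous value of every key we are about to override.
    val saved = pairs.map { case (k, _) => k -> conf.get(k) }
    pairs.foreach { case (k, v) => conf(k) = v }
    try body
    finally saved.foreach {
      case (k, Some(old)) => conf(k) = old   // restore the original value
      case (k, None)      => conf.remove(k)  // the key was previously unset
    }
  }
}
```

The advantage over a hand-written `try .. finally ..` is that the restore logic also handles keys that were unset before the test, instead of forcing them back to a guessed default such as `strict`.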
[GitHub] spark pull request #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable wor...
Github user lins05 commented on a diff in the pull request: https://github.com/apache/spark/pull/16593#discussion_r96348791 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala --- @@ -45,6 +46,25 @@ case class CreateHiveTableAsSelectCommand( override def innerChildren: Seq[LogicalPlan] = Seq(query) override def run(sparkSession: SparkSession): Seq[Row] = { + +// relation should move partition columns to the last +val (partOutputs, nonPartOutputs) = query.output.partition { + a => +tableDesc.partitionColumnNames.contains(a.name) +} + +// the CTAS's SELECT partition-outputs order should be consistent with +// tableDesc.partitionColumnNames +val reorderPartOutputs = tableDesc.partitionColumnNames.map { --- End diff -- nit: `reorderPartOutputs` -> `reorderedPartOutputs`. The former sounds like a verb while the latter sounds like a noun.
[GitHub] spark pull request #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable wor...
Github user lins05 commented on a diff in the pull request: https://github.com/apache/spark/pull/16593#discussion_r96349044 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala --- @@ -183,9 +183,15 @@ case class CatalogTable( import CatalogTable._ - /** schema of this table's partition columns */ - def partitionSchema: StructType = StructType(schema.filter { -c => partitionColumnNames.contains(c.name) + /** + * schema of this table's partition columns + * keep the schema order with partitionColumnNames --- End diff -- "keep the schema order with partitionColumnNames because we always concatenate the partition columns to the schema when reading the table information from hive metastore."
[GitHub] spark pull request #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable wor...
Github user lins05 commented on a diff in the pull request: https://github.com/apache/spark/pull/16593#discussion_r96348696 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala --- @@ -88,7 +108,9 @@ case class CreateHiveTableAsSelectCommand( } else { try { sparkSession.sessionState.executePlan(InsertIntoTable( - metastoreRelation, Map(), query, overwrite = true, ifNotExists = false)).toRdd +metastoreRelation, Map(), reorderOutputQuery, overwrite = true + , ifNotExists = false)) --- End diff -- nit: The comma should be in the line above (after `overwrite = true`). Actually I think we can put all the args to `InsertIntoTable` in the same line.
[GitHub] spark pull request #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable wor...
Github user lins05 commented on a diff in the pull request: https://github.com/apache/spark/pull/16593#discussion_r96349144 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala --- @@ -183,9 +183,15 @@ case class CatalogTable( import CatalogTable._ - /** schema of this table's partition columns */ - def partitionSchema: StructType = StructType(schema.filter { -c => partitionColumnNames.contains(c.name) + /** + * schema of this table's partition columns + * keep the schema order with partitionColumnNames + */ + def partitionSchema: StructType = StructType(partitionColumnNames.map { +p => schema.find(_.name == p).getOrElse( + throw new AnalysisException(s"Partition column [$p] " + +s"did not exist in schema ${schema.toString}") --- End diff -- "did not exist" -> "does not exist"
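The ordering point behind this `partitionSchema` change can be shown in isolation. A small sketch (plain `(name, type)` tuples stand in for `StructField`, and the column names are illustrative only): filtering the schema keeps the schema's own column order, while mapping over `partitionColumnNames` keeps the declared partition order and fails loudly on a missing column.

```scala
// Why the PR maps over partitionColumnNames instead of filtering schema:
// the two approaches can return the partition columns in different orders.
object PartitionOrderDemo {
  val schema = Seq("i" -> "int", "j" -> "string", "k" -> "int")
  val partitionColumnNames = Seq("k", "j")

  // Filter keeps the schema's own order: j before k.
  val byFilter = schema.filter { case (name, _) => partitionColumnNames.contains(name) }

  // Map follows the declared partition order: k before j, and rejects a
  // partition column that does not exist in the schema.
  val byMap = partitionColumnNames.map { p =>
    schema.find(_._1 == p).getOrElse(
      sys.error(s"Partition column [$p] does not exist in schema"))
  }
}
```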
[GitHub] spark pull request #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable wor...
Github user lins05 commented on a diff in the pull request: https://github.com/apache/spark/pull/16593#discussion_r96348933 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala --- @@ -1343,17 +1343,41 @@ class HiveDDLSuite sql("INSERT INTO t SELECT 2, 'b'") checkAnswer(spark.table("t"), Row(9, "x") :: Row(2, "b") :: Nil) - val e = intercept[AnalysisException] { -Seq(1 -> "a").toDF("i", "j").write.format("hive").partitionBy("i").saveAsTable("t2") - } - assert(e.message.contains("A Create Table As Select (CTAS) statement is not allowed " + -"to create a partitioned table using Hive")) - val e2 = intercept[AnalysisException] { Seq(1 -> "a").toDF("i", "j").write.format("hive").bucketBy(4, "i").saveAsTable("t2") } assert(e2.message.contains("Creating bucketed Hive serde table is not supported yet")) + try { +spark.sql("set hive.exec.dynamic.partition.mode=nonstrict") +Seq(10 -> "y").toDF("i", "j").write.format("hive").partitionBy("i").saveAsTable("t3") +checkAnswer(spark.table("t3"), Row("y", 10) :: Nil) +table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t3")) +var partitionSchema = table.partitionSchema +assert(partitionSchema.size == 1 && partitionSchema.fields(0).name == "i" && + partitionSchema.fields(0).dataType == IntegerType) + +Seq(11 -> "z").toDF("i", "j").write.mode("overwrite").format("hive") + .partitionBy("j").saveAsTable("t3") +checkAnswer(spark.table("t3"), Row(11, "z") :: Nil) +table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t3")) +partitionSchema = table.partitionSchema +assert(partitionSchema.size == 1 && partitionSchema.fields(0).name == "j" && + partitionSchema.fields(0).dataType == StringType) + +Seq((1, 2, 3)).toDF("i", "j", "k").write.mode("overwrite").format("hive") + .partitionBy("k", "j").saveAsTable("t3") +table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t3")) +checkAnswer(spark.table("t3"), Row(1, 3, 2) :: Nil) + +Seq((1, 2, 3)).toDF("i", "j", 
"k").write.mode("overwrite").format("hive") + .partitionBy("j", "k").saveAsTable("t3") +table = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t3")) +checkAnswer(spark.table("t3"), Row(1, 2, 3) :: Nil) + } finally { +spark.sql("set hive.exec.dynamic.partition.mode=strict") + } + --- End diff -- I think this test case is a bit fat, maybe we can split it into two or three smaller ones? e.g.: ```scala test("create hive serde table with DataFrameWriter.saveAsTable - basic") ... test("create hive serde table with DataFrameWriter.saveAsTable - overwrite and append") ... test("create hive serde table with DataFrameWriter.saveAsTable - partitioned") ... ```
[GitHub] spark issue #16473: [SPARK-19069] [CORE] Expose task 'status' and 'duration'...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16473 **[Test build #71478 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71478/testReport)** for PR 16473 at commit [`96194df`](https://github.com/apache/spark/commit/96194df0ec6fdead12e18f436ee4ef107518152b).
[GitHub] spark issue #16585: [SPARK-19223][SQL][PySpark] Fix InputFileBlockHolder for...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16585 BTW please add a test case for this. Thanks.
[GitHub] spark issue #15300: [SPARK-17729] [SQL] Enable creating hive bucketed tables
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/15300 @cloud-fan : I have linked a proposal in https://issues.apache.org/jira/browse/SPARK-19256.
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r96348295 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. --- End diff -- Good idea, thx!
[GitHub] spark issue #16473: [SPARK-19069] [CORE] Expose task 'status' and 'duration'...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16473 **[Test build #71477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71477/testReport)** for PR 16473 at commit [`0e25b30`](https://github.com/apache/spark/commit/0e25b301b3c2c7d9fe4f5ab2a4f266133f916960).
[GitHub] spark issue #16599: [SPARK-19239][PySpark] Check the lowerBound and upperBou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16599 **[Test build #71476 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71476/testReport)** for PR 16599 at commit [`43602b5`](https://github.com/apache/spark/commit/43602b56d6099213a103a0c0389ac37ebb2c326b).
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r96347397 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. + * It's also acceptable to give parameters of different types. + * If the search string is NULL, the return value is 0 because NULL fails equality comparison with any value. + * When the paramters have different types, comparing will be done based on type firstly, + * for example, ''999'' won't be considered equal with 999, no implicit cast will be done here. + */ +@ExpressionDescription( + usage = "_FUNC_(expr, expr1, expr2, ...) - Returns the index of expr in the expr1, expr2, ... or 0 if not found.", + extended = """ +Examples: + > SELECT _FUNC_(10, 9, 3, 10, 4); + 3 + > SELECT _FUNC_('a', 'b', 'c', 'd', 'a'); + 4 + > SELECT _FUNC_('999', 'a', 999, 9.99, '999'); + 4 + """) +case class Field(children: Seq[Expression]) extends Expression { + + /** Even if expr is not found in (expr1, expr2, ...) 
list, the value will be 0, not null */ + override def nullable: Boolean = false + override def foldable: Boolean = children.forall(_.foldable) + + private lazy val ordering = TypeUtils.getInterpretedOrdering(children(0).dataType) + + private val dataTypeMatchIndex: Array[Int] = children.zipWithIndex.tail.filter( +_._1.dataType.sameType(children.head.dataType)).map(_._2).toArray + + override def checkInputDataTypes(): TypeCheckResult = { +if (children.length <= 1) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires at least 2 arguments") +} else if (!children.forall( +e => e.dataType.isInstanceOf[AtomicType] || e.dataType.isInstanceOf[NullType])) { + TypeCheckResult.TypeCheckFailure(s"FIELD requires all arguments to be of AtomicType") --- End diff -- That's for the user's explicit indication of NULL, which is legal in Hive's `field` expression.
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r96347281 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala --- @@ -17,11 +17,13 @@ package org.apache.spark.sql +import java.sql.{Date, Timestamp} --- End diff -- My bad.
[GitHub] spark pull request #16476: [SPARK-19084][SQL] Implement expression field
Github user gczsjdy commented on a diff in the pull request: https://github.com/apache/spark/pull/16476#discussion_r96347166 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala --- @@ -340,3 +344,96 @@ object CaseKeyWhen { CaseWhen(cases, elseValue) } } + +/** + * A function that returns the index of expr in (expr1, expr2, ...) list or 0 if not found. + * It takes at least 2 parameters, and all parameters should be subtype of AtomicType or NullType. --- End diff -- Yes, that's right.
[GitHub] spark issue #16610: [SPARK-19254][SQL] Support Seq, Map, and Struct in funct...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16610 Since Spark may support a comparable `MapType` in the future (#15970), `functions.lit` might also need to support that type. However, since I know that adding new IFs is quite debatable, could anyone give me some insights about this?
[GitHub] spark issue #16585: [SPARK-19223][SQL][PySpark] Fix InputFileBlockHolder for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16585 **[Test build #71475 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71475/testReport)** for PR 16585 at commit [`1563e03`](https://github.com/apache/spark/commit/1563e03796a1ee557decfa041d39dbd5eee8cf33).
[GitHub] spark issue #16603: [SPARK-19244][Core] Sort MemoryConsumers according to th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16603 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71471/ Test PASSed.
[GitHub] spark issue #16603: [SPARK-19244][Core] Sort MemoryConsumers according to th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16603 Merged build finished. Test PASSed.
[GitHub] spark issue #16603: [SPARK-19244][Core] Sort MemoryConsumers according to th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16603 **[Test build #71471 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71471/testReport)** for PR 16603 at commit [`070ec51`](https://github.com/apache/spark/commit/070ec51f322d3af889c499f60be11fef29068aa5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16559: [WIP] Add expression index and test cases
Github user gczsjdy closed the pull request at: https://github.com/apache/spark/pull/16559
[GitHub] spark issue #16559: [WIP] Add expression index and test cases
Github user gczsjdy commented on the issue: https://github.com/apache/spark/pull/16559 Thanks for the information @rxin @aray @cloud-fan, I will close this PR. Sorry for the late reply.
[GitHub] spark issue #16610: [SPARK-19254][SQL] Support Seq, Map, and Struct in funct...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16610 **[Test build #71473 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71473/testReport)** for PR 16610 at commit [`6a02490`](https://github.com/apache/spark/commit/6a02490745952bd2a5c5b0c84482b5cd874ae820).
[GitHub] spark issue #16585: [SPARK-19223][SQL][PySpark] Fix InputFileBlockHolder for...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16585 **[Test build #71474 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71474/testReport)** for PR 16585 at commit [`e2d872c`](https://github.com/apache/spark/commit/e2d872c2ba706433e9aebe74213c4dbeb9c0754b).
[GitHub] spark pull request #16610: [SPARK-19254][SQL] Support Seq, Map, and Struct i...
GitHub user maropu opened a pull request: https://github.com/apache/spark/pull/16610 [SPARK-19254][SQL] Support Seq, Map, and Struct in functions.lit ## What changes were proposed in this pull request? This pr is to support Seq, Map, and Struct in functions.lit; it adds a new IF named `lit2` with `TypeTag` for avoiding type erasure. ## How was this patch tested? Added tests in `LiteralExpressionSuite` You can merge this pull request into a Git repository by running: $ git pull https://github.com/maropu/spark SPARK-19254 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16610.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16610 commit 6a02490745952bd2a5c5b0c84482b5cd874ae820 Author: Takeshi YAMAMURO Date: 2016-11-14T13:21:09Z Add a new create with TypeTag in Literal
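The "`TypeTag` for avoiding type erasure" motivation in this PR description can be illustrated in isolation: a plain method sees a `Seq[Int]` only as `Seq[_]` at run time, while a `TypeTag` context bound carries the element type through. A minimal sketch (`elementType` is a hypothetical stand-in for demonstration, not Spark's actual `lit2` signature):

```scala
import scala.reflect.runtime.universe._

// With erasure, runtime inspection of a Seq cannot recover its element
// type. A TypeTag context bound makes the compiler materialize the full
// static type, so the method can still see Seq[Int] vs Seq[String].
object LitDemo {
  def elementType[T: TypeTag](v: Seq[T]): String = typeOf[T].toString
}
```

This is why overloads keyed on the runtime class alone cannot distinguish `Seq[Int]` from `Seq[String]`, and why the PR introduces a separate `TypeTag`-based entry point rather than another plain overload.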
[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16605 many thanks!
[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16605 Sure, @maropu. I'll do that tomorrow morning (PST).
[GitHub] spark issue #16591: [SPARK-19251][CORE] remove unused imports and outdated c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16591 Merged build finished. Test FAILed.
[GitHub] spark issue #16591: [SPARK-19251][CORE] remove unused imports and outdated c...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16591 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71470/ Test FAILed.
[GitHub] spark issue #16585: [SPARK-19223][SQL][PySpark] Fix InputFileBlockHolder for...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16585 `InheritableThreadLocal`
[GitHub] spark issue #16599: [SPARK-19239][PySpark] Check the lowerBound and upperBou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16599 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71472/ Test FAILed.
[GitHub] spark issue #16599: [SPARK-19239][PySpark] Check the lowerBound and upperBou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16599 **[Test build #71472 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71472/testReport)** for PR 16599 at commit [`94c44ba`](https://github.com/apache/spark/commit/94c44ba368acb3c7fa648ad66cfd3cac352af911). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16591: [SPARK-19251][CORE] remove unused imports and outdated c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16591 **[Test build #71470 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71470/testReport)** for PR 16591 at commit [`958c2fe`](https://github.com/apache/spark/commit/958c2fe8170514e392b080d15d7e78b6568c403c). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16599: [SPARK-19239][PySpark] Check the lowerBound and upperBou...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16599 Merged build finished. Test FAILed.
[GitHub] spark issue #16599: [SPARK-19239][PySpark] Check the lowerBound and upperBou...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16599 **[Test build #71472 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71472/testReport)** for PR 16599 at commit [`94c44ba`](https://github.com/apache/spark/commit/94c44ba368acb3c7fa648ad66cfd3cac352af911).
[GitHub] spark issue #16599: [SPARK-19239][PySpark] Check the lowerBound and upperBou...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16599 ok to test
[GitHub] spark pull request #16528: [SPARK-19148][SQL] do not expose the external tab...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16528
[GitHub] spark issue #16585: [SPARK-19223][SQL][PySpark] Fix InputFileBlockHolder for...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16585 @cloud-fan Is the SGTM for the current approach or for `InheritableThreadLocal`?
[GitHub] spark issue #16585: [SPARK-19223][SQL][PySpark] Fix InputFileBlockHolder for...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16585 SGTM
[GitHub] spark issue #16528: [SPARK-19148][SQL] do not expose the external table conc...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16528 thanks for the review, merging to master!
[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16344 Jenkins, retest this please
[GitHub] spark issue #16528: [SPARK-19148][SQL] do not expose the external table conc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16528 Merged build finished. Test PASSed.
[GitHub] spark issue #16528: [SPARK-19148][SQL] do not expose the external table conc...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16528 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71469/ Test PASSed.
[GitHub] spark issue #16528: [SPARK-19148][SQL] do not expose the external table conc...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16528 **[Test build #71469 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71469/testReport)** for PR 16528 at commit [`318dc04`](https://github.com/apache/spark/commit/318dc0459cd1ba487643abff52b5979b4ab0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user maropu commented on the issue: https://github.com/apache/spark/pull/16605 @dongjoon-hyun Could you take some time to review this before the committers do? Thanks!
[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16605 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71468/ Test PASSed.
[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16605 Merged build finished. Test PASSed.
[GitHub] spark issue #16605: [SPARK-18884][SQL] Support Array[_] in ScalaUDF
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16605 **[Test build #71468 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71468/testReport)** for PR 16605 at commit [`581c7fa`](https://github.com/apache/spark/commit/581c7fa46e9f3f8b71759eaaf0490f84f56825aa). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16599: [SPARK-19239][PySpark] Check the lowerBound and u...
Github user djvulee commented on a diff in the pull request: https://github.com/apache/spark/pull/16599#discussion_r96339764 --- Diff: python/pyspark/sql/readwriter.py --- @@ -431,6 +432,8 @@ def jdbc(self, url, table, column=None, lowerBound=None, upperBound=None, numPar if column is not None: if numPartitions is None: numPartitions = self._spark._sc.defaultParallelism +assert lowerBound != None, "lowerBound can not be None when ``column`` is specified" +assert upperBound != None, "upperBound can not be None when ``column`` is specified" --- End diff -- Yes, the Scala code could check this, but the PySpark code fails at `int(lowerBound)` first, which confuses the user.
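The point djvulee makes is general: validating arguments up front produces a clear error instead of a confusing failure deeper in the call chain. A minimal Java sketch of this fail-fast idea, assuming illustrative names (`partitionBounds` is not Spark's actual API):

```java
public class JdbcArgsSketch {
    // Validate the partitioning arguments together, mirroring the idea that
    // lowerBound/upperBound must both be present whenever a partition column
    // is given, so the caller sees one clear message instead of a later
    // conversion error.
    static long[] partitionBounds(String column, Long lowerBound, Long upperBound) {
        if (column == null) {
            return null; // no partitioning requested
        }
        if (lowerBound == null || upperBound == null) {
            throw new IllegalArgumentException(
                "lowerBound and upperBound must be specified when column is specified");
        }
        if (lowerBound >= upperBound) {
            throw new IllegalArgumentException("lowerBound must be less than upperBound");
        }
        return new long[]{lowerBound, upperBound};
    }
}
```
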
[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16429 @davies, could this be merged by any chance?
[GitHub] spark issue #16553: [SPARK-9435][SQL] Reuse function in Java UDF to correctl...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/16553 @marmbrus Can this be merged by any chance?
[GitHub] spark pull request #16599: [SPARK-19239][PySpark] Check the lowerBound and u...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/16599#discussion_r96339264 --- Diff: python/pyspark/sql/readwriter.py --- @@ -431,6 +432,8 @@ def jdbc(self, url, table, column=None, lowerBound=None, upperBound=None, numPar if column is not None: if numPartitions is None: numPartitions = self._spark._sc.defaultParallelism +assert lowerBound != None, "lowerBound can not be None when ``column`` is specified" +assert upperBound != None, "upperBound can not be None when ``column`` is specified" --- End diff -- Should we mirror the condition here - https://github.com/apache/spark/blob/55d528f2ba0ba689dbb881616d9436dc7958e943/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala#L100-L103 ?
[GitHub] spark issue #16585: [SPARK-19223][SQL][PySpark] Fix InputFileBlockHolder for...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16585 @rxin Thanks for looking at this. I think the simplest way to transfer the info is to use `InheritableThreadLocal` in place of `ThreadLocal` in `InputFileBlockHolder`. It works in my testing. What do you think — is that OK with you?
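A minimal Java sketch of the difference viirya is relying on: a value set in an `InheritableThreadLocal` is copied into threads created afterwards, while a plain `ThreadLocal` value is not visible to child threads (the class and method names here are illustrative, not Spark's code):

```java
public class InheritableThreadLocalDemo {
    // Plain ThreadLocal: child threads do NOT see the parent's value.
    private static final ThreadLocal<String> plain = new ThreadLocal<>();
    // InheritableThreadLocal: child threads inherit the parent's value
    // at thread-creation time.
    private static final InheritableThreadLocal<String> inheritable =
        new InheritableThreadLocal<>();

    // Sets both locals in the current thread, then reads them back from a
    // freshly spawned child thread. Returns {plainSeen, inheritableSeen}.
    static String[] readFromChild(String value) throws InterruptedException {
        plain.set(value);
        inheritable.set(value);
        final String[] seen = new String[2];
        Thread child = new Thread(() -> {
            seen[0] = plain.get();        // null: not propagated
            seen[1] = inheritable.get();  // inherited copy of the parent's value
        });
        child.start();
        child.join();
        return seen;
    }
}
```

This is why swapping the holder's `ThreadLocal` for an `InheritableThreadLocal` lets worker threads spawned by a task see the file-block info set by the parent thread.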
[GitHub] spark issue #16603: [SPARK-19244][Core] Sort MemoryConsumers according to th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16603 **[Test build #71471 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71471/testReport)** for PR 16603 at commit [`070ec51`](https://github.com/apache/spark/commit/070ec51f322d3af889c499f60be11fef29068aa5).
[GitHub] spark pull request #16603: [SPARK-19244][Core] Sort MemoryConsumers accordin...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16603#discussion_r96337114 --- Diff: core/src/main/java/org/apache/spark/memory/TaskMemoryManager.java --- @@ -144,23 +170,31 @@ public long acquireExecutionMemory(long required, MemoryConsumer consumer) { // spilling, avoid to have too many spilled files. if (got < required) { // Call spill() on other consumers to release memory +// Sort the consumers according their memory usage. So we avoid spilling the same consumer +// which is just spilled in last few times and re-spilling on it will produce many small +// spill files. +List sortedList = new ArrayList<>(); for (MemoryConsumer c: consumers) { if (c != consumer && c.getUsed() > 0 && c.getMode() == mode) { -try { - long released = c.spill(required - got, consumer); - if (released > 0) { -logger.debug("Task {} released {} from {} for {}", taskAttemptId, - Utils.bytesToString(released), c, consumer); -got += memoryManager.acquireExecutionMemory(required - got, taskAttemptId, mode); -if (got >= required) { - break; -} +sortedList.add(c); + } +} +Collections.sort(sortedList, new ConsumerComparator()); +for (MemoryConsumer c: sortedList) { + try { +long released = c.spill(required - got, consumer); +if (released > 0) { + logger.debug("Task {} released {} from {} for {}", taskAttemptId, +Utils.bytesToString(released), c, consumer); + got += memoryManager.acquireExecutionMemory(required - got, taskAttemptId, mode); + if (got >= required) { +break; } -} catch (IOException e) { - logger.error("error while calling spill() on " + c, e); - throw new OutOfMemoryError("error while calling spill() on " + c + " : " -+ e.getMessage()); } + } catch (IOException e) { +logger.error("error while calling spill() on " + c, e); +throw new OutOfMemoryError("error while calling spill() on " + c + " : " + + e.getMessage()); } --- End diff -- As the memory usage of memory consumer is changing over time, not sure if we use TreeSet/TreeMap for 
consumers, whether we would still get a correctly sorted order from them. In other words, is the sorted order of a TreeSet/TreeMap still guaranteed when the elements are mutable and change after insertion? I believe it is not. And if we have to sort here anyway, a TreeMap/TreeSet would be overkill compared with a plain list. Another concern is that the TreeMap/TreeSet API can find a tail set or ceiling element, but it requires an input element to compare against, and we only have the required memory amount, not a memory consumer. Also, a TreeSet/TreeMap could return an empty set when every element holds less memory than the required size; in that case we would have to fall back to iterating all elements to spill, which adds complexity. Totally agreed that it is better to fetch just the required size instead of always going from largest to smallest, and we can still achieve that with the current list-based approach.
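The list-based idea under discussion can be sketched in a few lines of Java. This is a simplified stand-in, not Spark's actual `TaskMemoryManager`/`MemoryConsumer` API: collect the candidate consumers, sort a copy by current usage, and spill from the largest first until enough memory is reclaimed, which avoids repeatedly re-spilling a small consumer and producing many tiny spill files.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative stand-in for Spark's MemoryConsumer; not the real API.
class Consumer {
    final String name;
    long used;
    Consumer(String name, long used) { this.name = name; this.used = used; }
    // Pretend-spill: releases up to `needed` bytes and returns the amount released.
    long spill(long needed) {
        long released = Math.min(used, needed);
        used -= released;
        return released;
    }
}

public class SpillOrderSketch {
    // Spill from the largest consumers first. Sorting a snapshot copy sidesteps
    // the mutable-key problem a TreeSet/TreeMap would have: usage can change
    // after insertion, so a tree's ordering invariant would silently break.
    static long reclaim(List<Consumer> consumers, long required) {
        List<Consumer> sorted = new ArrayList<>(consumers);
        sorted.sort(Comparator.comparingLong((Consumer c) -> c.used).reversed());
        long got = 0;
        for (Consumer c : sorted) {
            if (got >= required) break;
            got += c.spill(required - got);
        }
        return got;
    }
}
```
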
[GitHub] spark pull request #16591: [SPARK-19251][CORE] remove unused imports and out...
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/16591#discussion_r96336953 --- Diff: core/src/main/java/org/apache/spark/api/java/JavaFutureAction.java --- @@ -17,7 +17,6 @@ package org.apache.spark.api.java; - --- End diff -- Got it
[GitHub] spark pull request #16473: [SPARK-19069] [CORE] Expose task 'status' and 'du...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/16473#discussion_r96335765 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskInfo.scala --- @@ -20,6 +20,7 @@ package org.apache.spark.scheduler import org.apache.spark.TaskState import org.apache.spark.TaskState.TaskState import org.apache.spark.annotation.DeveloperApi +import org.apache.spark.ui.jobs.UIData.TaskMetricsUIData --- End diff -- nit: unused import.
[GitHub] spark pull request #16473: [SPARK-19069] [CORE] Expose task 'status' and 'du...
Github user zsxwing commented on a diff in the pull request: https://github.com/apache/spark/pull/16473#discussion_r96335590 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/UIData.scala --- @@ -127,6 +127,14 @@ private[spark] object UIData { def updateTaskMetrics(metrics: Option[TaskMetrics]): Unit = { _metrics = TaskUIData.toTaskMetricsUIData(metrics) } + +def getTaskDuration(): Long = { --- End diff -- nit: `getTaskDuration()` -> `taskDuration` as this doesn't have side effects.
[GitHub] spark issue #16591: [SPARK-19227][CORE] remove unused imports and outdated c...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16591 **[Test build #71470 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71470/testReport)** for PR 16591 at commit [`958c2fe`](https://github.com/apache/spark/commit/958c2fe8170514e392b080d15d7e78b6568c403c).
[GitHub] spark pull request #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable wor...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/16593#discussion_r96336300 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateHiveTableAsSelectCommand.scala --- @@ -45,6 +46,25 @@ case class CreateHiveTableAsSelectCommand( override def innerChildren: Seq[LogicalPlan] = Seq(query) override def run(sparkSession: SparkSession): Seq[Row] = { + +// relation should move partition columns to the last +val (partOutputs, nonPartOutputs) = query.output.partition { + a => --- End diff -- nit: code style ``` xxx.map { p => xxx } ```
[GitHub] spark issue #16542: [SPARK-18905][STREAMING] Fix the issue of removing a fai...
Github user CodingCat commented on the issue: https://github.com/apache/spark/pull/16542 Thanks