[GitHub] [spark] SparkQA commented on issue #24805: [SPARK-27798][SQL] from_avro shouldn't produces same value when converted to local relation
SparkQA commented on issue #24805: [SPARK-27798][SQL] from_avro shouldn't produces same value when converted to local relation URL: https://github.com/apache/spark/pull/24805#issuecomment-499767122 **[Test build #106266 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106266/testReport)** for PR 24805 at commit [`69996a6`](https://github.com/apache/spark/commit/69996a61a8f1c8e0cba6a50f5f93f00e40d23c3b). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class ExprReuseOutput(child: Expression) extends UnaryExpression ` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] ketank-new commented on issue #24788: [SPARK-26985] [core]
ketank-new commented on issue #24788: [SPARK-26985] [core] URL: https://github.com/apache/spark/pull/24788#issuecomment-499766766

> If that's the exact same problem, please link that JIRA to the PR title (see https://spark.apache.org/contributing.html).

Linked by changing the title above, where the JIRA number now appears.

> Also, please clarify why changing from little endian to big endian is safe in little endian OSes.

Clarification for the changes: putFloats() and putDoubles() in OffHeapColumnVector.java and OnHeapColumnVector.java are called whenever test cases use float and double data, respectively. If you check the definitions of putFloat() and putDouble(), on a BIG_ENDIAN system control moves into the else block. Within the else block the byte order is set to LITTLE_ENDIAN, which is the opposite of what is expected on a BIG_ENDIAN system. Changing this byte order to BIG_ENDIAN represents float and double values as expected on a BIG_ENDIAN processor, and the written test cases then pass. With these changes in place, running the test cases on a LITTLE_ENDIAN system moves control into the if block, so the test cases pass there as well. I therefore conclude that the changes work and are tested on both kinds of system, LITTLE_ENDIAN and BIG_ENDIAN.
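The byte-order point in the comment above can be illustrated with a minimal sketch. It is not the code from OffHeapColumnVector.java or OnHeapColumnVector.java; the class `ByteOrderSketch` and its helpers are hypothetical names, and only the standard `java.nio.ByteBuffer` and `java.nio.ByteOrder` APIs are used. It shows that the same four bytes decode to different float values depending on the byte order the buffer is told to use, which is why the order chosen in each branch matters.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; not taken from the Spark sources.
public class ByteOrderSketch {
    // Encode a float into four bytes using the given byte order.
    static byte[] writeFloat(float value, ByteOrder order) {
        return ByteBuffer.allocate(4).order(order).putFloat(0, value).array();
    }

    // Decode the first four bytes of src as a float using the given byte order.
    static float readFloat(byte[] src, ByteOrder order) {
        return ByteBuffer.wrap(src).order(order).getFloat(0);
    }

    public static void main(String[] args) {
        byte[] littleEndianBytes = writeFloat(1.5f, ByteOrder.LITTLE_ENDIAN);

        // Reading with the order the bytes were written in recovers the value.
        System.out.println(readFloat(littleEndianBytes, ByteOrder.LITTLE_ENDIAN));

        // Reading the same bytes with the other order yields a different float.
        System.out.println(readFloat(littleEndianBytes, ByteOrder.BIG_ENDIAN));

        // The platform's own order, which differs between x86 and s390x.
        System.out.println(ByteOrder.nativeOrder());
    }
}
```

Whether a given branch should use the platform order or a fixed order depends on how the source bytes were produced, which this sketch does not settle.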
[GitHub] [spark] ketank-new commented on issue #24788: [SPARK-26985] [core]
ketank-new commented on issue #24788: [SPARK-26985] [core] URL: https://github.com/apache/spark/pull/24788#issuecomment-499765576

> Could you please add JIRA number `[SPARK-]` and `[core]` as a prefix of the title?

Done.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24335: [SPARK-27425][SQL] Add count_if function
AmplabJenkins removed a comment on issue #24335: [SPARK-27425][SQL] Add count_if function URL: https://github.com/apache/spark/pull/24335#issuecomment-499764312 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24335: [SPARK-27425][SQL] Add count_if function
AmplabJenkins removed a comment on issue #24335: [SPARK-27425][SQL] Add count_if function URL: https://github.com/apache/spark/pull/24335#issuecomment-499764318 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11515/ Test PASSed.
[GitHub] [spark] zsxwing commented on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver
zsxwing commented on issue #24796: [SPARK-27900][CORE] Add uncaught exception handler to the driver URL: https://github.com/apache/spark/pull/24796#issuecomment-499764530

> `onReceive()` is interrupted before `onStop()`

But there will be a race condition if we remove `join`. We cannot guarantee that `onReceive` returns immediately when it receives the interrupt signal.

By the way, is there any theory about how this deadlock can happen? As I mentioned here: https://github.com/apache/spark/pull/24796#discussion_r290908122 I could not reproduce it.
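The ordering concern raised above can be sketched in plain Java. This is not the Spark code under review; `JoinBeforeStopSketch`, `stop`, and the event strings are hypothetical, and the 200 ms cleanup delay is an assumption chosen only to make the effect visible. The worker keeps running for a while after it is interrupted, so only a `join` before the stop step guarantees that the stop step runs after the worker has returned.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical illustration only; not the Spark event loop.
public class JoinBeforeStopSketch {
    // Wait for a thread without exposing the checked exception to callers.
    static void joinQuietly(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    // Interrupts a worker and records the order of "returned" and "onStop".
    static List<String> stop(boolean joinFirst) {
        List<String> events = new CopyOnWriteArrayList<>();
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(60_000); // stands in for a blocking receive
            } catch (InterruptedException e) {
                try {
                    Thread.sleep(200); // assumed cleanup work after the interrupt
                } catch (InterruptedException ignored) {
                    // nothing further to clean up in this sketch
                }
            }
            events.add("onReceive returned");
        });
        worker.start();
        worker.interrupt();
        if (joinFirst) {
            joinQuietly(worker); // wait until the worker has really finished
        }
        events.add("onStop");
        joinQuietly(worker); // always wait before reading the full event list
        return events;
    }

    public static void main(String[] args) {
        System.out.println("with join:    " + stop(true));
        System.out.println("without join: " + stop(false));
    }
}
```

With `joinFirst` set, "onStop" is always recorded last. Without it, the order depends on timing, which is the race described in the comment.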
[GitHub] [spark] AmplabJenkins commented on issue #24335: [SPARK-27425][SQL] Add count_if function
AmplabJenkins commented on issue #24335: [SPARK-27425][SQL] Add count_if function URL: https://github.com/apache/spark/pull/24335#issuecomment-499764312 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24335: [SPARK-27425][SQL] Add count_if function
AmplabJenkins commented on issue #24335: [SPARK-27425][SQL] Add count_if function URL: https://github.com/apache/spark/pull/24335#issuecomment-499764318 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11515/ Test PASSed.
[GitHub] [spark] gatorsmile commented on a change in pull request #24800: [SPARK-27947][SQL] ParsedStatement subclass toString may throw ClassCastException
gatorsmile commented on a change in pull request #24800: [SPARK-27947][SQL] ParsedStatement subclass toString may throw ClassCastException URL: https://github.com/apache/spark/pull/24800#discussion_r291457845

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/ParsedStatement.scala ##
@@ -36,8 +38,11 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
 private[sql] abstract class ParsedStatement extends LogicalPlan {
   // Redact properties and options when parsed nodes are used by generic methods like toString
   override def productIterator: Iterator[Any] = super.productIterator.map {
-    case mapArg: Map[_, _] => conf.redactOptions(mapArg.asInstanceOf[Map[String, String]])
-    case other => other
+    case mapArg: Map[_, _] =>
+      // May match any Map type, e.g. Map[String, Int], due to type erasure
+      Try(conf.redactOptions(mapArg.asInstanceOf[Map[String, String]])).getOrElse(mapArg)

Review comment: In the Spark source code, we always try to avoid relying on exception handling if we possibly can. Also, we try our best to avoid making assumptions in utility classes. I think we can enhance these Utils redact methods in this PR.
[GitHub] [spark] gatorsmile commented on a change in pull request #24800: [SPARK-27947][SQL] ParsedStatement subclass toString may throw ClassCastException
gatorsmile commented on a change in pull request #24800: [SPARK-27947][SQL] ParsedStatement subclass toString may throw ClassCastException URL: https://github.com/apache/spark/pull/24800#discussion_r291457845

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/ParsedStatement.scala ##
@@ -36,8 +38,11 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
 private[sql] abstract class ParsedStatement extends LogicalPlan {
   // Redact properties and options when parsed nodes are used by generic methods like toString
   override def productIterator: Iterator[Any] = super.productIterator.map {
-    case mapArg: Map[_, _] => conf.redactOptions(mapArg.asInstanceOf[Map[String, String]])
-    case other => other
+    case mapArg: Map[_, _] =>
+      // May match any Map type, e.g. Map[String, Int], due to type erasure
+      Try(conf.redactOptions(mapArg.asInstanceOf[Map[String, String]])).getOrElse(mapArg)

Review comment: In the Spark source code, we always try to avoid relying on exception handling if we possibly can. Also, we try our best to avoid making hidden assumptions in utility classes. I think we can enhance these Utils redact methods in this PR.
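One way to read the review comment is that the element types could be checked explicitly instead of casting and catching the failure. The sketch below shows that idea in Java rather than in the Scala of the diff, and it is not the change that was made in the PR; `ErasureSketch` and `asStringMap` are hypothetical names, and no Spark redaction method is reproduced. Because generic type arguments are erased at runtime, the check has to look at each key and value.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration only; not part of Spark.
public class ErasureSketch {
    // Returns a typed copy when every key and value is a String, else empty.
    static Optional<Map<String, String>> asStringMap(Object arg) {
        if (!(arg instanceof Map<?, ?>)) {
            return Optional.empty();
        }
        Map<?, ?> raw = (Map<?, ?>) arg;
        Map<String, String> typed = new LinkedHashMap<>();
        for (Map.Entry<?, ?> entry : raw.entrySet()) {
            // Erasure hides the type arguments, so inspect the actual entries.
            if (!(entry.getKey() instanceof String)
                    || !(entry.getValue() instanceof String)) {
                return Optional.empty();
            }
            typed.put((String) entry.getKey(), (String) entry.getValue());
        }
        return Optional.of(typed);
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("password", "secret");
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("partitions", 4);

        System.out.println(asStringMap(options).isPresent());
        System.out.println(asStringMap(counts).isPresent());
    }
}
```

A caller would redact only when the result is present and pass the original argument through otherwise, so no exception handling is needed on the normal path.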
[GitHub] [spark] SparkQA commented on issue #24335: [SPARK-27425][SQL] Add count_if function
SparkQA commented on issue #24335: [SPARK-27425][SQL] Add count_if function URL: https://github.com/apache/spark/pull/24335#issuecomment-499763307 **[Test build #106269 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106269/testReport)** for PR 24335 at commit [`81ab7e6`](https://github.com/apache/spark/commit/81ab7e662d13b8c18d8a02e05799d0a554d07bd2).
[GitHub] [spark] cryeo commented on issue #24335: [SPARK-27425][SQL] Add count_if function
cryeo commented on issue #24335: [SPARK-27425][SQL] Add count_if function URL: https://github.com/apache/spark/pull/24335#issuecomment-499763393 @dongjoon-hyun Thanks for your review. I just modified code and PR description. Could you confirm it?
[GitHub] [spark] AmplabJenkins removed a comment on issue #24688: [SPARK-27970][SQL] Support Hive 3.0 metastore
AmplabJenkins removed a comment on issue #24688: [SPARK-27970][SQL] Support Hive 3.0 metastore URL: https://github.com/apache/spark/pull/24688#issuecomment-499762988 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11514/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24688: [SPARK-27970][SQL] Support Hive 3.0 metastore
AmplabJenkins removed a comment on issue #24688: [SPARK-27970][SQL] Support Hive 3.0 metastore URL: https://github.com/apache/spark/pull/24688#issuecomment-499762981 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24815: [SPARK-27961][SQL] DataSourceV2Relation should not have refresh method
AmplabJenkins removed a comment on issue #24815: [SPARK-27961][SQL] DataSourceV2Relation should not have refresh method URL: https://github.com/apache/spark/pull/24815#issuecomment-499762964 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24815: [SPARK-27961][SQL] DataSourceV2Relation should not have refresh method
AmplabJenkins removed a comment on issue #24815: [SPARK-27961][SQL] DataSourceV2Relation should not have refresh method URL: https://github.com/apache/spark/pull/24815#issuecomment-499762968 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11513/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24688: [SPARK-27970][SQL] Support Hive 3.0 metastore
AmplabJenkins commented on issue #24688: [SPARK-27970][SQL] Support Hive 3.0 metastore URL: https://github.com/apache/spark/pull/24688#issuecomment-499762988 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11514/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24688: [SPARK-27970][SQL] Support Hive 3.0 metastore
AmplabJenkins commented on issue #24688: [SPARK-27970][SQL] Support Hive 3.0 metastore URL: https://github.com/apache/spark/pull/24688#issuecomment-499762981 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24815: [SPARK-27961][SQL] DataSourceV2Relation should not have refresh method
AmplabJenkins commented on issue #24815: [SPARK-27961][SQL] DataSourceV2Relation should not have refresh method URL: https://github.com/apache/spark/pull/24815#issuecomment-499762968 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11513/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24815: [SPARK-27961][SQL] DataSourceV2Relation should not have refresh method
AmplabJenkins commented on issue #24815: [SPARK-27961][SQL] DataSourceV2Relation should not have refresh method URL: https://github.com/apache/spark/pull/24815#issuecomment-499762964 Merged build finished. Test PASSed.
[GitHub] [spark] HyukjinKwon commented on issue #24788: s390x specific changes
HyukjinKwon commented on issue #24788: s390x specific changes URL: https://github.com/apache/spark/pull/24788#issuecomment-499762117 If that's the exact same problem, please link that JIRA to the PR title (see https://spark.apache.org/contributing.html). Also, please clarify why changing from little endian to big endian is safe in little endian OSes.
[GitHub] [spark] SparkQA commented on issue #24815: [SPARK-27961][SQL] DataSourceV2Relation should not have refresh method
SparkQA commented on issue #24815: [SPARK-27961][SQL] DataSourceV2Relation should not have refresh method URL: https://github.com/apache/spark/pull/24815#issuecomment-499762010 **[Test build #106267 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106267/testReport)** for PR 24815 at commit [`98c6105`](https://github.com/apache/spark/commit/98c61053ab519cc0002b9372bbc93752cc507cef).
[GitHub] [spark] SparkQA commented on issue #24688: [SPARK-27970][SQL] Support Hive 3.0 metastore
SparkQA commented on issue #24688: [SPARK-27970][SQL] Support Hive 3.0 metastore URL: https://github.com/apache/spark/pull/24688#issuecomment-499762009 **[Test build #106268 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106268/testReport)** for PR 24688 at commit [`dba68a1`](https://github.com/apache/spark/commit/dba68a1ca0ca1f3538276cb06ff0972e2122fa98).
[GitHub] [spark] wangyum commented on a change in pull request #24688: [SPARK-27970][SQL] Support Hive 3.0 metastore
wangyum commented on a change in pull request #24688: [SPARK-27970][SQL] Support Hive 3.0 metastore URL: https://github.com/apache/spark/pull/24688#discussion_r291456525

## File path: docs/sql-data-sources-hive-tables.md ##
@@ -130,7 +130,7 @@ The following options can be used to configure the version of Hive that is used
 1.2.1
 Version of the Hive metastore. Available
- options are 0.12.0 through 2.3.5 and 3.1.0 through 3.1.1.
+ options are 0.12.0 through 3.1.1.

Review comment: Done
[GitHub] [spark] AmplabJenkins removed a comment on issue #24819: [SPARK-27973][MINOR] [EXAMPLES]correct DirectKafkaWordCount usage text with groupId
AmplabJenkins removed a comment on issue #24819: [SPARK-27973][MINOR] [EXAMPLES]correct DirectKafkaWordCount usage text with groupId URL: https://github.com/apache/spark/pull/24819#issuecomment-499757833 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #24819: [SPARK-27973][MINOR] [EXAMPLES]correct DirectKafkaWordCount usage text with groupId
AmplabJenkins commented on issue #24819: [SPARK-27973][MINOR] [EXAMPLES]correct DirectKafkaWordCount usage text with groupId URL: https://github.com/apache/spark/pull/24819#issuecomment-499758184 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on issue #24819: [SPARK-27973][MINOR] [EXAMPLES]correct DirectKafkaWordCount usage text with groupId
AmplabJenkins removed a comment on issue #24819: [SPARK-27973][MINOR] [EXAMPLES]correct DirectKafkaWordCount usage text with groupId URL: https://github.com/apache/spark/pull/24819#issuecomment-499757749 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #24819: [SPARK-27973][MINOR] [EXAMPLES]correct DirectKafkaWordCount usage text with groupId
AmplabJenkins commented on issue #24819: [SPARK-27973][MINOR] [EXAMPLES]correct DirectKafkaWordCount usage text with groupId URL: https://github.com/apache/spark/pull/24819#issuecomment-499757833 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #24819: [SPARK-27973][MINOR] [EXAMPLES]correct DirectKafkaWordCount usage text with groupId
AmplabJenkins commented on issue #24819: [SPARK-27973][MINOR] [EXAMPLES]correct DirectKafkaWordCount usage text with groupId URL: https://github.com/apache/spark/pull/24819#issuecomment-499757749 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on issue #24818: [SPARK-27971[SQL][R] MapPartitionsInRWithArrowExec.evaluate shouldn't eagerly read the first batch
AmplabJenkins removed a comment on issue #24818: [SPARK-27971[SQL][R] MapPartitionsInRWithArrowExec.evaluate shouldn't eagerly read the first batch URL: https://github.com/apache/spark/pull/24818#issuecomment-499757129 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24818: [SPARK-27971[SQL][R] MapPartitionsInRWithArrowExec.evaluate shouldn't eagerly read the first batch
AmplabJenkins removed a comment on issue #24818: [SPARK-27971[SQL][R] MapPartitionsInRWithArrowExec.evaluate shouldn't eagerly read the first batch URL: https://github.com/apache/spark/pull/24818#issuecomment-499757133 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106264/ Test PASSed.
[GitHub] [spark] cnZach opened a new pull request #24819: [SPARK-27973][MINOR] [EXAMPLES]correct DirectKafkaWordCount usage text with groupId
cnZach opened a new pull request #24819: [SPARK-27973][MINOR] [EXAMPLES]correct DirectKafkaWordCount usage text with groupId URL: https://github.com/apache/spark/pull/24819

## What changes were proposed in this pull request?

Usage: DirectKafkaWordCount <brokers> <groupId> <topics>
<brokers> is a list of one or more Kafka brokers
<groupId> is a consumer group name to consume from topics
<topics> is a list of one or more kafka topics to consume from

## How was this patch tested?

N/A.

Please review https://spark.apache.org/contributing.html before opening a pull request.
[GitHub] [spark] AmplabJenkins commented on issue #24818: [SPARK-27971[SQL][R] MapPartitionsInRWithArrowExec.evaluate shouldn't eagerly read the first batch
AmplabJenkins commented on issue #24818: [SPARK-27971[SQL][R] MapPartitionsInRWithArrowExec.evaluate shouldn't eagerly read the first batch URL: https://github.com/apache/spark/pull/24818#issuecomment-499757129 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24818: [SPARK-27971[SQL][R] MapPartitionsInRWithArrowExec.evaluate shouldn't eagerly read the first batch
AmplabJenkins commented on issue #24818: [SPARK-27971[SQL][R] MapPartitionsInRWithArrowExec.evaluate shouldn't eagerly read the first batch URL: https://github.com/apache/spark/pull/24818#issuecomment-499757133 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106264/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24818: [SPARK-27971[SQL][R] MapPartitionsInRWithArrowExec.evaluate shouldn't eagerly read the first batch
SparkQA removed a comment on issue #24818: [SPARK-27971[SQL][R] MapPartitionsInRWithArrowExec.evaluate shouldn't eagerly read the first batch URL: https://github.com/apache/spark/pull/24818#issuecomment-499732479 **[Test build #106264 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106264/testReport)** for PR 24818 at commit [`d848354`](https://github.com/apache/spark/commit/d8483541ee161ac249c8a439343a66136d2f0079).
[GitHub] [spark] SparkQA commented on issue #24818: [SPARK-27971[SQL][R] MapPartitionsInRWithArrowExec.evaluate shouldn't eagerly read the first batch
SparkQA commented on issue #24818: [SPARK-27971[SQL][R] MapPartitionsInRWithArrowExec.evaluate shouldn't eagerly read the first batch URL: https://github.com/apache/spark/pull/24818#issuecomment-499756844 **[Test build #106264 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106264/testReport)** for PR 24818 at commit [`d848354`](https://github.com/apache/spark/commit/d8483541ee161ac249c8a439343a66136d2f0079). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] Udbhav30 commented on issue #24601: [SPARK-27702][K8S] Allow using some alternatives for service accounts
Udbhav30 commented on issue #24601: [SPARK-27702][K8S] Allow using some alternatives for service accounts URL: https://github.com/apache/spark/pull/24601#issuecomment-499756451 Gentle ping, @dongjoon-hyun
[GitHub] [spark] ketank-new commented on issue #24788: s390x specific changes
ketank-new commented on issue #24788: s390x specific changes URL: https://github.com/apache/spark/pull/24788#issuecomment-499754058 @HyukjinKwon: I do not mind raising a new JIRA for the above changes, but note that we have already been discussing this on JIRA. Please refer to SPARK-26985.
[GitHub] [spark] cryeo commented on a change in pull request #24335: [SPARK-27425][SQL] Add count_if function
cryeo commented on a change in pull request #24335: [SPARK-27425][SQL] Add count_if function URL: https://github.com/apache/spark/pull/24335#discussion_r291444380 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CountIf.scala ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.types._ + +@ExpressionDescription( + usage = """ +_FUNC_(expr) - Returns the number of rows that the supplied expression is non-null and true. 
+ """, + examples = """ +Examples: + > SELECT _FUNC_(col % 2 = 0) FROM VALUES (NULL), (0), (1), (2), (3) AS tab(col); + 2 + > SELECT _FUNC_(col IS NULL) FROM VALUES (NULL), (0), (1), (2), (3) AS tab(col); + 1 + """, + since = "3.0.0") +case class CountIf(predicate: Expression) extends UnevaluableAggregate with ImplicitCastInputTypes { + override def prettyName: String = "count_if" + + override def children: Seq[Expression] = predicate :: Nil + + override def nullable: Boolean = false + + override def dataType: DataType = LongType + + override def inputTypes: Seq[AbstractDataType] = BooleanType :: Nil Review comment: Would it be better to change `children` as well? ```scala override def children: Seq[Expression] = Seq(predicate) ```
[GitHub] [spark] HyukjinKwon commented on a change in pull request #24797: Detecting key in map type when value type is complex
HyukjinKwon commented on a change in pull request #24797: Detecting key in map type when value type is complex URL: https://github.com/apache/spark/pull/24797#discussion_r291443140 ## File path: R/pkg/R/schema.R ## @@ -162,7 +162,7 @@ checkType <- function(type) { }, m = { # Map type - m <- regexec("^map<(.+),(.+)>$", type) + m <- regexec("map<(string|character),(.+)>", type) Review comment: This is just a legacy sanity check, since R type parsing is now delegated to the SQL parser. We should actually remove this method entirely (see https://github.com/apache/spark/commit/70f1bcd7bcd42b30eabcf06a9639363f1ca4b449). Can you file a JIRA, review https://spark.apache.org/contributing.html closely, and update this PR to remove this with a set of tests?
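For readers following the regex change in the diff above, here is a minimal sketch of how the two patterns split a map type whose value type is itself complex. It uses Scala's `Regex` (Java regex semantics) as a stand-in for R's `regexec`, which is an assumption; the object and helper names are hypothetical and not part of SparkR.

```scala
import scala.util.matching.Regex

// Hypothetical helper, not part of SparkR: applies the two patterns from the diff.
object MapTypeRegexSketch {
  // Old pattern from the diff: anchored, with a greedy key group.
  private val oldPattern: Regex = "^map<(.+),(.+)>$".r
  // New pattern from the diff: the key is restricted to string/character.
  private val newPattern: Regex = "map<(string|character),(.+)>".r

  // Returns (keyType, valueType) when the pattern is found in `tpe`.
  private def split(pattern: Regex, tpe: String): Option[(String, String)] =
    pattern.findFirstMatchIn(tpe).map(m => (m.group(1), m.group(2)))

  def oldSplit(tpe: String): Option[(String, String)] = split(oldPattern, tpe)
  def newSplit(tpe: String): Option[(String, String)] = split(newPattern, tpe)
}
```

Under these assumptions, `oldSplit("map<string,map<string,int>>")` cuts at the last comma, so the key group swallows part of the value type, while `newSplit` keeps `string` as the key. Note that the new pattern is no longer anchored with `^` and `$`.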
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24795: [SPARK-27945][SQL] Minimal changes to support columnar processing
dongjoon-hyun commented on a change in pull request #24795: [SPARK-27945][SQL] Minimal changes to support columnar processing URL: https://github.com/apache/spark/pull/24795#discussion_r291442978 ## File path: NOTICE-binary ## @@ -73,6 +73,10 @@ Copyright 2005-2015 The Apache Software Foundation This product includes software developed at OW2 Consortium (http://asm.ow2.org/) +This product includes software developed at +NVIDIA (https://www.nvidia.com) +* Copyright 2019 NVIDIA CORPORATION + Review comment: Thank you for removing this, @revans2.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #24800: [SPARK-27947][SQL] ParsedStatement subclass toString may throw ClassCastException
HyukjinKwon commented on a change in pull request #24800: [SPARK-27947][SQL] ParsedStatement subclass toString may throw ClassCastException URL: https://github.com/apache/spark/pull/24800#discussion_r291442243 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/sql/ParsedStatement.scala ## @@ -36,8 +38,11 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan private[sql] abstract class ParsedStatement extends LogicalPlan { // Redact properties and options when parsed nodes are used by generic methods like toString override def productIterator: Iterator[Any] = super.productIterator.map { Review comment: Why don't we add something like: ```scala protected def options: Map[String, String] = { Map.empty } protected def properties: Map[String, String] = { Map.empty } ``` and ```diff -options: Map[String, String], +override val options: Map[String, String], ``` in the implementations of these classes? It seems that currently we check every map, whatever it is.
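The suggestion above can be sketched outside Spark as follows. This is only an illustration under stated assumptions: `Statement`, `CreateTableSketch`, `redact`, and the "password" rule are hypothetical stand-ins, not Spark's `ParsedStatement` or its redaction utility. The idea is that only the maps explicitly exposed as `options` or `properties` are redacted, and any other map field passes through untouched.

```scala
// Hypothetical stand-ins; none of these names come from Spark.
object ParsedStatementSketch {
  // Toy redaction rule (assumption): hide values whose key mentions "password".
  def redact(kvs: Map[String, String]): Map[String, String] =
    kvs.map { case (k, v) =>
      if (k.toLowerCase.contains("password")) k -> "*********(redacted)" else k -> v
    }

  abstract class Statement extends Product {
    // Defaults as in the review comment; subclasses override what they carry.
    protected def options: Map[String, String] = Map.empty
    protected def properties: Map[String, String] = Map.empty

    // Redact only the fields that are the declared options/properties maps.
    def redactedFields: List[Any] = productIterator.map {
      case m: Map[_, _] if m eq options => redact(options)
      case m: Map[_, _] if m eq properties => redact(properties)
      case other => other
    }.toList
  }

  // A statement with one String-to-String map and one map of another type.
  final case class CreateTableSketch(
      name: String,
      override val options: Map[String, String],
      bucketNames: Map[Int, String]) extends Statement
}
```

With this shape, a field such as `bucketNames: Map[Int, String]` is never treated as a `Map[String, String]`, which is the kind of mismatch the PR title refers to.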
[GitHub] [spark] viirya commented on a change in pull request #24806: [SPARK-27856][SQL] Only allow type upcasting when inserting table
viirya commented on a change in pull request #24806: [SPARK-27856][SQL] Only allow type upcasting when inserting table URL: https://github.com/apache/spark/pull/24806#discussion_r291439699 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ## @@ -126,9 +126,12 @@ object Cast { */ def canUpCast(from: DataType, to: DataType): Boolean = (from, to) match { case _ if from == to => true +case (NullType, _) => false Review comment: Was this previously covered by the default case, or was it missing before?
[GitHub] [spark] viirya commented on a change in pull request #24806: [SPARK-27856][SQL] Only allow type upcasting when inserting table
viirya commented on a change in pull request #24806: [SPARK-27856][SQL] Only allow type upcasting when inserting table URL: https://github.com/apache/spark/pull/24806#discussion_r291441249 ## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionStateBuilder.scala ## @@ -81,6 +81,7 @@ class HiveSessionStateBuilder(session: SparkSession, parentState: Option[Session RelationConversions(conf, catalog) +: PreprocessTableCreation(session) +: PreprocessTableInsertion(conf) +: +ResolveUpCast +: Review comment: Would it be good to add a comment here, as in `BaseSessionStateBuilder`?
[GitHub] [spark] viirya commented on a change in pull request #24806: [SPARK-27856][SQL] Only allow type upcasting when inserting table
viirya commented on a change in pull request #24806: [SPARK-27856][SQL] Only allow type upcasting when inserting table URL: https://github.com/apache/spark/pull/24806#discussion_r291440948 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala ## @@ -356,8 +358,28 @@ case class PreprocessTableInsertion(conf: SQLConf) extends Rule[LogicalPlan] { s"including ${staticPartCols.size} partition column(s) having constant value(s).") } -val newQuery = DDLPreprocessingUtils.castAndRenameQueryOutput( Review comment: I see there is another usage of `castAndRenameQueryOutput`, for the `CreateTable` case. Should it get rid of unsafe casts too?
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24805: [SPARK-27798][SQL] from_avro shouldn't produces same value when converted to local relation
dongjoon-hyun commented on a change in pull request #24805: [SPARK-27798][SQL] from_avro shouldn't produces same value when converted to local relation URL: https://github.com/apache/spark/pull/24805#discussion_r291440164 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/ConvertToLocalRelationSuite.scala ## @@ -70,4 +72,36 @@ class ConvertToLocalRelationSuite extends PlanTest { comparePlans(optimized, correctAnswer) } + + test("SPARK-27798: Expression reusing output shouldn't override values in local relation") { Review comment: Thank you for adding this, @viirya.
[GitHub] [spark] dongjoon-hyun edited a comment on issue #24335: [SPARK-27425][SQL] Add count_if function
dongjoon-hyun edited a comment on issue #24335: [SPARK-27425][SQL] Add count_if function URL: https://github.com/apache/spark/pull/24335#issuecomment-499741465 I also support this feature, as does @HyukjinKwon. cc @gatorsmile
[GitHub] [spark] HyukjinKwon commented on issue #24788: s390x specific changes
HyukjinKwon commented on issue #24788: s390x specific changes URL: https://github.com/apache/spark/pull/24788#issuecomment-499741691 @ketank-new, please file a JIRA with the error message and a problem analysis, and describe in the PR description how this PR fixes it. Otherwise, no one knows what problem you faced.
[GitHub] [spark] dongjoon-hyun commented on issue #24335: [SPARK-27425][SQL] Add count_if function
dongjoon-hyun commented on issue #24335: [SPARK-27425][SQL] Add count_if function URL: https://github.com/apache/spark/pull/24335#issuecomment-499741465 cc @gatorsmile
[GitHub] [spark] dongjoon-hyun commented on issue #24335: [SPARK-27425][SQL] Add count_if function
dongjoon-hyun commented on issue #24335: [SPARK-27425][SQL] Add count_if function URL: https://github.com/apache/spark/pull/24335#issuecomment-499741306 @cryeo, please update the PR description with more SQL references. You already gave us the `Presto/BigQuery/Excel` references. Adding them will make this PR stronger.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #24805: [SPARK-27798][SQL] from_avro shouldn't produces same value when converted to local relation
HyukjinKwon commented on a change in pull request #24805: [SPARK-27798][SQL] from_avro shouldn't produces same value when converted to local relation URL: https://github.com/apache/spark/pull/24805#discussion_r291439572 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ## @@ -1420,9 +1420,9 @@ object ConvertToLocalRelation extends Rule[LogicalPlan] { def apply(plan: LogicalPlan): LogicalPlan = plan transform { case Project(projectList, LocalRelation(output, data, isStreaming)) if !projectList.exists(hasUnevaluableExpr) => - val projection = new InterpretedProjection(projectList, output) + val projection = new InterpretedMutableProjection(projectList, output) projection.initialize(0) - LocalRelation(projectList.map(_.toAttribute), data.map(projection), isStreaming) + LocalRelation(projectList.map(_.toAttribute), data.map(projection(_).copy()), isStreaming) Review comment: I agree with this take (Option 2).
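The `.copy()` in the diff above matters when a projection hands back the same mutable output object on every call. A minimal sketch of that effect, using a plain array as the reused buffer; `ReusingProjection` is a hypothetical stand-in and not Spark's `InterpretedMutableProjection`:

```scala
// Hypothetical stand-in for a projection that reuses one output buffer.
object ReuseSketch {
  final class ReusingProjection extends (Int => Array[Int]) {
    private val buffer = new Array[Int](1)
    // Writes the result into the shared buffer and returns that same buffer.
    def apply(in: Int): Array[Int] = { buffer(0) = in * 10; buffer }
  }

  // Every collected "row" is the same buffer, so all rows show the last value.
  def withoutCopy(data: List[Int]): List[List[Int]] = {
    val projection = new ReusingProjection
    val rows = data.map(projection)
    rows.map(_.toList)
  }

  // Cloning right after each call keeps the value computed for that row.
  def withCopy(data: List[Int]): List[List[Int]] = {
    val projection = new ReusingProjection
    val rows = data.map(projection(_).clone())
    rows.map(_.toList)
  }
}
```

In this sketch `withoutCopy(List(1, 2, 3))` yields three identical rows, which mirrors the "same value" symptom in the PR title, while `withCopy` preserves each row.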
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24335: [SPARK-27425][SQL] Add count_if function
dongjoon-hyun commented on a change in pull request #24335: [SPARK-27425][SQL] Add count_if function URL: https://github.com/apache/spark/pull/24335#discussion_r291439473 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CountIf.scala ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.types._ + +@ExpressionDescription( + usage = """ +_FUNC_(expr) - Returns the number of rows that the supplied expression is non-null and true. Review comment: I know this follows the description of `Count`, but `non-null and true` looks a little weird: `True` is already non-null. Can we phrase it like Presto/BigQuery? We could also give the alternative for Spark 2.4 and older, like the following. ``` Returns the number of TRUE values for the expression. This function is equivalent to count(CASE WHEN x THEN 1 END). ```
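The equivalence proposed in the comment above can be checked with a small model in plain Scala, without Spark. This is a sketch under assumptions: SQL's three-valued boolean is modeled as `Option[Boolean]` with `None` standing for NULL, and all names here are hypothetical.

```scala
// Plain-Scala model of the proposed semantics; None stands for SQL NULL.
object CountIfSketch {
  type SqlBool = Option[Boolean]

  // count_if(pred): the number of rows where the predicate is TRUE.
  def countIf(rows: Seq[SqlBool]): Long = rows.count(_.contains(true)).toLong

  // count(CASE WHEN pred THEN 1 END): CASE gives 1 for TRUE and NULL otherwise,
  // and count skips NULLs.
  def countCaseWhen(rows: Seq[SqlBool]): Long = {
    val caseResult: Seq[Option[Int]] = rows.map {
      case Some(true) => Some(1)
      case _ => None
    }
    caseResult.count(_.isDefined).toLong
  }

  // col % 2 = 0: a NULL input gives a NULL result.
  def isEven(col: Option[Int]): SqlBool = col.map(_ % 2 == 0)

  // col IS NULL: always TRUE or FALSE, never NULL.
  def isNull(col: Option[Int]): SqlBool = Some(col.isEmpty)
}
```

Applied to the column `(NULL), (0), (1), (2), (3)` from the examples in the diff, both functions give 2 for the `col % 2 = 0` predicate and 1 for `col IS NULL`, matching the documented outputs.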
[GitHub] [spark] SparkQA commented on issue #24805: [SPARK-27798][SQL] from_avro shouldn't produces same value when converted to local relation
SparkQA commented on issue #24805: [SPARK-27798][SQL] from_avro shouldn't produces same value when converted to local relation URL: https://github.com/apache/spark/pull/24805#issuecomment-499740453 **[Test build #106266 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106266/testReport)** for PR 24805 at commit [`69996a6`](https://github.com/apache/spark/commit/69996a61a8f1c8e0cba6a50f5f93f00e40d23c3b).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24805: [SPARK-27798][SQL] from_avro shouldn't produces same value when converted to local relation
AmplabJenkins removed a comment on issue #24805: [SPARK-27798][SQL] from_avro shouldn't produces same value when converted to local relation URL: https://github.com/apache/spark/pull/24805#issuecomment-499740154 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11512/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24805: [SPARK-27798][SQL] from_avro shouldn't produces same value when converted to local relation
AmplabJenkins removed a comment on issue #24805: [SPARK-27798][SQL] from_avro shouldn't produces same value when converted to local relation URL: https://github.com/apache/spark/pull/24805#issuecomment-499740151 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24805: [SPARK-27798][SQL] from_avro shouldn't produces same value when converted to local relation
AmplabJenkins commented on issue #24805: [SPARK-27798][SQL] from_avro shouldn't produces same value when converted to local relation URL: https://github.com/apache/spark/pull/24805#issuecomment-499740154 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11512/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24805: [SPARK-27798][SQL] from_avro shouldn't produces same value when converted to local relation
AmplabJenkins commented on issue #24805: [SPARK-27798][SQL] from_avro shouldn't produces same value when converted to local relation URL: https://github.com/apache/spark/pull/24805#issuecomment-499740151 Merged build finished. Test PASSed.
[GitHub] [spark] viirya commented on a change in pull request #24805: [SPARK-27798][SQL] from_avro shouldn't produces same value when converted to local relation
viirya commented on a change in pull request #24805: [SPARK-27798][SQL] from_avro shouldn't produces same value when converted to local relation URL: https://github.com/apache/spark/pull/24805#discussion_r291438298 ## File path: external/avro/src/test/scala/org/apache/spark/sql/avro/AvroSuite.scala ## @@ -1491,4 +1494,38 @@ class AvroSuite extends QueryTest with SharedSQLContext with SQLTestUtils { |} """.stripMargin) } + + test("SPARK-27798: from_avro produces same value when converted to local relation") { Review comment: Moved. Thanks.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #24807: [SPARK-27958] Stopping a SparkSession should not always stop Spark Context
HyukjinKwon commented on a change in pull request #24807: [SPARK-27958] Stopping a SparkSession should not always stop Spark Context URL: https://github.com/apache/spark/pull/24807#discussion_r291438274 ## File path: sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala ## @@ -711,12 +711,15 @@ class SparkSession private( // scalastyle:on /** - * Stop the underlying `SparkContext`. + * Stop the underlying `SparkContext` if there are are no active sessions remaining. * * @since 2.0.0 */ def stop(): Unit = { Review comment: Hey, I think it was a design decision that stopping a session stops the Spark context too. Why don't you just not call `stop()` on the session, since all it does is stop the session? The behaviour seems to be properly documented as well.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception
AmplabJenkins removed a comment on issue #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception URL: https://github.com/apache/spark/pull/24811#issuecomment-499739443 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106265/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception
AmplabJenkins removed a comment on issue #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception URL: https://github.com/apache/spark/pull/24811#issuecomment-499739438 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception
SparkQA removed a comment on issue #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception URL: https://github.com/apache/spark/pull/24811#issuecomment-499739362 **[Test build #106265 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106265/testReport)** for PR 24811 at commit [`e96ced9`](https://github.com/apache/spark/commit/e96ced93159734cd83420853b7ab89706ecf8f99).
[GitHub] [spark] AmplabJenkins commented on issue #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception
AmplabJenkins commented on issue #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception URL: https://github.com/apache/spark/pull/24811#issuecomment-499739443 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106265/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception
AmplabJenkins commented on issue #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception URL: https://github.com/apache/spark/pull/24811#issuecomment-499739438 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception
SparkQA commented on issue #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception URL: https://github.com/apache/spark/pull/24811#issuecomment-499739434 **[Test build #106265 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106265/testReport)** for PR 24811 at commit [`e96ced9`](https://github.com/apache/spark/commit/e96ced93159734cd83420853b7ab89706ecf8f99). * This patch **fails RAT tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on issue #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception
SparkQA commented on issue #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception URL: https://github.com/apache/spark/pull/24811#issuecomment-499739362 **[Test build #106265 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106265/testReport)** for PR 24811 at commit [`e96ced9`](https://github.com/apache/spark/commit/e96ced93159734cd83420853b7ab89706ecf8f99).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception
AmplabJenkins removed a comment on issue #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception URL: https://github.com/apache/spark/pull/24811#issuecomment-499739047 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception
AmplabJenkins commented on issue #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception URL: https://github.com/apache/spark/pull/24811#issuecomment-499739056 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11511/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception
AmplabJenkins commented on issue #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception URL: https://github.com/apache/spark/pull/24811#issuecomment-499739047 Merged build finished. Test PASSed.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception
HyukjinKwon commented on a change in pull request #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception URL: https://github.com/apache/spark/pull/24811#discussion_r291437663 ## File path: core/src/main/scala/org/apache/spark/deploy/RRunner.scala ## @@ -100,15 +100,17 @@ object RRunner { builder.redirectErrorStream(true) // Ugly but needed for stdout and stderr to synchronize val process = builder.start() -new RedirectThread(process.getInputStream, System.out, "redirect R output").start() +val stdoutBuffer = new CircularBuffer(1024) +val output = new TeeOutputStream(System.out, stdoutBuffer) +new RedirectThread(process.getInputStream, output, "redirect R output").start() -process.waitFor() +val returnCode = process.waitFor() +if (returnCode != 0) { + throw SparkUserAppException(returnCode, Option(stdoutBuffer.toString)) Review comment: or do you mean it's an issue because the error message is not included in the exception message?
[GitHub] [spark] AmplabJenkins removed a comment on issue #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception
AmplabJenkins removed a comment on issue #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception URL: https://github.com/apache/spark/pull/24811#issuecomment-499739056 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11511/ Test PASSed.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception
HyukjinKwon commented on a change in pull request #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception URL: https://github.com/apache/spark/pull/24811#discussion_r291437528 ## File path: core/src/main/scala/org/apache/spark/deploy/RRunner.scala ## @@ -100,15 +100,17 @@ object RRunner { builder.redirectErrorStream(true) // Ugly but needed for stdout and stderr to synchronize val process = builder.start() -new RedirectThread(process.getInputStream, System.out, "redirect R output").start() +val stdoutBuffer = new CircularBuffer(1024) +val output = new TeeOutputStream(System.out, stdoutBuffer) +new RedirectThread(process.getInputStream, output, "redirect R output").start() -process.waitFor() +val returnCode = process.waitFor() +if (returnCode != 0) { + throw SparkUserAppException(returnCode, Option(stdoutBuffer.toString)) +} } finally { sparkRBackend.close() } - if (returnCode != 0) { Review comment: @jeremyjliu, can you show before/after error messages? Seems like we redirect stderr. Doesn't that work?
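The diff above tees the subprocess output to `System.out` and to a bounded buffer so that the tail of the output can be attached to the exception. A minimal, self-contained sketch of that idea follows. `TailBuffer`, `Tee`, and `TeeSketch` are hypothetical stand-ins written for illustration; they are not the `CircularBuffer` or `TeeOutputStream` classes the patch uses, and their behavior may differ in detail.

```scala
import java.io.{ByteArrayOutputStream, OutputStream}
import java.nio.charset.StandardCharsets

// Hypothetical bounded buffer: keeps only the last `capacity` bytes written.
final class TailBuffer(capacity: Int) extends OutputStream {
  private val buf = new Array[Byte](capacity)
  private var pos = 0         // next write position
  private var filled = false  // true once the buffer has wrapped at least once

  override def write(b: Int): Unit = {
    buf(pos) = b.toByte
    pos = (pos + 1) % capacity
    if (pos == 0) filled = true
  }

  // Returns the retained bytes in the order they were written.
  override def toString: String = {
    val bytes = if (filled) buf.drop(pos) ++ buf.take(pos) else buf.take(pos)
    new String(bytes, StandardCharsets.UTF_8)
  }
}

// Hypothetical tee: forwards every byte to both sinks.
final class Tee(a: OutputStream, b: OutputStream) extends OutputStream {
  override def write(x: Int): Unit = { a.write(x); b.write(x) }
  override def flush(): Unit = { a.flush(); b.flush() }
}

object TeeSketch {
  // Writes `text` through the tee and returns
  // (everything the primary sink saw, the tail kept in the bounded buffer).
  def run(text: String, capacity: Int): (String, String) = {
    val primary = new ByteArrayOutputStream()
    val tail = new TailBuffer(capacity)
    val out = new Tee(primary, tail)
    out.write(text.getBytes(StandardCharsets.UTF_8))
    out.flush()
    (primary.toString("UTF-8"), tail.toString)
  }
}
```

With a small capacity the primary sink still receives the full output, while the buffer retains only the most recent bytes, which is usually where an error message from a failing script ends up.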
[GitHub] [spark] AmplabJenkins removed a comment on issue #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception
AmplabJenkins removed a comment on issue #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception URL: https://github.com/apache/spark/pull/24811#issuecomment-499240743 Can one of the admins verify this patch?
[GitHub] [spark] HyukjinKwon commented on issue #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception
HyukjinKwon commented on issue #24811: [SPARK-27962][R][CORE] Propagate subprocess stdout in deploy.RRunner in exception URL: https://github.com/apache/spark/pull/24811#issuecomment-499738236 ok to test
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24335: [SPARK-27425][SQL] Add count_if functions
dongjoon-hyun commented on a change in pull request #24335: [SPARK-27425][SQL] Add count_if functions URL: https://github.com/apache/spark/pull/24335#discussion_r291436969 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CountIf.scala ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.types._ + +@ExpressionDescription( + usage = """ +_FUNC_(expr) - Returns the number of rows that the supplied expression is non-null and true. 
+ """, + examples = """ +Examples: + > SELECT _FUNC_(col % 2 = 0) FROM VALUES (NULL), (0), (1), (2), (3) AS tab(col); + 2 + > SELECT _FUNC_(col IS NULL) FROM VALUES (NULL), (0), (1), (2), (3) AS tab(col); + 1 + """, + since = "3.0.0") +case class CountIf(predicate: Expression) extends UnevaluableAggregate with ImplicitCastInputTypes { + override def prettyName: String = "count_if" + + override def children: Seq[Expression] = predicate :: Nil + + override def nullable: Boolean = false + + override def dataType: DataType = LongType + + override def inputTypes: Seq[AbstractDataType] = BooleanType :: Nil Review comment: nit. ```scala override def inputTypes: Seq[AbstractDataType] = Seq(BooleanType) ```
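The usage string in the diff above says `count_if` counts the rows for which the expression is non-null and true. The sketch below models that rule on plain Scala collections, using `None` for SQL NULL, and replays the two documented examples. `CountIfSketch` and `countIf` are hypothetical names for illustration only; this is not how the aggregate is implemented in Spark.

```scala
object CountIfSketch {
  // Counts the rows whose predicate evaluates to non-null and true.
  // `None` stands in for a SQL NULL result.
  def countIf[A](rows: Seq[A])(predicate: A => Option[Boolean]): Long =
    rows.count(r => predicate(r).contains(true)).toLong

  // Mirrors VALUES (NULL), (0), (1), (2), (3) AS tab(col).
  val col: Seq[Option[Int]] = Seq(None, Some(0), Some(1), Some(2), Some(3))

  // col % 2 = 0: a NULL input yields a NULL predicate, which is not counted.
  val evens: Long = countIf(col)(_.map(_ % 2 == 0))

  // col IS NULL: the predicate itself is never NULL.
  val nulls: Long = countIf(col)(v => Some(v.isEmpty))
}
```

Under this model `evens` is 2 and `nulls` is 1, matching the expected outputs in the `examples` block of the diff.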
[GitHub] [spark] HyukjinKwon commented on a change in pull request #24814: set Int MaxValue
HyukjinKwon commented on a change in pull request #24814: set Int MaxValue URL: https://github.com/apache/spark/pull/24814#discussion_r291436506 ## File path: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala ## @@ -233,7 +233,7 @@ class Word2Vec extends Serializable with Logging { a += 1 } while (a < 2 * vocabSize) { - count(a) = 1e9.toInt + count(a) = Int.MaxValue Review comment: The two values are different. Why do we need this change? Can you file a JIRA, since the before and after values aren't virtually the same? ```scala scala> Int.MaxValue res0: Int = 2147483647 scala> 1e9.toInt res1: Int = 1000000000 ```
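To make the size of the gap in the comment above concrete, here is a small sketch that compares the two constants. `IntLiteralSketch` is a hypothetical name, and the snippet says nothing about whether the larger value is appropriate for `Word2Vec`; it only shows that the change is not a no-op.

```scala
object IntLiteralSketch {
  // 1e9 is a Double literal; converting it to Int gives one billion.
  val fromDouble: Int = 1e9.toInt

  // The largest value an Int can hold, 2^31 - 1.
  val maxInt: Int = Int.MaxValue

  // How many times larger the new value is than the old one.
  val ratio: Double = maxInt.toDouble / fromDouble
}
```

The new value is a little more than twice the old one, so any code that compares or sums these counts could behave differently after the change.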
[GitHub] [spark] HyukjinKwon commented on issue #24814: set Int MaxValue
HyukjinKwon commented on issue #24814: set Int MaxValue URL: https://github.com/apache/spark/pull/24814#issuecomment-499737601 Please review https://spark.apache.org/contributing.html before opening a pull request.
[GitHub] [spark] HyukjinKwon closed pull request #24784: [SPARK-27938][SQL] Remove feature flag LEGACY_PASS_PARTITION_BY_AS_OPTIONS
HyukjinKwon closed pull request #24784: [SPARK-27938][SQL] Remove feature flag LEGACY_PASS_PARTITION_BY_AS_OPTIONS URL: https://github.com/apache/spark/pull/24784
[GitHub] [spark] HyukjinKwon commented on a change in pull request #24784: [SPARK-27938][SQL] Remove feature flag LEGACY_PASS_PARTITION_BY_AS_OPTIONS
HyukjinKwon commented on a change in pull request #24784: [SPARK-27938][SQL] Remove feature flag LEGACY_PASS_PARTITION_BY_AS_OPTIONS URL: https://github.com/apache/spark/pull/24784#discussion_r291435639 ## File path: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ## @@ -225,21 +225,13 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be } test("pass partitionBy as options") { -Seq(true, false).foreach { flag => - withSQLConf(SQLConf.LEGACY_PASS_PARTITION_BY_AS_OPTIONS.key -> s"$flag") { -Seq(1).toDF.write - .format("org.apache.spark.sql.test") - .partitionBy("col1", "col2") - .save() - -if (flag) { - val partColumns = LastOptions.parameters(DataSourceUtils.PARTITIONING_COLUMNS_KEY) - assert(DataSourceUtils.decodePartitioningColumns(partColumns) === Seq("col1", "col2")) -} else { - assert(!LastOptions.parameters.contains(DataSourceUtils.PARTITIONING_COLUMNS_KEY)) -} - } -} +Seq(1).toDF.write + .format("org.apache.spark.sql.test") + .partitionBy("col1", "col2") + .save() + +val partColumns = LastOptions.parameters(DataSourceUtils.PARTITIONING_COLUMNS_KEY) +assert(DataSourceUtils.decodePartitioningColumns(partColumns) === Seq("col1", "col2")) Review comment: `decodePartitioningColumns` is under the `execution` package, which is not supposed to be exposed, so users shouldn't use this util directly. Did we document this option in any public datasource v1 API? We should also say that it is a JSON string.
[GitHub] [spark] HyukjinKwon commented on issue #24784: [SPARK-27938][SQL] Remove feature flag LEGACY_PASS_PARTITION_BY_AS_OPTIONS
HyukjinKwon commented on issue #24784: [SPARK-27938][SQL] Remove feature flag LEGACY_PASS_PARTITION_BY_AS_OPTIONS URL: https://github.com/apache/spark/pull/24784#issuecomment-499736700 LGTM too. Strictly speaking, https://github.com/apache/spark/pull/24784#discussion_r291435639 can be done separately. Merged to master.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24335: [SPARK-27425][SQL] Add count_if functions
dongjoon-hyun commented on a change in pull request #24335: [SPARK-27425][SQL] Add count_if functions URL: https://github.com/apache/spark/pull/24335#discussion_r291433907 ## File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala ## @@ -894,4 +894,30 @@ class DataFrameAggregateSuite extends QueryTest with SharedSQLContext { error.message.contains("function min_by does not support ordering on type map")) } } + + test("SPARK-27425: count_if function") { +def checkError(df: => DataFrame): Unit = { + val thrownException = the [AnalysisException] thrownBy df.queryExecution.analyzed + assert(thrownException.message.contains("function count_if requires boolean type")) +} + +withTempView("tempView") { + Seq(("a", None), ("a", Some(1)), ("a", Some(2)), ("a", Some(3)), +("b", None), ("b", Some(4)), ("b", Some(5)), ("b", Some(6))) +.toDF("x", "y") +.createOrReplaceTempView("tempView") + + checkAnswer( +sql("SELECT COUNT_IF(NULL), COUNT_IF(y % 2 = 0), COUNT_IF(y % 2 <> 0), " + + "COUNT_IF(y IS NULL) FROM tempView"), +Row(0L, 3L, 3L, 2L)) + + checkAnswer( +sql("SELECT x, COUNT_IF(NULL), COUNT_IF(y % 2 = 0), COUNT_IF(y % 2 <> 0), " + + "COUNT_IF(y IS NULL) FROM tempView GROUP BY x"), +Row("a", 0L, 1L, 2L, 1L) :: Row("b", 0L, 2L, 1L, 1L) :: Nil) + + checkError(sql("SELECT COUNT_IF(x) FROM tempView")) Review comment: We usually test like the following. ```scala val m = intercept[AnalysisException] { sql("SELECT COUNT_IF(x) FROM tempView") }.getMessage assert(m.contains("function count_if requires boolean type")) ```
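The `intercept` pattern suggested above comes from ScalaTest and needs a Spark session to run as written. The sketch below reproduces only its shape with no external dependencies: `intercept` here is a hand-rolled, hypothetical stand-in for the ScalaTest method, and `requireBoolean` is a made-up check that mimics the type validation under review.

```scala
import scala.reflect.ClassTag

object InterceptSketch {
  // Hypothetical stand-in for ScalaTest's `intercept`: runs `body` and returns
  // the exception of type T it throws; fails if nothing or another type is thrown.
  def intercept[T <: Throwable](body: => Any)(implicit ct: ClassTag[T]): T = {
    val thrown: Option[Throwable] =
      try { body; None } catch { case t: Throwable => Some(t) }
    thrown match {
      case Some(t) if ct.runtimeClass.isInstance(t) => t.asInstanceOf[T]
      case Some(t) =>
        throw new AssertionError(s"expected ${ct.runtimeClass.getName}, got $t")
      case None =>
        throw new AssertionError(s"expected ${ct.runtimeClass.getName}, but nothing was thrown")
    }
  }

  // Made-up validation that rejects non-boolean type names.
  def requireBoolean(typeName: String): Unit =
    if (typeName != "boolean") {
      throw new IllegalArgumentException(
        s"function count_if requires boolean type, not $typeName")
    }

  // Same shape as the reviewer's suggestion: capture the message, then assert on it.
  val m: String = intercept[IllegalArgumentException] {
    requireBoolean("string")
  }.getMessage
}
```

Capturing the message inline keeps the expectation next to the statement that triggers it, which is the point of the review comment about avoiding a single-use helper.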
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24335: [SPARK-27425][SQL] Add count_if functions
dongjoon-hyun commented on a change in pull request #24335: [SPARK-27425][SQL] Add count_if functions URL: https://github.com/apache/spark/pull/24335#discussion_r291433432 ## File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala ## @@ -894,4 +894,30 @@ class DataFrameAggregateSuite extends QueryTest with SharedSQLContext { error.message.contains("function min_by does not support ordering on type map")) } } + + test("SPARK-27425: count_if function") { +def checkError(df: => DataFrame): Unit = { + val thrownException = the [AnalysisException] thrownBy df.queryExecution.analyzed + assert(thrownException.message.contains("function count_if requires boolean type")) +} Review comment: Let's not declare a function which is used once.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24335: [SPARK-27425][SQL] Add count_if functions
dongjoon-hyun commented on a change in pull request #24335: [SPARK-27425][SQL] Add count_if functions URL: https://github.com/apache/spark/pull/24335#discussion_r291433282 ## File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala ## @@ -894,4 +894,30 @@ class DataFrameAggregateSuite extends QueryTest with SharedSQLContext { error.message.contains("function min_by does not support ordering on type map")) } } + + test("SPARK-27425: count_if function") { Review comment: In general, we don't use the SPARK JIRA ID in test case names for new features. Could you remove `SPARK-27425: `?
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24335: [SPARK-27425][SQL] Add count_if functions
dongjoon-hyun commented on a change in pull request #24335: [SPARK-27425][SQL] Add count_if functions URL: https://github.com/apache/spark/pull/24335#discussion_r291432774 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CountIf.scala ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.types._ + +@ExpressionDescription( + usage = """ +_FUNC_(expr) - Returns the number of rows that the supplied expression is non-null and true. 
+ """, + examples = """ +Examples: + > SELECT _FUNC_(col % 2 = 0) FROM VALUES (NULL), (0), (1), (2), (3) AS tab(col); + 2 + > SELECT _FUNC_(col IS NULL) FROM VALUES (NULL), (0), (1), (2), (3) AS tab(col); + 1 + """, + since = "3.0.0") +case class CountIf(predicate: Expression) extends UnevaluableAggregate with ImplicitCastInputTypes { + override def prettyName: String = "count_if" + + override def children: Seq[Expression] = predicate :: Nil + + override def nullable: Boolean = false + + override def dataType: DataType = LongType + + override def inputTypes: Seq[AbstractDataType] = BooleanType :: Nil + + override def checkInputDataTypes(): TypeCheckResult = predicate.dataType match { +case BooleanType => + TypeCheckResult.TypeCheckSuccess +case _ => + TypeCheckResult.TypeCheckFailure( +s"function ${prettyName} requires boolean type, not ${predicate.dataType.catalogString}" Review comment: `${prettyName}` -> `$prettyName`?
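The suggestion above relies on a property of Scala's `s` interpolator: braces are optional around a plain identifier and required only for larger expressions such as member access. A short sketch, with `InterpolationSketch` and `DT` as hypothetical names standing in for the real types:

```scala
object InterpolationSketch {
  val prettyName = "count_if"

  // Braces around a plain identifier are redundant; both forms build the same string.
  val braced: String = s"function ${prettyName} requires boolean type"
  val bare: String = s"function $prettyName requires boolean type"

  // Hypothetical stand-in for a data type with a catalogString member.
  final case class DT(catalogString: String)
  val dt: DT = DT("string")

  // Member access still needs braces; without them only `dt` would be interpolated.
  val msg: String = s"function $prettyName requires boolean type, not ${dt.catalogString}"
}
```

Dropping the redundant braces changes nothing at runtime; it is purely a style preference.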
[GitHub] [spark] AmplabJenkins removed a comment on issue #24818: [SPARK-27971[SQL][R] MapPartitionsInRWithArrowExec.evaluate shouldn't eagerly read the first batch
AmplabJenkins removed a comment on issue #24818: [SPARK-27971[SQL][R] MapPartitionsInRWithArrowExec.evaluate shouldn't eagerly read the first batch URL: https://github.com/apache/spark/pull/24818#issuecomment-499732132 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24818: [SPARK-27971[SQL][R] MapPartitionsInRWithArrowExec.evaluate shouldn't eagerly read the first batch
AmplabJenkins removed a comment on issue #24818: [SPARK-27971[SQL][R] MapPartitionsInRWithArrowExec.evaluate shouldn't eagerly read the first batch URL: https://github.com/apache/spark/pull/24818#issuecomment-499732137 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11510/ Test PASSed.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24335: [SPARK-27425][SQL] Add count_if functions
dongjoon-hyun commented on a change in pull request #24335: [SPARK-27425][SQL] Add count_if functions URL: https://github.com/apache/spark/pull/24335#discussion_r291431861 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/CountIf.scala ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.types._ Review comment: Shall we import explicitly? ```scala import org.apache.spark.sql.catalyst.expressions.{Expression, ExpressionDescription, ImplicitCastInputTypes, UnevaluableAggregate} import org.apache.spark.sql.types.{AbstractDataType, BooleanType, DataType, LongType} ```
[GitHub] [spark] SparkQA commented on issue #24818: [SPARK-27971[SQL][R] MapPartitionsInRWithArrowExec.evaluate shouldn't eagerly read the first batch
SparkQA commented on issue #24818: [SPARK-27971[SQL][R] MapPartitionsInRWithArrowExec.evaluate shouldn't eagerly read the first batch URL: https://github.com/apache/spark/pull/24818#issuecomment-499732479 **[Test build #106264 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106264/testReport)** for PR 24818 at commit [`d848354`](https://github.com/apache/spark/commit/d8483541ee161ac249c8a439343a66136d2f0079).
[GitHub] [spark] AmplabJenkins commented on issue #24818: [SPARK-27971[SQL][R] MapPartitionsInRWithArrowExec.evaluate shouldn't eagerly read the first batch
AmplabJenkins commented on issue #24818: [SPARK-27971[SQL][R] MapPartitionsInRWithArrowExec.evaluate shouldn't eagerly read the first batch URL: https://github.com/apache/spark/pull/24818#issuecomment-499732132 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24818: [SPARK-27971][SQL][R] MapPartitionsInRWithArrowExec.evaluate shouldn't eagerly read the first batch
AmplabJenkins commented on issue #24818: [SPARK-27971][SQL][R] MapPartitionsInRWithArrowExec.evaluate shouldn't eagerly read the first batch URL: https://github.com/apache/spark/pull/24818#issuecomment-499732137 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11510/ Test PASSed.
[GitHub] [spark] HyukjinKwon opened a new pull request #24818: [SPARK-27971][SQL][R] MapPartitionsInRWithArrowExec.evaluate shouldn't eagerly read the first batch
HyukjinKwon opened a new pull request #24818: [SPARK-27971][SQL][R] MapPartitionsInRWithArrowExec.evaluate shouldn't eagerly read the first batch URL: https://github.com/apache/spark/pull/24818 ## What changes were proposed in this pull request? This PR is the same fix as https://github.com/apache/spark/pull/24816 but in vectorized `dapply` in SparkR. ## How was this patch tested? Manually tested.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24734: [SPARK-27870][SQL][PySpark] Flush batch timely for pandas UDF (for improving pandas UDFs pipeline)
AmplabJenkins removed a comment on issue #24734: [SPARK-27870][SQL][PySpark] Flush batch timely for pandas UDF (for improving pandas UDFs pipeline) URL: https://github.com/apache/spark/pull/24734#issuecomment-499730842 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106262/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24734: [SPARK-27870][SQL][PySpark] Flush batch timely for pandas UDF (for improving pandas UDFs pipeline)
AmplabJenkins removed a comment on issue #24734: [SPARK-27870][SQL][PySpark] Flush batch timely for pandas UDF (for improving pandas UDFs pipeline) URL: https://github.com/apache/spark/pull/24734#issuecomment-499730835 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24734: [SPARK-27870][SQL][PySpark] Flush batch timely for pandas UDF (for improving pandas UDFs pipeline)
AmplabJenkins commented on issue #24734: [SPARK-27870][SQL][PySpark] Flush batch timely for pandas UDF (for improving pandas UDFs pipeline) URL: https://github.com/apache/spark/pull/24734#issuecomment-499730842 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106262/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24734: [SPARK-27870][SQL][PySpark] Flush batch timely for pandas UDF (for improving pandas UDFs pipeline)
AmplabJenkins commented on issue #24734: [SPARK-27870][SQL][PySpark] Flush batch timely for pandas UDF (for improving pandas UDFs pipeline) URL: https://github.com/apache/spark/pull/24734#issuecomment-499730835 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24734: [SPARK-27870][SQL][PySpark] Flush batch timely for pandas UDF (for improving pandas UDFs pipeline)
SparkQA removed a comment on issue #24734: [SPARK-27870][SQL][PySpark] Flush batch timely for pandas UDF (for improving pandas UDFs pipeline) URL: https://github.com/apache/spark/pull/24734#issuecomment-499697138 **[Test build #106262 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106262/testReport)** for PR 24734 at commit [`ed7aee0`](https://github.com/apache/spark/commit/ed7aee06344fd75e6921fa38a0f24183285b1e12).
[GitHub] [spark] SparkQA commented on issue #24734: [SPARK-27870][SQL][PySpark] Flush batch timely for pandas UDF (for improving pandas UDFs pipeline)
SparkQA commented on issue #24734: [SPARK-27870][SQL][PySpark] Flush batch timely for pandas UDF (for improving pandas UDFs pipeline) URL: https://github.com/apache/spark/pull/24734#issuecomment-499730555 **[Test build #106262 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106262/testReport)** for PR 24734 at commit [`ed7aee0`](https://github.com/apache/spark/commit/ed7aee06344fd75e6921fa38a0f24183285b1e12). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24815: [SPARK-27961][SQL] DataSourceV2Relation should not have refresh method
dongjoon-hyun commented on a change in pull request #24815: [SPARK-27961][SQL] DataSourceV2Relation should not have refresh method URL: https://github.com/apache/spark/pull/24815#discussion_r291424651
## File path: sql/core/src/test/scala/org/apache/spark/sql/MetadataCacheSuite.scala ##
@@ -57,13 +56,20 @@ abstract class MetadataCacheSuite extends QueryTest with SharedSQLContext {
         df.count()
       }
       assert(e.getMessage.contains("FileNotFoundException"))
-      assert(e.getMessage.contains("REFRESH"))
+      assert(e.getMessage.contains("recreating the Dataset/DataFrame involved"))
     }
   }
+}
+
+class MetadataCacheV1Suite extends MetadataCacheSuite {
+  override protected def sparkConf: SparkConf =
+    super
+      .sparkConf
+      .set(SQLConf.USE_V1_SOURCE_READER_LIST, "orc")
 
   test("SPARK-16337,SPARK-27504 temporary view refresh") {
Review comment: The `SPARK-27504` had better be removed from this test case name like [this](https://github.com/apache/spark/pull/24815/files#diff-0667b59236ca014a47b3fc20b6ea820eR41)?
[GitHub] [spark] AmplabJenkins removed a comment on issue #24817: [WIP][SPARK-27963][core] Allow dynamic allocation without a shuffle service.
AmplabJenkins removed a comment on issue #24817: [WIP][SPARK-27963][core] Allow dynamic allocation without a shuffle service. URL: https://github.com/apache/spark/pull/24817#issuecomment-499728669 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106263/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24817: [WIP][SPARK-27963][core] Allow dynamic allocation without a shuffle service.
AmplabJenkins removed a comment on issue #24817: [WIP][SPARK-27963][core] Allow dynamic allocation without a shuffle service. URL: https://github.com/apache/spark/pull/24817#issuecomment-499728668 Merged build finished. Test FAILed.