[GitHub] spark issue #22204: [SPARK-25196][SQL] Analyze column statistics in cached q...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22204 cc @wzhfy as well --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21732 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95240/
[GitHub] spark pull request #22229: [MINOR] Fix Scala 2.12 build
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22229
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21732 Merged build finished. Test PASSed.
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21732 **[Test build #95240 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95240/testReport)** for PR 21732 at commit [`c1f798f`](https://github.com/apache/spark/commit/c1f798f7e9cba0d04223eed06f1b1f547ec29dc5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22229: [MINOR] Fix Scala 2.12 build
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22229 Merged to master.
[GitHub] spark issue #22229: [MINOR] Fix Scala 2.12 build
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22229 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95239/
[GitHub] spark issue #22229: [MINOR] Fix Scala 2.12 build
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22229 Merged build finished. Test PASSed.
[GitHub] spark issue #22229: [MINOR] Fix Scala 2.12 build
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22229 **[Test build #95239 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95239/testReport)** for PR 22229 at commit [`20a2aa2`](https://github.com/apache/spark/commit/20a2aa21813cbbb131110e00d2e5c1426d6dae78). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22214: [SPARK-23698][Python] Avoid 'undefined name' by defining...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22214 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95243/
[GitHub] spark issue #22214: [SPARK-23698][Python] Avoid 'undefined name' by defining...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22214 Merged build finished. Test PASSed.
[GitHub] spark issue #22214: [SPARK-23698][Python] Avoid 'undefined name' by defining...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22214 **[Test build #95243 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95243/testReport)** for PR 22214 at commit [`a077542`](https://github.com/apache/spark/commit/a077542e67213c0add6093c0bf8c7568b6cb7b32). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22214: [SPARK-23698][Python] Avoid 'undefined name' by d...
Github user cclauss closed the pull request at: https://github.com/apache/spark/pull/22214
[GitHub] spark issue #21087: [SPARK-23997][SQL] Configurable maximum number of bucket...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21087 **[Test build #95244 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95244/testReport)** for PR 21087 at commit [`ebd9265`](https://github.com/apache/spark/commit/ebd926530c1d8b2f515a4a233f5963eafc17e460).
[GitHub] spark issue #21087: [SPARK-23997][SQL] Configurable maximum number of bucket...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21087 retest this please
[GitHub] spark issue #21665: [SPARK-24688][examples]Modify the comments about Labeled...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21665 @srowen, +1 for taking over. Seems it's been inactive for almost a month.
[GitHub] spark pull request #22197: [SPARK-25207][SQL] Case-insensitve field resoluti...
Github user yucai commented on a diff in the pull request: https://github.com/apache/spark/pull/22197#discussion_r212788978 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -377,7 +378,7 @@ class ParquetFileFormat // Collects all converted Parquet filter predicates. Notice that not all predicates can be // converted (`ParquetFilters.createFilter` returns an `Option`). That's why a `flatMap` // is used here. - .flatMap(parquetFilters.createFilter(parquetSchema, _)) + .flatMap(parquetFilters.createFilter(parquetSchema, _, isCaseSensitive)) --- End diff -- Yes, that way is better.
[GitHub] spark pull request #22214: [SPARK-23698][Python] Avoid 'undefined name' by d...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22214#discussion_r212788823 --- Diff: python/setup.py --- @@ -28,6 +28,7 @@ file=sys.stderr) sys.exit(-1) +__version__ = "Unknown" # Prevent linters from raising 'undefined name' --- End diff -- Shall we simply add noqa to `VERSION = __version__`? It looks weird to have a version defined as "Unknown" just to fix lint errors.
[GitHub] spark pull request #22197: [SPARK-25207][SQL] Case-insensitve field resoluti...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/22197#discussion_r212788741 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -377,7 +378,7 @@ class ParquetFileFormat // Collects all converted Parquet filter predicates. Notice that not all predicates can be // converted (`ParquetFilters.createFilter` returns an `Option`). That's why a `flatMap` // is used here. - .flatMap(parquetFilters.createFilter(parquetSchema, _)) + .flatMap(parquetFilters.createFilter(parquetSchema, _, isCaseSensitive)) --- End diff -- can we pass this config when creating `ParquetFilters`?
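The refactor being suggested, constructor injection of the flag instead of threading it through every call, can be sketched as follows. This is a hypothetical Python model (the real code is Scala, and the field-matching logic here is invented for illustration); `create_filter` returns `None` where the Scala code returns an empty `Option`.

```python
class ParquetFilters:
    """Toy model: the case-sensitivity flag is fixed at construction,
    so call sites no longer pass it to every create_filter call."""

    def __init__(self, case_sensitive: bool):
        self.case_sensitive = case_sensitive

    def create_filter(self, field_names, column):
        # Return the matching schema field name, or None (~ Scala's Option).
        if self.case_sensitive:
            return column if column in field_names else None
        # Case-insensitive mode: only accept an unambiguous match.
        matches = [f for f in field_names if f.lower() == column.lower()]
        return matches[0] if len(matches) == 1 else None

# Call sites now read: filters.create_filter(schema, predicate)
filters = ParquetFilters(case_sensitive=False)
```

The design win is that the flag is decided once, where the `ParquetFilters` instance is built, rather than at each of the N call sites.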
[GitHub] spark pull request #22189: Correct missing punctuation in the documentation
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/22189
[GitHub] spark issue #22214: [SPARK-23698][Python] Avoid 'undefined name' by defining...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22214 **[Test build #95243 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95243/testReport)** for PR 22214 at commit [`a077542`](https://github.com/apache/spark/commit/a077542e67213c0add6093c0bf8c7568b6cb7b32).
[GitHub] spark issue #22214: [SPARK-23698][Python] Avoid 'undefined name' by defining...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22214 It would have been better to fix these together rather than in 3 PRs fixing the same instances.
[GitHub] spark issue #22214: [SPARK-23698][Python] Avoid 'undefined name' by defining...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22214 ok to test
[GitHub] spark issue #22189: Correct missing punctuation in the documentation
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22189 Merged to master.
[GitHub] spark pull request #22208: [SPARK-25216][SQL] Improve error message when a c...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/22208#discussion_r212788594 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -216,8 +216,16 @@ class Dataset[T] private[sql]( private[sql] def resolve(colName: String): NamedExpression = { queryExecution.analyzed.resolveQuoted(colName, sparkSession.sessionState.analyzer.resolver) .getOrElse { -throw new AnalysisException( - s"""Cannot resolve column name "$colName" among (${schema.fieldNames.mkString(", ")})""") +if (schema.fieldNames.contains(colName)) { + throw new AnalysisException( +s"""Cannot resolve column name "$colName" among (${schema.fieldNames.mkString(", ")}). + | Try adding backticks to the column name, i.e., `$colName`""" --- End diff -- Yup, please go ahead.
[GitHub] spark issue #21087: [SPARK-23997][SQL] Configurable maximum number of bucket...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21087 Merged build finished. Test FAILed.
[GitHub] spark issue #21087: [SPARK-23997][SQL] Configurable maximum number of bucket...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21087 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95238/
[GitHub] spark issue #21087: [SPARK-23997][SQL] Configurable maximum number of bucket...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21087 **[Test build #95238 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95238/testReport)** for PR 21087 at commit [`ebd9265`](https://github.com/apache/spark/commit/ebd926530c1d8b2f515a4a233f5963eafc17e460). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22112 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2550/
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22112 Merged build finished. Test PASSed.
[GitHub] spark issue #22112: [SPARK-23243][Core] Fix RDD.repartition() data correctne...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22112 **[Test build #95242 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95242/testReport)** for PR 22112 at commit [`81bd74a`](https://github.com/apache/spark/commit/81bd74ae132c9ec89cfaeeb1f2f8cba20cab3f9e).
[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22209 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95233/
[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22209 Merged build finished. Test PASSed.
[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22209 **[Test build #95233 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95233/testReport)** for PR 22209 at commit [`70678dc`](https://github.com/apache/spark/commit/70678dc957255d459ba0b5c3960f8a6c8767f50d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22198: [SPARK-25121][SQL] Supports multi-part table names for b...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22198 @dilipbiswal @gatorsmile ping
[GitHub] spark issue #22204: [SPARK-25196][SQL] Analyze column statistics in cached q...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22204 @gatorsmile ping
[GitHub] spark issue #21734: [SPARK-24149][YARN][FOLLOW-UP] Only get the delegation t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21734 Merged build finished. Test PASSed.
[GitHub] spark issue #21734: [SPARK-24149][YARN][FOLLOW-UP] Only get the delegation t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21734 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95241/
[GitHub] spark issue #21734: [SPARK-24149][YARN][FOLLOW-UP] Only get the delegation t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21734 **[Test build #95241 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95241/testReport)** for PR 21734 at commit [`f3aa2a4`](https://github.com/apache/spark/commit/f3aa2a4e7ed180e54157394dc9721c53de444866). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22141: [SPARK-25154][SQL] Support NOT IN sub-queries ins...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22141#discussion_r212785260 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala --- @@ -137,13 +137,21 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper { plan: LogicalPlan): (Option[Expression], LogicalPlan) = { var newPlan = plan val newExprs = exprs.map { e => - e transformUp { + e transformDown { case Exists(sub, conditions, _) => val exists = AttributeReference("exists", BooleanType, nullable = false)() // Deduplicate conflicting attributes if any. newPlan = dedupJoin( Join(newPlan, sub, ExistenceJoin(exists), conditions.reduceLeftOption(And))) exists +case Not(InSubquery(values, ListQuery(sub, conditions, _, _))) => + val exists = AttributeReference("exists", BooleanType, nullable = false)() --- End diff -- yea, it's ok to keep the current one. Thanks!
[GitHub] spark issue #22227: [SPARK-25202] [Core] Implements split with limit sql fun...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22227 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95237/
[GitHub] spark issue #22227: [SPARK-25202] [Core] Implements split with limit sql fun...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22227 Merged build finished. Test FAILed.
[GitHub] spark issue #22227: [SPARK-25202] [Core] Implements split with limit sql fun...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22227 **[Test build #95237 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95237/testReport)** for PR 22227 at commit [`ceb3f41`](https://github.com/apache/spark/commit/ceb3f41238c8731606164cea5c45a0b87bb5d6f2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #22208: [SPARK-25216][SQL] Improve error message when a column c...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22208 Could you add some unit tests for this? At least, we had better check the error message for both `spark.sql.caseSensitive=true` and `spark.sql.caseSensitive=false`.
[GitHub] spark pull request #22208: [SPARK-25216][SQL] Improve error message when a c...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22208#discussion_r212785096 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -216,8 +216,16 @@ class Dataset[T] private[sql]( private[sql] def resolve(colName: String): NamedExpression = { queryExecution.analyzed.resolveQuoted(colName, sparkSession.sessionState.analyzer.resolver) .getOrElse { -throw new AnalysisException( - s"""Cannot resolve column name "$colName" among (${schema.fieldNames.mkString(", ")})""") +if (schema.fieldNames.contains(colName)) { --- End diff -- @icexelloss . This cannot handle mixed cases like the following. This should be handled for the purpose of this PR. Please use `sparkSession.sessionState.analyzer.resolver`. ```python spark.range(0, 1).toDF('A.b')['a.B'] ``` 
[GitHub] spark pull request #22208: [SPARK-25216][SQL] Improve error message when a c...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/22208#discussion_r212784956 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -216,8 +216,16 @@ class Dataset[T] private[sql]( private[sql] def resolve(colName: String): NamedExpression = { queryExecution.analyzed.resolveQuoted(colName, sparkSession.sessionState.analyzer.resolver) .getOrElse { -throw new AnalysisException( - s"""Cannot resolve column name "$colName" among (${schema.fieldNames.mkString(", ")})""") +if (schema.fieldNames.contains(colName)) { + throw new AnalysisException( +s"""Cannot resolve column name "$colName" among (${schema.fieldNames.mkString(", ")}). + | Try adding backticks to the column name, i.e., `$colName`""" + .stripMargin.replaceAll("\n", "")) +} else { + throw new AnalysisException( +s"""Cannot resolve column name "$colName" among (${schema.fieldNames.mkString(", ")}""" --- End diff -- At the end of the message, `)` is missing.
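The mixed-case objection from this review thread can be made concrete with a small model. This is a hypothetical Python sketch (function names invented; the real code is Scala): an exact `contains` check misses a lookup of 'a.B' against a column named 'A.b' when `spark.sql.caseSensitive=false`, whereas the analyzer's resolver, modeled here as a comparator that respects the sensitivity mode, catches it.

```python
def resolver(a: str, b: str, case_sensitive: bool = False) -> bool:
    """Toy model of the analyzer's resolver: exact match in
    case-sensitive mode, case-insensitive match otherwise."""
    return a == b if case_sensitive else a.lower() == b.lower()

def should_suggest_backticks(col_name, field_names, case_sensitive=False):
    # The branch choosing the "try adding backticks" hint should use the
    # resolver, not an exact membership test.
    return any(resolver(col_name, f, case_sensitive) for f in field_names)
```

With the exact test, `'a.B' in ['A.b']` is false and the helpful hint is never shown; the resolver-based check takes the intended branch.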
[GitHub] spark issue #22226: [SPARK-24391][SQL] Support arrays of any types by to_jso...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22226 Merged build finished. Test PASSed.
[GitHub] spark issue #22226: [SPARK-24391][SQL] Support arrays of any types by to_jso...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22226 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95235/
[GitHub] spark issue #22226: [SPARK-24391][SQL] Support arrays of any types by to_jso...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22226 **[Test build #95235 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95235/testReport)** for PR 22226 at commit [`c1755de`](https://github.com/apache/spark/commit/c1755decf93c503dcb3de957408908177a75cf3e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21734: [SPARK-24149][YARN][FOLLOW-UP] Only get the delegation t...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21734 **[Test build #95241 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95241/testReport)** for PR 21734 at commit [`f3aa2a4`](https://github.com/apache/spark/commit/f3aa2a4e7ed180e54157394dc9721c53de444866).
[GitHub] spark issue #21734: [SPARK-24149][YARN][FOLLOW-UP] Only get the delegation t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21734 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2549/
[GitHub] spark issue #21734: [SPARK-24149][YARN][FOLLOW-UP] Only get the delegation t...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21734 Merged build finished. Test PASSed.
[GitHub] spark pull request #22218: [SPARK-25228][CORE]Add executor CPU time metric.
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22218#discussion_r212784518 --- Diff: core/src/main/scala/org/apache/spark/executor/ExecutorSource.scala --- @@ -73,6 +75,13 @@ class ExecutorSource(threadPool: ThreadPoolExecutor, executorId: String) extends registerFileSystemStat(scheme, "write_ops", _.getWriteOps(), 0) } + // Dropwizard metrics gauge measuring the executor's process (JVM) CPU time. + // The value is returned in nanoseconds; the method returns -1 if this operation is not supported. + val osMXBean = ManagementFactory.getOperatingSystemMXBean.asInstanceOf[OperatingSystemMXBean] + metricRegistry.register(MetricRegistry.name("executorCPUTime"), new Gauge[Long] { +override def getValue: Long = osMXBean.getProcessCpuTime() --- End diff -- Is this metric useful for users?
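The contract of the gauge under review, report process CPU time in nanoseconds, or -1 when the platform cannot supply it, can be mirrored in a few lines. This is a rough Python analogue with an invented function name, not Spark's code; the real implementation registers a Dropwizard `Gauge[Long]` backed by the JMX `OperatingSystemMXBean`.

```python
import time

def process_cpu_time_ns() -> int:
    """Return this process's CPU time in nanoseconds, or -1 when the
    runtime cannot provide it (mirroring the JMX bean's convention)."""
    try:
        return time.process_time_ns()
    except AttributeError:
        # Interpreters before Python 3.7 lack process_time_ns.
        return -1
```

A gauge like this is sampled on read: each scrape of the metrics registry calls the function and reports the current cumulative CPU time, so the metric backend can derive CPU usage rates by differencing samples.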
[GitHub] spark issue #22222: [SPARK-25083][SQL] Remove the type erasure hack in data ...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/22222 cc @cloud-fan and @rdblue, please have a look when you have time. If this PR doesn't match your expectations, I'll close it soon. Thanks!
[GitHub] spark pull request #22222: [SPARK-25083][SQL] Remove the type erasure hack i...
Github user xuanyuanking commented on a diff in the pull request: https://github.com/apache/spark/pull/22222#discussion_r212784374 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ColumnarBatchScan.scala --- @@ -40,6 +42,29 @@ private[sql] trait ColumnarBatchScan extends CodegenSupport { "numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of output rows"), "scanTime" -> SQLMetrics.createTimingMetric(sparkContext, "scan time")) + /** + * Returns all the RDDs of ColumnarBatch which generates the input rows. + */ + def inputBatchRDDs(): Seq[RDD[ColumnarBatch]] + + /** + * Returns all the RDDs of InternalRow which generates the input rows. + */ + def inputRowRDDs(): Seq[RDD[InternalRow]] + + /** + * Get input RDD depends on supportsBatch. + */ + final def getInputRDDs(): Seq[RDD[InternalRow]] = { +if (supportsBatch) { + inputBatchRDDs().asInstanceOf[Seq[RDD[InternalRow]]] --- End diff -- This may be the last explicit erasure hack left; please check whether it is acceptable.
[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22063 Merged build finished. Test FAILed.
[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22063 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95234/
[GitHub] spark issue #22063: [WIP][SPARK-25044][SQL] Address translation of LMF closu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22063 **[Test build #95234 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95234/testReport)** for PR 22063 at commit [`721860d`](https://github.com/apache/spark/commit/721860d38b889848a4facd691146bd81bf8fa905). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21732 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2548/
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21732 Merged build finished. Test PASSed.
[GitHub] spark issue #21732: [SPARK-24762][SQL] Enable Option of Product encoders
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21732 **[Test build #95240 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95240/testReport)** for PR 21732 at commit [`c1f798f`](https://github.com/apache/spark/commit/c1f798f7e9cba0d04223eed06f1b1f547ec29dc5).
[GitHub] spark pull request #22220: [SPARK-25229][SQL] Partition fields are uniformly...
Github user ouyangxiaochen closed the pull request at: https://github.com/apache/spark/pull/22220
[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22209 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95229/ Test PASSed.
[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22209 Merged build finished. Test PASSed.
[GitHub] spark issue #22209: [SPARK-24415][Core] Fixed the aggregated stage metrics b...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22209 **[Test build #95229 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95229/testReport)** for PR 22209 at commit [`70678dc`](https://github.com/apache/spark/commit/70678dc957255d459ba0b5c3960f8a6c8767f50d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22227: [SPARK-25202] [Core] Implements split with limit ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r212783356 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala --- @@ -232,30 +232,41 @@ case class RLike(left: Expression, right: Expression) extends StringRegexExpress * Splits str around pat (pattern is a regular expression). */ @ExpressionDescription( - usage = "_FUNC_(str, regex) - Splits `str` around occurrences that match `regex`.", + usage = "_FUNC_(str, regex, limit) - Splits `str` around occurrences that match `regex`." + +"The `limit` parameter controls the number of times the pattern is applied and " + +"therefore affects the length of the resulting array. If the limit n is " + +"greater than zero then the pattern will be applied at most n - 1 times, " + +"the array's length will be no greater than n, and the array's last entry " + +"will contain all input beyond the last matched delimiter. If n is " + +"non-positive then the pattern will be applied as many times as " + +"possible and the array can have any length. If n is zero then the " + +"pattern will be applied as many times as possible, the array can " + +"have any length, and trailing empty strings will be discarded.", --- End diff -- hmm, is it possible to make this usage more compact? I think the usage here should be concise.
[GitHub] spark pull request #22227: [SPARK-25202] [Core] Implements split with limit ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r212783481 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala --- @@ -232,30 +232,41 @@ case class RLike(left: Expression, right: Expression) extends StringRegexExpress * Splits str around pat (pattern is a regular expression). */ @ExpressionDescription( - usage = "_FUNC_(str, regex) - Splits `str` around occurrences that match `regex`.", + usage = "_FUNC_(str, regex, limit) - Splits `str` around occurrences that match `regex`." + +"The `limit` parameter controls the number of times the pattern is applied and " + +"therefore affects the length of the resulting array. If the limit n is " + +"greater than zero then the pattern will be applied at most n - 1 times, " + +"the array's length will be no greater than n, and the array's last entry " + +"will contain all input beyond the last matched delimiter. If n is " + +"non-positive then the pattern will be applied as many times as " + +"possible and the array can have any length. If n is zero then the " + +"pattern will be applied as many times as possible, the array can " + +"have any length, and trailing empty strings will be discarded.", examples = """ Examples: - > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]'); + > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]', -1); ["one","two","three",""] +| > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]', 2); + | ["one","twoBthreeC"] """) -case class StringSplit(str: Expression, pattern: Expression) - extends BinaryExpression with ImplicitCastInputTypes { +case class StringSplit(str: Expression, pattern: Expression, limit: Expression) + extends TernaryExpression with ImplicitCastInputTypes { - override def left: Expression = str - override def right: Expression = pattern override def dataType: DataType = ArrayType(StringType) - override def inputTypes: Seq[DataType] = Seq(StringType, StringType) + override def inputTypes: Seq[DataType] = Seq(StringType, StringType, IntegerType) + override def children: Seq[Expression] = str :: pattern :: limit :: Nil - override def nullSafeEval(string: Any, regex: Any): Any = { -val strings = string.asInstanceOf[UTF8String].split(regex.asInstanceOf[UTF8String], -1) + override def nullSafeEval(string: Any, regex: Any, limit: Any): Any = { --- End diff -- I think we still need to do some check on `limit`. According to the Presto documentation, `limit` must be a positive number. -1 is only used when no `limit` parameter is given (default value).
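The `limit` description quoted in this diff follows the contract of Java's `java.util.regex.Pattern.split(CharSequence, int)`. A small standalone Java sketch (illustrative only, not Spark code) showing the three regimes on the example string from the PR:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitLimitDemo {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("[ABC]");
        String s = "oneAtwoBthreeC";

        // limit < 0: pattern applied as many times as possible,
        // trailing empty strings are kept.
        System.out.println(Arrays.toString(p.split(s, -1)));
        // [one, two, three, ]

        // limit = 2: pattern applied at most once; the last entry
        // holds everything past the first matched delimiter.
        System.out.println(Arrays.toString(p.split(s, 2)));
        // [one, twoBthreeC]

        // limit = 0: like a negative limit, but trailing empty
        // strings are discarded.
        System.out.println(Arrays.toString(p.split(s, 0)));
        // [one, two, three]
    }
}
```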
[GitHub] spark pull request #22227: [SPARK-25202] [Core] Implements split with limit ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r212783375 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala --- @@ -232,30 +232,41 @@ case class RLike(left: Expression, right: Expression) extends StringRegexExpress * Splits str around pat (pattern is a regular expression). */ @ExpressionDescription( - usage = "_FUNC_(str, regex) - Splits `str` around occurrences that match `regex`.", + usage = "_FUNC_(str, regex, limit) - Splits `str` around occurrences that match `regex`." + +"The `limit` parameter controls the number of times the pattern is applied and " + +"therefore affects the length of the resulting array. If the limit n is " + +"greater than zero then the pattern will be applied at most n - 1 times, " + +"the array's length will be no greater than n, and the array's last entry " + +"will contain all input beyond the last matched delimiter. If n is " + +"non-positive then the pattern will be applied as many times as " + +"possible and the array can have any length. If n is zero then the " + +"pattern will be applied as many times as possible, the array can " + +"have any length, and trailing empty strings will be discarded.", examples = """ Examples: - > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]'); + > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]', -1); --- End diff -- I think it is better to keep the original example for the default value.
[GitHub] spark issue #22229: [MINOR] Fix Scala 2.12 build
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22229 **[Test build #95239 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95239/testReport)** for PR 22229 at commit [`20a2aa2`](https://github.com/apache/spark/commit/20a2aa21813cbbb131110e00d2e5c1426d6dae78).
[GitHub] spark issue #22229: [MINOR] Fix Scala 2.12 build
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22229 Merged build finished. Test PASSed.
[GitHub] spark issue #22229: [MINOR] Fix Scala 2.12 build
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22229 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2547/ Test PASSed.
[GitHub] spark issue #22229: [MINOR] Fix Scala 2.12 build
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22229 Merged build finished. Test FAILed.
[GitHub] spark issue #22229: [MINOR] Fix Scala 2.12 build
Github user viirya commented on the issue: https://github.com/apache/spark/pull/22229 retest this please.
[GitHub] spark issue #22229: [MINOR] Fix Scala 2.12 build
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22229 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95231/ Test FAILed.
[GitHub] spark issue #22229: [MINOR] Fix Scala 2.12 build
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22229 **[Test build #95231 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95231/testReport)** for PR 22229 at commit [`20a2aa2`](https://github.com/apache/spark/commit/20a2aa21813cbbb131110e00d2e5c1426d6dae78). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #22227: [SPARK-25202] [Core] Implements split with limit ...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r212783216 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala --- @@ -232,30 +232,41 @@ case class RLike(left: Expression, right: Expression) extends StringRegexExpress * Splits str around pat (pattern is a regular expression). */ @ExpressionDescription( - usage = "_FUNC_(str, regex) - Splits `str` around occurrences that match `regex`.", + usage = "_FUNC_(str, regex, limit) - Splits `str` around occurrences that match `regex`." + +"The `limit` parameter controls the number of times the pattern is applied and " + +"therefore affects the length of the resulting array. If the limit n is " + +"greater than zero then the pattern will be applied at most n - 1 times, " + +"the array's length will be no greater than n, and the array's last entry " + +"will contain all input beyond the last matched delimiter. If n is " + +"non-positive then the pattern will be applied as many times as " + +"possible and the array can have any length. If n is zero then the " + +"pattern will be applied as many times as possible, the array can " + +"have any length, and trailing empty strings will be discarded.", examples = """ Examples: - > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]'); + > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]', -1); ["one","two","three",""] +| > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]', 2); + | ["one","twoBthreeC"] """) -case class StringSplit(str: Expression, pattern: Expression) - extends BinaryExpression with ImplicitCastInputTypes { +case class StringSplit(str: Expression, pattern: Expression, limit: Expression) --- End diff -- For test coverage, better to add tests in `string-functions.sql` for the two cases: two arguments and three arguments.
[GitHub] spark issue #22227: [SPARK-25202] [Core] Implements split with limit sql fun...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/22227 @phegstrom Thanks for your contribution! Btw, seems like your email address in your commits is not connected to your GitHub account. Could you connect the address to your account, or use another address connected to your account? Otherwise, your contribution will not be connected to you.
[GitHub] spark pull request #22227: [SPARK-25202] [Core] Implements split with limit ...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r212783068 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2554,7 +2554,27 @@ object functions { * @since 1.5.0 */ def split(str: Column, pattern: String): Column = withExpr { -StringSplit(str.expr, lit(pattern).expr) +StringSplit(str.expr, lit(pattern).expr, lit(-1).expr) + } + + /** + * Splits str around pattern (pattern is a regular expression) up to `limit-1` times. + * + * The limit parameter controls the number of times the pattern is applied and therefore + * affects the length of the resulting array. If the limit n is greater than zero then the + * pattern will be applied at most n - 1 times, the array's length will be no greater than + * n, and the array's last entry will contain all input beyond the last matched delimiter. + * If n is non-positive then the pattern will be applied as many times as possible and the + * array can have any length. If n is zero then the pattern will be applied as many times as + * possible, the array can have any length, and trailing empty strings will be discarded. + * + * @note Pattern is a string representation of the regular expression. + * + * @group string_funcs + * @since 1.5.0 + */ + def split(str: Column, pattern: String, limit: Int): Column = withExpr { +StringSplit(str.expr, lit(pattern).expr, lit(limit).expr) --- End diff -- nit: better to directly use `Literal`
[GitHub] spark pull request #22227: [SPARK-25202] [Core] Implements split with limit ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r212782616 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala --- @@ -232,30 +232,41 @@ case class RLike(left: Expression, right: Expression) extends StringRegexExpress * Splits str around pat (pattern is a regular expression). */ @ExpressionDescription( - usage = "_FUNC_(str, regex) - Splits `str` around occurrences that match `regex`.", + usage = "_FUNC_(str, regex, limit) - Splits `str` around occurrences that match `regex`." + +"The `limit` parameter controls the number of times the pattern is applied and " + +"therefore affects the length of the resulting array. If the limit n is " + +"greater than zero then the pattern will be applied at most n - 1 times, " + +"the array's length will be no greater than n, and the array's last entry " + +"will contain all input beyond the last matched delimiter. If n is " + +"non-positive then the pattern will be applied as many times as " + +"possible and the array can have any length. If n is zero then the " + +"pattern will be applied as many times as possible, the array can " + +"have any length, and trailing empty strings will be discarded.", examples = """ Examples: - > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]'); + > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]', -1); ["one","two","three",""] +| > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]', 2); + | ["one","twoBthreeC"] """) -case class StringSplit(str: Expression, pattern: Expression) - extends BinaryExpression with ImplicitCastInputTypes { +case class StringSplit(str: Expression, pattern: Expression, limit: Expression) --- End diff -- We still need to support 2 arguments. Please add a constructor `def this(str: Expression, pattern: Expression)`.
[GitHub] spark pull request #22227: [SPARK-25202] [Core] Implements split with limit ...
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r212782550 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala --- @@ -232,30 +232,41 @@ case class RLike(left: Expression, right: Expression) extends StringRegexExpress * Splits str around pat (pattern is a regular expression). */ @ExpressionDescription( - usage = "_FUNC_(str, regex) - Splits `str` around occurrences that match `regex`.", + usage = "_FUNC_(str, regex, limit) - Splits `str` around occurrences that match `regex`." + +"The `limit` parameter controls the number of times the pattern is applied and " + +"therefore affects the length of the resulting array. If the limit n is " + +"greater than zero then the pattern will be applied at most n - 1 times, " + +"the array's length will be no greater than n, and the array's last entry " + +"will contain all input beyond the last matched delimiter. If n is " + +"non-positive then the pattern will be applied as many times as " + +"possible and the array can have any length. If n is zero then the " + +"pattern will be applied as many times as possible, the array can " + +"have any length, and trailing empty strings will be discarded.", examples = """ Examples: - > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]'); + > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]', -1); ["one","two","three",""] +| > SELECT _FUNC_('oneAtwoBthreeC', '[ABC]', 2); + | ["one","twoBthreeC"] --- End diff -- nit: remove `|`.
[GitHub] spark pull request #22227: [SPARK-25202] [Core] Implements split with limit ...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r212782784 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala --- @@ -232,30 +232,41 @@ case class RLike(left: Expression, right: Expression) extends StringRegexExpress * Splits str around pat (pattern is a regular expression). */ @ExpressionDescription( - usage = "_FUNC_(str, regex) - Splits `str` around occurrences that match `regex`.", + usage = "_FUNC_(str, regex, limit) - Splits `str` around occurrences that match `regex`." + --- End diff -- Can you refine the description and the format along with the others, e.g., `RLike` https://github.com/apache/spark/blob/ceb3f41238c8731606164cea5c45a0b87bb5d6f2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/regexpExpressions.scala#L78
[GitHub] spark issue #21087: [SPARK-23997][SQL] Configurable maximum number of bucket...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21087 **[Test build #95238 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95238/testReport)** for PR 21087 at commit [`ebd9265`](https://github.com/apache/spark/commit/ebd926530c1d8b2f515a4a233f5963eafc17e460).
[GitHub] spark pull request #21977: [SPARK-25004][CORE] Add spark.executor.pyspark.me...
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/21977#discussion_r212782657 --- Diff: resource-managers/yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala --- @@ -161,6 +162,11 @@ abstract class BaseYarnClusterSuite } extraJars.foreach(launcher.addJar) +if (outFile.isDefined) { --- End diff -- Like I said, I think `foreach` is a bad practice with options, so I'd rather not change to use it. I'd be happy to change this to a pattern match if you think it is really desirable to get rid of the `.get`.
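For context on the style being debated here (`foreach` on an `Option` vs. an explicit `isDefined`/`get` check vs. a pattern match): the same trade-off can be sketched with Java's `Optional`, shown below purely as an illustration with a hypothetical `outFile` value; the Scala counterparts are `Option.foreach` and an `isDefined`/`get` check respectively.

```java
import java.util.Optional;

public class OptionStylesDemo {
    public static void main(String[] args) {
        // Hypothetical value standing in for the suite's outFile option.
        Optional<String> outFile = Optional.of("/tmp/out.log");

        // Style 1: explicit presence check plus get() -- the shape the
        // reviewed Scala code uses with isDefined/get.
        if (outFile.isPresent()) {
            System.out.println("add " + outFile.get());
        }

        // Style 2: callback style -- Java's Optional.ifPresent, the
        // analogue of Scala's Option.foreach that the review discusses.
        outFile.ifPresent(f -> System.out.println("add " + f));
    }
}
```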
[GitHub] spark issue #21087: [SPARK-23997][SQL] Configurable maximum number of bucket...
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/21087 retest this please
[GitHub] spark issue #22227: [SPARK-25202] [Core] Implements split with limit sql fun...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22227 **[Test build #95237 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95237/testReport)** for PR 22227 at commit [`ceb3f41`](https://github.com/apache/spark/commit/ceb3f41238c8731606164cea5c45a0b87bb5d6f2).
[GitHub] spark pull request #22227: [SPARK-25202] [Core] Implements split with limit ...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r212782576 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2554,7 +2554,27 @@ object functions { * @since 1.5.0 */ def split(str: Column, pattern: String): Column = withExpr { -StringSplit(str.expr, lit(pattern).expr) +StringSplit(str.expr, lit(pattern).expr, lit(-1).expr) + } + + /** + * Splits str around pattern (pattern is a regular expression) up to `limit-1` times. --- End diff -- Drop `up to `limit-1` times` in the first line? That's because the behaviour depends on the values described below.
[GitHub] spark issue #22227: [SPARK-25202] [Core] Implements split with limit sql fun...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/22227 ok to test.
[GitHub] spark issue #22216: [SPARK-25223][SQL] Use a map to store values for NamedLa...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/22216 I see. Let me think again. I'd close this for now. Thanks!
[GitHub] spark pull request #22227: [SPARK-25202] [Core] Implements split with limit ...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/22227#discussion_r212781563 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2554,7 +2554,27 @@ object functions { * @since 1.5.0 */ def split(str: Column, pattern: String): Column = withExpr { -StringSplit(str.expr, lit(pattern).expr) +StringSplit(str.expr, lit(pattern).expr, lit(-1).expr) + } + + /** + * Splits str around pattern (pattern is a regular expression) up to `limit-1` times. + * + * The limit parameter controls the number of times the pattern is applied and therefore + * affects the length of the resulting array. If the limit n is greater than zero then the + * pattern will be applied at most n - 1 times, the array's length will be no greater than + * n, and the array's last entry will contain all input beyond the last matched delimiter. + * If n is non-positive then the pattern will be applied as many times as possible and the + * array can have any length. If n is zero then the pattern will be applied as many times as + * possible, the array can have any length, and trailing empty strings will be discarded. + * + * @note Pattern is a string representation of the regular expression. + * + * @group string_funcs + * @since 1.5.0 --- End diff -- `1.5.0` -> `2.4.0`
[GitHub] spark pull request #22216: [SPARK-25223][SQL] Use a map to store values for ...
Github user ueshin closed the pull request at: https://github.com/apache/spark/pull/22216
[GitHub] spark pull request #22192: [SPARK-24918][Core] Executor Plugin API
Github user NiharS commented on a diff in the pull request: https://github.com/apache/spark/pull/22192#discussion_r212781171 --- Diff: core/src/test/java/org/apache/spark/ExecutorPluginSuite.java --- @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark; + +import org.apache.spark.api.java.JavaSparkContext; + +import org.junit.Assert; +import org.junit.Test; + +// Tests loading plugins into executors +public class ExecutorPluginSuite { + // Static value modified by testing plugin to ensure plugin loaded correctly. + public static int numSuccessfulPlugins = 0; + + private SparkConf initializeSparkConf(String pluginNames) { +return new SparkConf() +.setMaster("local") +.setAppName("test") +.set("spark.executor.plugins", pluginNames); --- End diff -- I had the same first response. The original executor plugin code was a class in Java, so I figured I should keep my test in Java as well (manually checked beforehand, couldn't find any instances of Java stuff being primarily tested in Scala so I figured it best not to do that). Perhaps it's for the plugin-writers to get to choose their plugin language, since it's easier to extend a Java interface in Scala than to extend a Scala trait in Java? I'll leave it as is for the time being.
[GitHub] spark issue #22227: [SPARK-25202] [Core] Implements split with limit sql fun...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22227 Can you add tests in `StringFunctionsSuite`, too?
[GitHub] spark pull request #22192: [SPARK-24918][Core] Executor Plugin API
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22192#discussion_r212780758 --- Diff: core/src/test/java/org/apache/spark/ExecutorPluginSuite.java --- @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark; + +import org.apache.spark.api.java.JavaSparkContext; + +import org.junit.Assert; +import org.junit.Test; + +// Tests loading plugins into executors +public class ExecutorPluginSuite { + // Static value modified by testing plugin to ensure plugin loaded correctly. + public static int numSuccessfulPlugins = 0; + + private SparkConf initializeSparkConf(String pluginNames) { +return new SparkConf() +.setMaster("local") +.setAppName("test") +.set("spark.executor.plugins", pluginNames); --- End diff -- Ah, crap, this is Java... any specific reason to do this in Java? Otherwise it's probably fine the way it is.
[GitHub] spark issue #22227: [SPARK-25202] [Core] Implements split with limit sql fun...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22227 @gatorsmile @ueshin can you trigger this test?
[GitHub] spark issue #22227: [SPARK-25202] [Core] Implements split with limit sql fun...
Github user maropu commented on the issue: https://github.com/apache/spark/pull/22227 not `[CORE]` but `[SQL]` in the title.
[GitHub] spark pull request #22192: [SPARK-24918][Core] Executor Plugin API
Github user NiharS commented on a diff in the pull request: https://github.com/apache/spark/pull/22192#discussion_r212780252 --- Diff: core/src/test/java/org/apache/spark/ExecutorPluginSuite.java --- @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark; + +import org.apache.spark.api.java.JavaSparkContext; + +import org.junit.Assert; +import org.junit.Test; + +// Tests loading plugins into executors +public class ExecutorPluginSuite { + // Static value modified by testing plugin to ensure plugin loaded correctly. + public static int numSuccessfulPlugins = 0; + + private SparkConf initializeSparkConf(String pluginNames) { +return new SparkConf() +.setMaster("local") +.setAppName("test") +.set("spark.executor.plugins", pluginNames); --- End diff -- Is it alright to use Seq in a Java file? I couldn't find it in any other java files in Spark that aren't in a target/ directory. I don't think the compiler is getting mad when I import it but want to make sure I'm sticking with the usual style rules
[GitHub] spark issue #22230: [SPARK-25214][SS][FOLLOWUP]Fix the issue that Kafka v2 s...
Github user tdas commented on the issue: https://github.com/apache/spark/pull/22230 LGTM. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22192: [SPARK-24918][Core] Executor Plugin API
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/22192#discussion_r212779435 --- Diff: core/src/test/java/org/apache/spark/ExecutorPluginSuite.java --- @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark; + +import org.apache.spark.api.java.JavaSparkContext; + +import org.junit.Assert; +import org.junit.Test; + +// Tests loading plugins into executors +public class ExecutorPluginSuite { + // Static value modified by testing plugin to ensure plugin loaded correctly. + public static int numSuccessfulPlugins = 0; + + private SparkConf initializeSparkConf(String pluginNames) { +return new SparkConf() +.setMaster("local") +.setAppName("test") +.set("spark.executor.plugins", pluginNames); --- End diff -- `.set(EXECUTOR_PLUGINS, Seq("class1", "class2"))` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #22192: [SPARK-24918][Core] Executor Plugin API
Github user NiharS commented on a diff in the pull request: https://github.com/apache/spark/pull/22192#discussion_r212778867 --- Diff: core/src/test/java/org/apache/spark/ExecutorPluginSuite.java --- @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark; + +import org.apache.spark.api.java.JavaSparkContext; + +import org.junit.Assert; +import org.junit.Test; + +// Tests loading plugins into executors +public class ExecutorPluginSuite { + // Static value modified by testing plugin to ensure plugin loaded correctly. + public static int numSuccessfulPlugins = 0; + + private SparkConf initializeSparkConf(String pluginNames) { +return new SparkConf() +.setMaster("local") +.setAppName("test") +.set("spark.executor.plugins", pluginNames); --- End diff -- Sorry, could you explain this for me? Do you mean instead of `.set("spark.executor.plugins", pluginNames)` I should a) have a variable storing the config name and b) pass the plugin names to initializeSparkConf as a list, and have it as `.set(EXECUTOR_PLUGIN_CONF_NAME, String.join(",", pluginNames))`? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
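[Editor's note: the joining step NiharS asks about can be sketched in isolation. This is a minimal, Spark-free sketch assuming a hypothetical `EXECUTOR_PLUGIN_CONF_NAME` constant mirroring the `"spark.executor.plugins"` key; the real Spark config constant may differ.]

```java
import java.util.Arrays;
import java.util.List;

class PluginConfExample {
    // Hypothetical constant standing in for the "spark.executor.plugins" key
    // discussed in the review; not the actual Spark-internal config object.
    static final String EXECUTOR_PLUGIN_CONF_NAME = "spark.executor.plugins";

    // Collapse a list of plugin class names into the single comma-separated
    // string form that a String-valued conf entry expects.
    static String joinPluginNames(List<String> pluginNames) {
        return String.join(",", pluginNames);
    }

    public static void main(String[] args) {
        List<String> plugins = Arrays.asList("com.example.PluginA", "com.example.PluginB");
        // Prints: com.example.PluginA,com.example.PluginB
        System.out.println(joinPluginNames(plugins));
    }
}
```

With this shape, the test helper would take a `List<String>` and call `.set(EXECUTOR_PLUGIN_CONF_NAME, joinPluginNames(pluginNames))`, keeping the comma-joining in one place.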
[GitHub] spark pull request #22192: [SPARK-24918][Core] Executor Plugin API
Github user NiharS commented on a diff in the pull request: https://github.com/apache/spark/pull/22192#discussion_r212777617 --- Diff: core/src/main/java/org/apache/spark/ExecutorPlugin.java --- @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark; + +import org.apache.spark.annotation.DeveloperApi; + +/** + * A plugin which can be automatically instantiated within each Spark executor. Users can specify + * plugins which should be created with the "spark.executor.plugins" configuration. An instance + * of each plugin will be created for every executor, including those created by dynamic allocation, + * before the executor starts running any tasks. + * + * The specific API exposed to end users is still considered to be very unstable. We will + * *hopefully* be able to keep compatibility by providing default implementations for any methods + * added, but make no guarantees this will always be possible across all Spark releases. + * + * Spark does nothing to verify the plugin is doing legitimate things, or to manage the resources + * it uses. A plugin acquires the same privileges as the user running the task. A bad plugin + * could also interfere with task execution and make the executor fail in unexpected ways. + */ +@DeveloperApi +public interface ExecutorPlugin { --- End diff -- Pinging @squito for this one, not sure if he specifically wanted an empty API to start (and have plugins run to their own completion or until the JVM cleans them up) or was considering adding methods later --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
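[Editor's note: the "default implementations for any methods added" compatibility strategy mentioned in the javadoc can be illustrated with a small, self-contained sketch. Class and method names here (`LoggingPlugin`, `init`, `shutdown`) are hypothetical, not part of the actual Spark API.]

```java
class ExecutorPluginSketch {
    // A plugin interface that starts effectively empty: methods added in later
    // releases carry default no-op bodies, so plugins compiled against the
    // original interface keep loading and compiling unchanged.
    interface ExecutorPlugin {
        default void init() {}      // hypothetical later addition, no-op default
        default void shutdown() {}  // hypothetical later addition, no-op default
    }

    // A user plugin written before init()/shutdown() existed needs no changes;
    // it simply inherits the no-op defaults.
    static class LoggingPlugin implements ExecutorPlugin {}

    public static void main(String[] args) {
        ExecutorPlugin p = new LoggingPlugin();
        p.init();     // runs the inherited no-op default
        p.shutdown();
        System.out.println("plugin loaded: " + p.getClass().getSimpleName());
    }
}
```

This is the trade-off @squito's question points at: starting with an empty interface maximizes freedom to add lifecycle methods later, as long as every addition ships with a default body.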