[GitHub] [spark] cfmcgrady closed pull request #32488: [SPARK-35316][SQL] UnwrapCastInBinaryComparison support In/InSet predicate
cfmcgrady closed pull request #32488: URL: https://github.com/apache/spark/pull/32488 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon closed pull request #32523: [SPARK-35382][PYTHON] Fix lambda variable name issues in nested DataFrame functions in Python APIs.
HyukjinKwon closed pull request #32523: URL: https://github.com/apache/spark/pull/32523
[GitHub] [spark] HyukjinKwon commented on pull request #32523: [SPARK-35382][PYTHON] Fix lambda variable name issues in nested DataFrame functions in Python APIs.
HyukjinKwon commented on pull request #32523: URL: https://github.com/apache/spark/pull/32523#issuecomment-840324993 Merged to master and branch-3.1.
[GitHub] [spark] SparkQA removed a comment on pull request #32448: [SPARK-35290][SQL] Use StructType merging for unionByName with null filling
SparkQA removed a comment on pull request #32448: URL: https://github.com/apache/spark/pull/32448#issuecomment-840218983 **[Test build #138481 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138481/testReport)** for PR 32448 at commit [`93b47d3`](https://github.com/apache/spark/commit/93b47d3f190369afdf5a2a5ae0ec0c6054b56c1b).
[GitHub] [spark] SparkQA commented on pull request #32448: [SPARK-35290][SQL] Use StructType merging for unionByName with null filling
SparkQA commented on pull request #32448: URL: https://github.com/apache/spark/pull/32448#issuecomment-840324232 **[Test build #138481 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138481/testReport)** for PR 32448 at commit [`93b47d3`](https://github.com/apache/spark/commit/93b47d3f190369afdf5a2a5ae0ec0c6054b56c1b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on pull request #32527: [SPARK-35384][SQL] Improve performance for InvokeLike.invoke
SparkQA removed a comment on pull request #32527: URL: https://github.com/apache/spark/pull/32527#issuecomment-840217408 **[Test build #138480 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138480/testReport)** for PR 32527 at commit [`2831f9c`](https://github.com/apache/spark/commit/2831f9c0b78aa21c6cc906370fb5069e166dbf39).
[GitHub] [spark] SparkQA commented on pull request #32527: [SPARK-35384][SQL] Improve performance for InvokeLike.invoke
SparkQA commented on pull request #32527: URL: https://github.com/apache/spark/pull/32527#issuecomment-840322575 **[Test build #138480 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138480/testReport)** for PR 32527 at commit [`2831f9c`](https://github.com/apache/spark/commit/2831f9c0b78aa21c6cc906370fb5069e166dbf39).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page
SparkQA commented on pull request #32204: URL: https://github.com/apache/spark/pull/32204#issuecomment-840318050 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43012/
[GitHub] [spark] SparkQA commented on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page
SparkQA commented on pull request #32204: URL: https://github.com/apache/spark/pull/32204#issuecomment-840315107 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43012/
[GitHub] [spark] HyukjinKwon edited a comment on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page
HyukjinKwon edited a comment on pull request #32204: URL: https://github.com/apache/spark/pull/32204#issuecomment-840312271
@itholic:
1. Please check the options **one by one** and see whether each exists and matches.
2. Document general options in https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html if any are missing.
3. If you're going to do 2. separately in another PR and JIRA, don't remove general options from the API documentation for now.
[GitHub] [spark] HyukjinKwon edited a comment on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page
HyukjinKwon edited a comment on pull request #32204: URL: https://github.com/apache/spark/pull/32204#issuecomment-840312271
@itholic:
1. Please check the options **one by one** and see whether each exists.
2. Document general options in https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html if any are missing.
3. If you're going to do 2. separately in another PR and JIRA, don't remove general options from the API documentation for now.
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes
AmplabJenkins removed a comment on pull request #32516: URL: https://github.com/apache/spark/pull/32516#issuecomment-840312669 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43008/
[GitHub] [spark] AmplabJenkins commented on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes
AmplabJenkins commented on pull request #32516: URL: https://github.com/apache/spark/pull/32516#issuecomment-840312669 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43008/
[GitHub] [spark] SparkQA commented on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes
SparkQA commented on pull request #32516: URL: https://github.com/apache/spark/pull/32516#issuecomment-840312637
[GitHub] [spark] HyukjinKwon commented on pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.
HyukjinKwon commented on pull request #32161: URL: https://github.com/apache/spark/pull/32161#issuecomment-840312618 Same comment goes here too: https://github.com/apache/spark/pull/32204#issuecomment-840312271
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32531: [SPARK-35394][K8S][BUILD] Move kubernetes-client.version to root pom file
AmplabJenkins removed a comment on pull request #32531: URL: https://github.com/apache/spark/pull/32531#issuecomment-840312131 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43011/
[GitHub] [spark] sunchao commented on a change in pull request #32527: [SPARK-35384][SQL] Improve performance for InvokeLike.invoke
sunchao commented on a change in pull request #32527: URL: https://github.com/apache/spark/pull/32527#discussion_r631576884
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala
## @@ -127,13 +128,18 @@ trait InvokeLike extends Expression with NonSQLExpression {
       arguments: Seq[Expression],
       input: InternalRow,
       dataType: DataType): Any = {
-    val args = arguments.map(e => e.eval(input).asInstanceOf[Object])
-    if (needNullCheck && args.exists(_ == null)) {
+    var i = 0
+    val len = arguments.length
+    while (i < len) {
+      evaluatedArgs(i) = arguments(i).eval(input).asInstanceOf[Object]
+      i += 1
+    }
+    if (needNullCheck && evaluatedArgs.contains(null)) {
       // return null if one of arguments is null
       null
     } else {
       val ret = try {
-        method.invoke(obj, args: _*)
+        method.invoke(obj, evaluatedArgs: _*)
       } catch {
Review comment: I'm not sure we can do a similar thing in `Invoke.eval`, though, since `obj` in `obj.getClass.getMethod(functionName, argClasses: _*)` is different for each call.
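For readers outside the Spark codebase, the pattern in the diff above (evaluate arguments into a reused buffer with an index loop, short-circuit on null, then invoke reflectively) can be sketched in Python. This is only a rough analogue under stated assumptions: `ReflectiveInvoker`, `arg_thunks`, and the per-class method cache are hypothetical names invented for illustration, not Spark APIs, and the cache keyed on the receiver's runtime class is just one conceivable answer to the concern that the receiver differs per call, not something the PR is known to do.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple


class ReflectiveInvoker:
    """Hypothetical sketch: reuse one argument buffer across invocations."""

    def __init__(self, arg_thunks: Sequence[Callable[[Any], Any]],
                 need_null_check: bool = True) -> None:
        self._arg_thunks = list(arg_thunks)
        self._need_null_check = need_null_check
        # Allocated once and overwritten on every call (not thread-safe).
        self._evaluated_args: List[Optional[Any]] = [None] * len(self._arg_thunks)
        # Assumption: method lookups are cached per (runtime class, name),
        # because the receiver object may differ on every call.
        self._method_cache: Dict[Tuple[type, str], Callable[..., Any]] = {}

    def _lookup(self, obj: Any, name: str) -> Callable[..., Any]:
        key = (type(obj), name)
        method = self._method_cache.get(key)
        if method is None:
            method = getattr(type(obj), name)
            self._method_cache[key] = method
        return method

    def invoke(self, obj: Any, name: str, row: Any) -> Any:
        # Index loop instead of building a fresh list for each row.
        i = 0
        n = len(self._arg_thunks)
        while i < n:
            self._evaluated_args[i] = self._arg_thunks[i](row)
            i += 1
        # Return None if any argument is None, mirroring the null check.
        if self._need_null_check and any(a is None for a in self._evaluated_args):
            return None
        return self._lookup(obj, name)(obj, *self._evaluated_args)


invoker = ReflectiveInvoker([lambda r: r["old"], lambda r: r["new"]])
result = invoker.invoke("x-y", "replace", {"old": "-", "new": "+"})
```

The reused buffer trades thread safety for fewer allocations, so a sketch like this would only suit single-threaded evaluation per instance.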
[GitHub] [spark] HyukjinKwon commented on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page
HyukjinKwon commented on pull request #32204: URL: https://github.com/apache/spark/pull/32204#issuecomment-840312271
@itholic:
1. Please check the options **one by one** and see whether each exists.
2. Document general options in https://spark.apache.org/docs/latest/sql-data-sources-generic-options.html if any are missing.
3. If you're going to do this in a separate JIRA, don't remove general options from the API documentation for now.
[GitHub] [spark] AmplabJenkins commented on pull request #32531: [SPARK-35394][K8S][BUILD] Move kubernetes-client.version to root pom file
AmplabJenkins commented on pull request #32531: URL: https://github.com/apache/spark/pull/32531#issuecomment-840312131 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43011/
[GitHub] [spark] SparkQA commented on pull request #32531: [SPARK-35394][K8S][BUILD] Move kubernetes-client.version to root pom file
SparkQA commented on pull request #32531: URL: https://github.com/apache/spark/pull/32531#issuecomment-840312101
[GitHub] [spark] HyukjinKwon commented on a change in pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page
HyukjinKwon commented on a change in pull request #32204: URL: https://github.com/apache/spark/pull/32204#discussion_r631576139
## File path: python/pyspark/sql/streaming.py
## @@ -504,105 +504,15 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
         path : str
             string represents path to the JSON dataset,
             or RDD of Strings storing JSON objects.
-        schema : :class:`pyspark.sql.types.StructType` or str, optional
Review comment: I don't think this is a general option
[GitHub] [spark] HyukjinKwon commented on a change in pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page
HyukjinKwon commented on a change in pull request #32204: URL: https://github.com/apache/spark/pull/32204#discussion_r631575888
## File path: python/pyspark/sql/readwriter.py
## @@ -1196,39 +1097,13 @@ def json(self, path, mode=None, compression=None, dateFormat=None, timestampForm
         --
         path : str
             the path in any Hadoop supported file system
-        mode : str, optional
Review comment: mode is a general option
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
AmplabJenkins removed a comment on pull request #32498: URL: https://github.com/apache/spark/pull/32498#issuecomment-840292938 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138477/
[GitHub] [spark] SparkQA commented on pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.
SparkQA commented on pull request #32161: URL: https://github.com/apache/spark/pull/32161#issuecomment-840310729 **[Test build #138497 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138497/testReport)** for PR 32161 at commit [`bb5cd45`](https://github.com/apache/spark/commit/bb5cd4529b07b05b21cdaf878b06b61ad717be79).
[GitHub] [spark] SparkQA commented on pull request #32410: [SPARK-35286][SQL] Replace SessionState.start with SessionState.setCurrentSessionState
SparkQA commented on pull request #32410: URL: https://github.com/apache/spark/pull/32410#issuecomment-840310594 **[Test build #138496 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138496/testReport)** for PR 32410 at commit [`4bca8ec`](https://github.com/apache/spark/commit/4bca8ecaec066ef19d04a12e134ba830320a2e0f).
[GitHub] [spark] SparkQA commented on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation
SparkQA commented on pull request #32494: URL: https://github.com/apache/spark/pull/32494#issuecomment-840310493 **[Test build #138495 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138495/testReport)** for PR 32494 at commit [`1573522`](https://github.com/apache/spark/commit/1573522541ceaf1e0b6e0eccb108b88f0fb1a4c6).
[GitHub] [spark] SparkQA commented on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
SparkQA commented on pull request #32498: URL: https://github.com/apache/spark/pull/32498#issuecomment-840310425 **[Test build #138494 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138494/testReport)** for PR 32498 at commit [`b7a6cc7`](https://github.com/apache/spark/commit/b7a6cc71110fe8de45e8c74d487ebd23b7942f34).
[GitHub] [spark] SparkQA commented on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader
SparkQA commented on pull request #32515: URL: https://github.com/apache/spark/pull/32515#issuecomment-840310366 **[Test build #138493 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138493/testReport)** for PR 32515 at commit [`b8b54ea`](https://github.com/apache/spark/commit/b8b54ea9cb3bdbb8f50bdb260567dedd2af9fe1b).
[GitHub] [spark] HyukjinKwon commented on a change in pull request #32161: [SPARK-35025][SQL][PYTHON][DOCS] Move Parquet data source options from Python and Scala into a single page.
HyukjinKwon commented on a change in pull request #32161: URL: https://github.com/apache/spark/pull/32161#discussion_r631575367
## File path: sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala
## @@ -812,46 +812,10 @@ class DataFrameReader private[sql](sparkSession: SparkSession) extends Logging {
   /**
    * Loads a Parquet file, returning the result as a `DataFrame`.
    *
-   * You can set the following Parquet-specific option(s) for reading Parquet files:
-   *
-   * `mergeSchema` (default is the value specified in `spark.sql.parquet.mergeSchema`): sets
-   * whether we should merge schemas collected from all Parquet part-files. This will override
-   * `spark.sql.parquet.mergeSchema`.
-   * `pathGlobFilter`: an optional glob pattern to only include files with paths matching
-   * the pattern. The syntax follows org.apache.hadoop.fs.GlobFilter.
-   * It does not change the behavior of partition discovery.
-   * `modifiedBefore` (batch only): an optional timestamp to only include files with
-   * modification times occurring before the specified Time. The provided timestamp
-   * must be in the following form: YYYY-MM-DDTHH:mm:ss (e.g. 2020-06-01T13:00:00)
-   * `modifiedAfter` (batch only): an optional timestamp to only include files with
-   * modification times occurring after the specified Time. The provided timestamp
-   * must be in the following form: YYYY-MM-DDTHH:mm:ss (e.g. 2020-06-01T13:00:00)
-   * `recursiveFileLookup`: recursively scan a directory for files. Using this option
-   * disables partition discovery
-   * `datetimeRebaseMode` (default is the value specified in the SQL config
-   * `spark.sql.parquet.datetimeRebaseModeInRead`): the rebasing mode for the values
-   * of the `DATE`, `TIMESTAMP_MICROS`, `TIMESTAMP_MILLIS` logical types from the Julian to
-   * Proleptic Gregorian calendar:
-   *
-   * `EXCEPTION` : Spark fails in reads of ancient dates/timestamps that are ambiguous
-   * between the two calendars
-   * `CORRECTED` : loading of dates/timestamps without rebasing
-   * `LEGACY` : perform rebasing of ancient dates/timestamps from the Julian to Proleptic
-   * Gregorian calendar
-   *
-   *
-   * `int96RebaseMode` (default is the value specified in the SQL config
-   * `spark.sql.parquet.int96RebaseModeInRead`): the rebasing mode for `INT96` timestamps
-   * from the Julian to Proleptic Gregorian calendar:
-   *
-   * `EXCEPTION` : Spark fails in reads of ancient `INT96` timestamps that are ambiguous
-   * between the two calendars
-   * `CORRECTED` : loading of timestamps without rebasing
-   * `LEGACY` : perform rebasing of ancient `INT96` timestamps from the Julian to Proleptic
-   * Gregorian calendar
-   *
-   *
-   *
+   * Parquet-specific option(s) for reading Parquet files can be found in
+   * https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#data-source-option
+   * Data Source Option in the version you use.
Review comment: can you add the general options here too
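As an illustration of how the options named in the removed Scaladoc might be assembled on the Python side, here is a minimal sketch. The helper `build_parquet_read_options` and its parameter names are hypothetical and not part of PySpark; only the option keys (`mergeSchema`, `pathGlobFilter`, `modifiedBefore`, `modifiedAfter`, `recursiveFileLookup`) and the timestamp form come from the text above. The format check uses `datetime.strptime`, which is more lenient than a strict pattern match (it accepts unpadded fields, for example), so it is a sanity check rather than a full validator.

```python
from datetime import datetime
from typing import Dict, Optional

# strptime equivalent of the documented form YYYY-MM-DDTHH:mm:ss
_TS_FORMAT = "%Y-%m-%dT%H:%M:%S"


def build_parquet_read_options(
    merge_schema: Optional[bool] = None,
    path_glob_filter: Optional[str] = None,
    modified_before: Optional[str] = None,
    modified_after: Optional[str] = None,
    recursive_file_lookup: Optional[bool] = None,
) -> Dict[str, str]:
    """Hypothetical helper: collect only the options that were actually set."""
    opts: Dict[str, str] = {}
    if merge_schema is not None:
        opts["mergeSchema"] = str(merge_schema).lower()
    if path_glob_filter is not None:
        opts["pathGlobFilter"] = path_glob_filter
    for key, value in (("modifiedBefore", modified_before),
                       ("modifiedAfter", modified_after)):
        if value is not None:
            # Raises ValueError if the timestamp does not parse.
            datetime.strptime(value, _TS_FORMAT)
            opts[key] = value
    if recursive_file_lookup is not None:
        opts["recursiveFileLookup"] = str(recursive_file_lookup).lower()
    return opts


opts = build_parquet_read_options(
    merge_schema=True, modified_before="2020-06-01T13:00:00"
)
# With a live session one would then pass these along, e.g.:
#   spark.read.options(**opts).parquet(path)
```

Keeping unset options out of the dictionary leaves Spark's own defaults (such as `spark.sql.parquet.mergeSchema`) in effect.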
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes
AmplabJenkins removed a comment on pull request #32516: URL: https://github.com/apache/spark/pull/32516#issuecomment-840309736 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138488/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32520: [SPARK-35385][SQL][TESTS] Skip duplicate queries in the TPCDS-related tests
AmplabJenkins removed a comment on pull request #32520: URL: https://github.com/apache/spark/pull/32520#issuecomment-840309734 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138479/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32292: [SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE
AmplabJenkins removed a comment on pull request #32292: URL: https://github.com/apache/spark/pull/32292#issuecomment-840309741 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43010/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation
AmplabJenkins removed a comment on pull request #32494: URL: https://github.com/apache/spark/pull/32494#issuecomment-840309740 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138478/
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader
AmplabJenkins removed a comment on pull request #32515: URL: https://github.com/apache/spark/pull/32515#issuecomment-840309738 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43009/
[GitHub] [spark] AmplabJenkins commented on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation
AmplabJenkins commented on pull request #32494: URL: https://github.com/apache/spark/pull/32494#issuecomment-840309740 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138478/
[GitHub] [spark] AmplabJenkins commented on pull request #32292: [SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE
AmplabJenkins commented on pull request #32292: URL: https://github.com/apache/spark/pull/32292#issuecomment-840309741 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43010/
[GitHub] [spark] AmplabJenkins commented on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes
AmplabJenkins commented on pull request #32516: URL: https://github.com/apache/spark/pull/32516#issuecomment-840309736 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138488/
[GitHub] [spark] AmplabJenkins commented on pull request #32520: [SPARK-35385][SQL][TESTS] Skip duplicate queries in the TPCDS-related tests
AmplabJenkins commented on pull request #32520: URL: https://github.com/apache/spark/pull/32520#issuecomment-840309734 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138479/
[GitHub] [spark] AmplabJenkins commented on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader
AmplabJenkins commented on pull request #32515: URL: https://github.com/apache/spark/pull/32515#issuecomment-840309738 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43009/
[GitHub] [spark] shahidki31 commented on a change in pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation
shahidki31 commented on a change in pull request #32494: URL: https://github.com/apache/spark/pull/32494#discussion_r631574179
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/UnionEstimation.scala ##
@@ -111,6 +111,44 @@ object UnionEstimation {
         AttributeMap.empty[ColumnStat]
       }

+    val attrToComputeNullCount = union.children.map(_.output).transpose.zipWithIndex.filter {
+      case (attrs, _) => attrs.zipWithIndex.forall {
+        case (attr, childIndex) =>
+          val attrStats = union.children(childIndex).stats.attributeStats
+          attrStats.get(attr).isDefined && attrStats(attr).nullCount.isDefined
+      }
+    }
+
+    val newAttrStats = if (attrToComputeNullCount.nonEmpty) {
+      val outputAttrStats = new ArrayBuffer[(Attribute, ColumnStat)]()
+      attrToComputeNullCount.foreach {
+        case (attrs, outputIndex) =>
+          val colWithNullStatValues = attrs.zipWithIndex.foldLeft[Option[BigInt]](None) {
+            case (totalNullCount, (attr, childIndex)) =>
+              val colStat = union.children(childIndex).stats.attributeStats(attr)
+              if (totalNullCount.isDefined) {
Review comment: Done.
[GitHub] [spark] SparkQA commented on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader
SparkQA commented on pull request #32515: URL: https://github.com/apache/spark/pull/32515#issuecomment-840308059 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43009/
[GitHub] [spark] SparkQA commented on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader
SparkQA commented on pull request #32515: URL: https://github.com/apache/spark/pull/32515#issuecomment-840305304 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43009/
[GitHub] [spark] HyukjinKwon commented on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader
HyukjinKwon commented on pull request #32515: URL: https://github.com/apache/spark/pull/32515#issuecomment-840303599 Looks okay to me too
[GitHub] [spark] SparkQA commented on pull request #32292: [SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE
SparkQA commented on pull request #32292: URL: https://github.com/apache/spark/pull/32292#issuecomment-840303409
[GitHub] [spark] shahidki31 commented on a change in pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
shahidki31 commented on a change in pull request #32498: URL: https://github.com/apache/spark/pull/32498#discussion_r631566208
## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/BasicStatsEstimationSuite.scala ##
@@ -283,14 +326,17 @@ class BasicStatsEstimationSuite extends PlanTest with StatsEstimationTestBase {
   private def checkStats(
       plan: LogicalPlan,
       expectedStatsCboOn: Statistics,
-      expectedStatsCboOff: Statistics): Unit = {
-    withSQLConf(SQLConf.CBO_ENABLED.key -> "true") {
+      expectedStatsCboOff: Statistics,
+      extraConfigs: Map[String, String] = Map.empty): Unit = {
+
Review comment: Yes, removed the extra line
[GitHub] [spark] sunchao commented on a change in pull request #32527: [SPARK-35384][SQL] Improve performance for InvokeLike.invoke
sunchao commented on a change in pull request #32527: URL: https://github.com/apache/spark/pull/32527#discussion_r631565642
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ##
@@ -127,13 +128,18 @@ trait InvokeLike extends Expression with NonSQLExpression {
       arguments: Seq[Expression],
       input: InternalRow,
       dataType: DataType): Any = {
-    val args = arguments.map(e => e.eval(input).asInstanceOf[Object])
-    if (needNullCheck && args.exists(_ == null)) {
+    var i = 0
+    val len = arguments.length
+    while (i < len) {
+      evaluatedArgs(i) = arguments(i).eval(input).asInstanceOf[Object]
+      i += 1
+    }
+    if (needNullCheck && evaluatedArgs.contains(null)) {
       // return null if one of arguments is null
       null
     } else {
       val ret = try {
-        method.invoke(obj, args: _*)
+        method.invoke(obj, evaluatedArgs: _*)
       } catch {
Review comment: Yea, let me try it. In the profiling after this PR, `HashMap.get` takes 7.82% of the entire `invoke` call, so it seems worthwhile to do this.
[GitHub] [spark] SparkQA removed a comment on pull request #32520: [SPARK-35385][SQL][TESTS] Skip duplicate queries in the TPCDS-related tests
SparkQA removed a comment on pull request #32520: URL: https://github.com/apache/spark/pull/32520#issuecomment-840197479 **[Test build #138479 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138479/testReport)** for PR 32520 at commit [`299abb5`](https://github.com/apache/spark/commit/299abb537bf715506d77079b65a4704a04a2829f).
[GitHub] [spark] SparkQA commented on pull request #32520: [SPARK-35385][SQL][TESTS] Skip duplicate queries in the TPCDS-related tests
SparkQA commented on pull request #32520: URL: https://github.com/apache/spark/pull/32520#issuecomment-840300886 **[Test build #138479 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138479/testReport)** for PR 32520 at commit [`299abb5`](https://github.com/apache/spark/commit/299abb537bf715506d77079b65a4704a04a2829f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] shahidki31 commented on a change in pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
shahidki31 commented on a change in pull request #32498: URL: https://github.com/apache/spark/pull/32498#discussion_r631565143
## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/BasicStatsEstimationSuite.scala ##
@@ -283,14 +326,17 @@ class BasicStatsEstimationSuite extends PlanTest with StatsEstimationTestBase {
   private def checkStats(
       plan: LogicalPlan,
       expectedStatsCboOn: Statistics,
-      expectedStatsCboOff: Statistics): Unit = {
-    withSQLConf(SQLConf.CBO_ENABLED.key -> "true") {
+      expectedStatsCboOff: Statistics,
+      extraConfigs: Map[String, String] = Map.empty): Unit = {
+
Review comment: I am not sure I understand you here. Do we need to put the histogram configs directly inside this method? By default, the histogram is disabled and the default number of bins is 254.
[GitHub] [spark] shahidki31 commented on a change in pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
shahidki31 commented on a change in pull request #32498: URL: https://github.com/apache/spark/pull/32498#discussion_r631564790
## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/BasicStatsEstimationSuite.scala ##
@@ -77,12 +92,21 @@ class BasicStatsEstimationSuite extends PlanTest with StatsEstimationTestBase {
       max = Some(4),
       nullCount = Some(0),
       maxLen = Some(LongType.defaultSize),
-      avgLen = Some(LongType.defaultSize))
-    checkStats(range, expectedStatsCboOn = rangeStats, expectedStatsCboOff = rangeStats)
+      avgLen = Some(LongType.defaultSize),
+      histogram = histogram)
+    val extraConfig = Map(SQLConf.HISTOGRAM_ENABLED.key -> "true",
+      SQLConf.HISTOGRAM_NUM_BINS.key -> "3")
+    checkStats(range, expectedStatsCboOn = rangeStats,
+      expectedStatsCboOff = rangeStats, extraConfig)
   }

   test("range with negative step") {
     val range = Range(-10, -20, -2, None)
+    val histogramBins = new Array[HistogramBin](3)
+    histogramBins(0) = HistogramBin(-18.0, -16.0, 2)
+    histogramBins(1) = HistogramBin(-16.0, -12.0, 2)
+    histogramBins(2) = HistogramBin(-12.0, -10.0, 1)
Review comment: Added an assert to check that `range.numElements` and `ndv` are the same
## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/BasicStatsEstimationSuite.scala ##
@@ -97,12 +121,24 @@ class BasicStatsEstimationSuite extends PlanTest with StatsEstimationTestBase {
       max = Some(-10),
       nullCount = Some(0),
       maxLen = Some(LongType.defaultSize),
-      avgLen = Some(LongType.defaultSize))
-    checkStats(range, expectedStatsCboOn = rangeStats, expectedStatsCboOff = rangeStats)
+      avgLen = Some(LongType.defaultSize),
+      histogram = histogram)
+    val extraConfig = Map(SQLConf.HISTOGRAM_ENABLED.key -> "true",
+      SQLConf.HISTOGRAM_NUM_BINS.key -> "3")
+    checkStats(range, expectedStatsCboOn = rangeStats,
+      expectedStatsCboOff = rangeStats, extraConfig)
   }

   test("range with negative step where end minus start not divisible by step") {
+
     val range = Range(-10, -20, -3, None)
+
+    val histogramBins = new Array[HistogramBin](3)
+    histogramBins(0) = HistogramBin(-19.0, -16.0, 2)
+    histogramBins(1) = HistogramBin(-16.0, -13.0, 1)
+    histogramBins(2) = HistogramBin(-13.0, -10.0, 1)
Review comment: Updated
[GitHub] [spark] shahidki31 commented on a change in pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
shahidki31 commented on a change in pull request #32498: URL: https://github.com/apache/spark/pull/32498#discussion_r631564612
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ##
@@ -789,6 +797,38 @@ case class Range(
     }
   }

+  private def computeHistogramStatistics() = {
+    val numBins = conf.histogramNumBins
+    val height = numElements.toDouble / numBins
+    val percentileArray = (0 to numBins).map(i => i * height).toArray
+
+    val binArray = new Array[HistogramBin](numBins)
+    var lowerIndex = percentileArray.head
+    var lowerBinValue = getRangeValue(0)
+    percentileArray.tail.zipWithIndex.foreach { case (upperIndex, binId) =>
+      // Integer index for upper and lower values in the bin.
+      val upperIndexPos = math.ceil(upperIndex).toInt - 1
+      val lowerIndexPos = math.ceil(lowerIndex).toInt - 1
+
+      val upperBinValue = getRangeValue(math.max(upperIndexPos, 0))
+      val ndv = math.max(upperIndexPos - lowerIndexPos, 1)
+      binArray(binId) = HistogramBin(lowerBinValue, upperBinValue, ndv)
+
+      lowerBinValue = upperBinValue
+      lowerIndex = upperIndex
+    }
+    Histogram(height, binArray)
+  }
+
+  // Utility method to compute histogram
+  private def getRangeValue(index: Int): Long = {
Review comment: Done
## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/BasicStatsEstimationSuite.scala ##
@@ -97,12 +121,24 @@ class BasicStatsEstimationSuite extends PlanTest with StatsEstimationTestBase {
       max = Some(-10),
       nullCount = Some(0),
       maxLen = Some(LongType.defaultSize),
-      avgLen = Some(LongType.defaultSize))
-    checkStats(range, expectedStatsCboOn = rangeStats, expectedStatsCboOff = rangeStats)
+      avgLen = Some(LongType.defaultSize),
+      histogram = histogram)
+    val extraConfig = Map(SQLConf.HISTOGRAM_ENABLED.key -> "true",
+      SQLConf.HISTOGRAM_NUM_BINS.key -> "3")
+    checkStats(range, expectedStatsCboOn = rangeStats,
+      expectedStatsCboOff = rangeStats, extraConfig)
   }

   test("range with negative step where end minus start not divisible by step") {
+
Review comment: Done
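The bin computation in the `computeHistogramStatistics` hunk can be tried outside Spark. The following is a minimal sketch, not the PR's code: `Bin` and `equiHeightBins` are hypothetical names, the range elements are assumed to be passed in ascending order (the body of `getRangeValue` is not shown in the diff), and the index clamp is a defensive addition of this sketch.

```scala
// Hypothetical stand-in for Spark's HistogramBin; not the real class.
final case class Bin(lo: Double, hi: Double, ndv: Long)

// `sortedValues` is assumed to hold the range's elements in ascending order.
def equiHeightBins(sortedValues: IndexedSeq[Long], numBins: Int): Seq[Bin] = {
  require(sortedValues.nonEmpty && numBins > 0)
  val n = sortedValues.length
  val height = n.toDouble / numBins
  // Fractional element positions that delimit the bins: 0, h, 2h, ..., n.
  val cuts = (0 to numBins).map(_ * height)
  cuts.sliding(2).map { pair =>
    // Same index arithmetic as the diff; the min(...) clamp guards against
    // floating-point overshoot and is not part of the original hunk.
    val lowerPos = math.min(math.ceil(pair(0)).toInt - 1, n - 1)
    val upperPos = math.min(math.ceil(pair(1)).toInt - 1, n - 1)
    val lo = sortedValues(math.max(lowerPos, 0))
    val hi = sortedValues(math.max(upperPos, 0))
    Bin(lo.toDouble, hi.toDouble, math.max(upperPos - lowerPos, 1).toLong)
  }.toVector
}

// Elements of Range(-10, -20, -2) in ascending order, split into 3 bins.
val bins = equiHeightBins(Vector(-18L, -16L, -14L, -12L, -10L), 3)
```

With these inputs the sketch yields the bins `(-18.0, -16.0, 2)`, `(-16.0, -12.0, 2)` and `(-12.0, -10.0, 1)`, which are the values the "range with negative step" test in this thread expects.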
[GitHub] [spark] shahidki31 commented on a change in pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
shahidki31 commented on a change in pull request #32498: URL: https://github.com/apache/spark/pull/32498#discussion_r631564557
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ##
@@ -789,6 +797,38 @@ case class Range(
     }
   }

+  private def computeHistogramStatistics() = {
+    val numBins = conf.histogramNumBins
+    val height = numElements.toDouble / numBins
+    val percentileArray = (0 to numBins).map(i => i * height).toArray
+
+    val binArray = new Array[HistogramBin](numBins)
+    var lowerIndex = percentileArray.head
+    var lowerBinValue = getRangeValue(0)
Review comment: Yes, updated.
[GitHub] [spark] SparkQA removed a comment on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes
SparkQA removed a comment on pull request #32516: URL: https://github.com/apache/spark/pull/32516#issuecomment-840286547 **[Test build #138488 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138488/testReport)** for PR 32516 at commit [`702629c`](https://github.com/apache/spark/commit/702629ccead13baba006eab8a6340b49722bf60a).
[GitHub] [spark] SparkQA commented on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes
SparkQA commented on pull request #32516: URL: https://github.com/apache/spark/pull/32516#issuecomment-840298542 **[Test build #138488 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138488/testReport)** for PR 32516 at commit [`702629c`](https://github.com/apache/spark/commit/702629ccead13baba006eab8a6340b49722bf60a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on a change in pull request #32527: [SPARK-35384][SQL] Improve performance for InvokeLike.invoke
cloud-fan commented on a change in pull request #32527: URL: https://github.com/apache/spark/pull/32527#discussion_r631561074
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ##
@@ -127,13 +128,18 @@ trait InvokeLike extends Expression with NonSQLExpression {
       arguments: Seq[Expression],
       input: InternalRow,
       dataType: DataType): Any = {
-    val args = arguments.map(e => e.eval(input).asInstanceOf[Object])
-    if (needNullCheck && args.exists(_ == null)) {
+    var i = 0
+    val len = arguments.length
+    while (i < len) {
+      evaluatedArgs(i) = arguments(i).eval(input).asInstanceOf[Object]
+      i += 1
+    }
+    if (needNullCheck && evaluatedArgs.contains(null)) {
       // return null if one of arguments is null
       null
     } else {
       val ret = try {
-        method.invoke(obj, args: _*)
+        method.invoke(obj, evaluatedArgs: _*)
       } catch {
Review comment: We can do a similar thing in `Invoke.eval`.
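The hunk under review replaces `arguments.map(...)`, which builds a new collection on every call, with a `while` loop that writes into a preallocated `evaluatedArgs` array. A minimal, Spark-free sketch of that pattern is below; `evalInto`, the `Int => AnyRef` evaluators and the `Int` input are hypothetical stand-ins for `Expression.eval` and `InternalRow`, not Spark APIs.

```scala
// Hypothetical helper: evaluates each argument into a caller-owned buffer
// instead of allocating a fresh collection per call.
def evalInto(args: IndexedSeq[Int => AnyRef], input: Int, buffer: Array[AnyRef]): Unit = {
  var i = 0
  val len = args.length
  while (i < len) {
    buffer(i) = args(i)(input)
    i += 1
  }
}

// Two toy "expressions": one returns a boxed integer, one always returns null.
val args: IndexedSeq[Int => AnyRef] = Vector(x => Integer.valueOf(x + 1), _ => null)
val buffer = new Array[AnyRef](args.length) // allocated once, reused across calls
evalInto(args, 41, buffer)
val hasNull = buffer.contains(null) // the null check then runs over the same buffer
```

A buffer shared across calls like this is only safe when calls do not overlap; whether that holds for `Invoke.eval` is for the PR to establish.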
[GitHub] [spark] cloud-fan commented on a change in pull request #32527: [SPARK-35384][SQL] Improve performance for InvokeLike.invoke
cloud-fan commented on a change in pull request #32527: URL: https://github.com/apache/spark/pull/32527#discussion_r631560800
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala ##
@@ -127,13 +128,18 @@ trait InvokeLike extends Expression with NonSQLExpression {
       arguments: Seq[Expression],
       input: InternalRow,
       dataType: DataType): Any = {
-    val args = arguments.map(e => e.eval(input).asInstanceOf[Object])
-    if (needNullCheck && args.exists(_ == null)) {
+    var i = 0
+    val len = arguments.length
+    while (i < len) {
+      evaluatedArgs(i) = arguments(i).eval(input).asInstanceOf[Object]
+      i += 1
+    }
+    if (needNullCheck && evaluatedArgs.contains(null)) {
       // return null if one of arguments is null
       null
     } else {
       val ret = try {
-        method.invoke(obj, args: _*)
+        method.invoke(obj, evaluatedArgs: _*)
       } catch {
Review comment: Can we also improve the last piece?
```
val boxedClass = ScalaReflection.typeBoxedJavaMapping.get(dataType)
if (boxedClass.isDefined) {
  boxedClass.get.cast(ret)
} else {
  ret
}
```
We can create a function for it
```
private lazy val boxing: Any => Any =
  ScalaReflection.typeBoxedJavaMapping.get(dataType).map(_.cast(_)).getOrElse(identity)
```
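The suggestion amounts to resolving the boxed class once and reusing the resulting function, rather than repeating the map lookup on every call. A minimal sketch of that idea, without Spark on the classpath, might look like this; `boxedClasses` and `makeBoxing` are hypothetical names, and the map is keyed by a type name here instead of a Spark `DataType`.

```scala
// Hypothetical stand-in for ScalaReflection.typeBoxedJavaMapping.
val boxedClasses: Map[String, Class[_]] = Map(
  "int" -> classOf[java.lang.Integer],
  "long" -> classOf[java.lang.Long])

// Does the map lookup once and returns a function that can be reused per row.
def makeBoxing(typeName: String): Any => Any =
  boxedClasses.get(typeName) match {
    case Some(cls) => (value: Any) => cls.cast(value.asInstanceOf[AnyRef])
    case None => (value: Any) => value
  }

val boxInt = makeBoxing("int")         // lookup happens here, once
val passThrough = makeBoxing("string") // no mapping: values are returned unchanged
```

Each later call such as `boxInt(42)` then costs a single `Class.cast` with no hash lookup, which is the cost the profiling remark in this thread points at.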
[GitHub] [spark] SparkQA removed a comment on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation
SparkQA removed a comment on pull request #32494: URL: https://github.com/apache/spark/pull/32494#issuecomment-840190295 **[Test build #138478 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138478/testReport)** for PR 32494 at commit [`c929124`](https://github.com/apache/spark/commit/c929124f5ce2045da43314941d513b57ce9d553a).
[GitHub] [spark] SparkQA commented on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation
SparkQA commented on pull request #32494: URL: https://github.com/apache/spark/pull/32494#issuecomment-840293326 **[Test build #138478 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138478/testReport)** for PR 32494 at commit [`c929124`](https://github.com/apache/spark/commit/c929124f5ce2045da43314941d513b57ce9d553a). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
AmplabJenkins commented on pull request #32498: URL: https://github.com/apache/spark/pull/32498#issuecomment-840292938 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138477/
[GitHub] [spark] SparkQA removed a comment on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
SparkQA removed a comment on pull request #32498: URL: https://github.com/apache/spark/pull/32498#issuecomment-840190243 **[Test build #138477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138477/testReport)** for PR 32498 at commit [`0bb49b3`](https://github.com/apache/spark/commit/0bb49b3a15b4bf2c59916cce91d5aba285812079).
[GitHub] [spark] maropu commented on a change in pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation
maropu commented on a change in pull request #32494: URL: https://github.com/apache/spark/pull/32494#discussion_r631558692
## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/UnionEstimation.scala ##
@@ -111,6 +111,44 @@ object UnionEstimation {
         AttributeMap.empty[ColumnStat]
       }

+    val attrToComputeNullCount = union.children.map(_.output).transpose.zipWithIndex.filter {
+      case (attrs, _) => attrs.zipWithIndex.forall {
+        case (attr, childIndex) =>
+          val attrStats = union.children(childIndex).stats.attributeStats
+          attrStats.get(attr).isDefined && attrStats(attr).nullCount.isDefined
+      }
+    }
+
+    val newAttrStats = if (attrToComputeNullCount.nonEmpty) {
+      val outputAttrStats = new ArrayBuffer[(Attribute, ColumnStat)]()
+      attrToComputeNullCount.foreach {
+        case (attrs, outputIndex) =>
+          val colWithNullStatValues = attrs.zipWithIndex.foldLeft[Option[BigInt]](None) {
+            case (totalNullCount, (attr, childIndex)) =>
+              val colStat = union.children(childIndex).stats.attributeStats(attr)
+              if (totalNullCount.isDefined) {
Review comment: If only the first element can be null, could we remove this `if`?
```
val firstStat = union.children.head.stats.attributeStats(attrs.head)
val firstNullCount = firstStat.nullCount.get
attrs.zipWithIndex.tail.foldLeft(firstNullCount) {...}
```
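The fold proposed in this comment seeds the accumulator with the first child's null count, so the `Option` and the `isDefined` branch are no longer needed. A minimal sketch with plain case classes follows; `ChildStat` and `unionNullCount` are hypothetical names, and only the `nullCount` field of a column stat is modelled.

```scala
// Hypothetical stand-in for a child's ColumnStat; only nullCount is modelled.
final case class ChildStat(nullCount: Option[BigInt])

// Assumes the caller has already checked that every nullCount is defined,
// as the filter that builds attrToComputeNullCount does in the diff.
def unionNullCount(children: Seq[ChildStat]): BigInt = {
  val first = children.head.nullCount.get
  children.tail.foldLeft(first) { (total, child) => total + child.nullCount.get }
}

val total = unionNullCount(Seq(
  ChildStat(Some(BigInt(2))),
  ChildStat(Some(BigInt(0))),
  ChildStat(Some(BigInt(5)))))
```

Under the same assumption, `children.map(_.nullCount.get).sum` would give the same result; the seeded fold stays closer to the shape of the snippet in the comment.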
[GitHub] [spark] SparkQA commented on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
SparkQA commented on pull request #32498: URL: https://github.com/apache/spark/pull/32498#issuecomment-840292283 **[Test build #138477 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138477/testReport)** for PR 32498 at commit [`0bb49b3`](https://github.com/apache/spark/commit/0bb49b3a15b4bf2c59916cce91d5aba285812079). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] dongjoon-hyun commented on pull request #32531: [SPARK-35394][K8S][BUILD] Move kubernetes-client.version to root pom file
dongjoon-hyun commented on pull request #32531: URL: https://github.com/apache/spark/pull/32531#issuecomment-840291144 Could you review this, @attilapiros ?
[GitHub] [spark] SparkQA commented on pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page
SparkQA commented on pull request #32204: URL: https://github.com/apache/spark/pull/32204#issuecomment-840291088 **[Test build #138492 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138492/testReport)** for PR 32204 at commit [`a386788`](https://github.com/apache/spark/commit/a386788b44fb5255d2784ce423e3f879ba97f53c). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32531: [SPARK-35394][K8S][BUILD] Move kubernetes-client.version to root pom file
SparkQA commented on pull request #32531: URL: https://github.com/apache/spark/pull/32531#issuecomment-840290823 **[Test build #138491 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138491/testReport)** for PR 32531 at commit [`c6ce0b7`](https://github.com/apache/spark/commit/c6ce0b720c114b962e73af1a989eb113df3ec70a). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun opened a new pull request #32531: [SPARK-35394][K8S][BUILD] Move kubernetes-client.version to root pom file
dongjoon-hyun opened a new pull request #32531: URL: https://github.com/apache/spark/pull/32531 … ### What changes were proposed in this pull request? ### Why are the changes needed? ### Does this PR introduce _any_ user-facing change? ### How was this patch tested? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] itholic commented on a change in pull request #32204: [SPARK-34494][SQL][DOCS] Move JSON data source options from Python and Scala into a single page
itholic commented on a change in pull request #32204:
URL: https://github.com/apache/spark/pull/32204#discussion_r631553255

## File path: python/pyspark/sql/streaming.py

@@ -504,105 +504,13 @@ def json(self, path, schema=None, primitivesAsString=None, prefersDecimal=None,
         path : str
             string represents path to the JSON dataset,
             or RDD of Strings storing JSON objects.
-        schema : :class:`pyspark.sql.types.StructType` or str, optional

Review comment:
I added it to the Data Source Options table!
https://user-images.githubusercontent.com/44108233/118077601-62bc5f00-b3ef-11eb-9350-c62b370e167c.png
[GitHub] [spark] maropu commented on a change in pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
maropu commented on a change in pull request #32498:
URL: https://github.com/apache/spark/pull/32498#discussion_r631552581

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/BasicStatsEstimationSuite.scala

@@ -77,12 +92,21 @@ class BasicStatsEstimationSuite extends PlanTest with StatsEstimationTestBase {
       max = Some(4),
       nullCount = Some(0),
       maxLen = Some(LongType.defaultSize),
-      avgLen = Some(LongType.defaultSize))
-    checkStats(range, expectedStatsCboOn = rangeStats, expectedStatsCboOff = rangeStats)
+      avgLen = Some(LongType.defaultSize),
+      histogram = histogram)
+    val extraConfig = Map(SQLConf.HISTOGRAM_ENABLED.key -> "true",
+      SQLConf.HISTOGRAM_NUM_BINS.key -> "3")
+    checkStats(range, expectedStatsCboOn = rangeStats,
+      expectedStatsCboOff = rangeStats, extraConfig)
   }
 
   test("range with negative step") {
     val range = Range(-10, -20, -2, None)
+    val histogramBins = new Array[HistogramBin](3)
+    histogramBins(0) = HistogramBin(-18.0, -16.0, 2)
+    histogramBins(1) = HistogramBin(-16.0, -12.0, 2)
+    histogramBins(2) = HistogramBin(-12.0, -10.0, 1)

Review comment:
Could you check if `range.numElements` and the total `ndv` are the same?
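The check requested above (that the bins' `ndv` values add up to the number of elements in the range) could look roughly like this. It is a standalone sketch, not the PR's test code: `NdvBin` and `NdvCheck` are hypothetical names, and the element count is taken from Scala's own `NumericRange` rather than from Spark's `Range.numElements`.

```scala
// Hypothetical stand-in for Spark's HistogramBin(lo, hi, ndv).
final case class NdvBin(lo: Double, hi: Double, ndv: Long)

object NdvCheck {
  // Number of elements in [start, end) walked with a non-zero step,
  // computed with the standard library's NumericRange.
  def numElements(start: Long, end: Long, step: Long): Int =
    start.until(end, step).size

  // True when the distinct-value counts of all bins sum to the element count.
  def ndvMatches(start: Long, end: Long, step: Long, bins: Seq[NdvBin]): Boolean =
    bins.map(_.ndv).sum == numElements(start, end, step).toLong
}
```

For the bins quoted in the diff, `Range(-10, -20, -2)` has five elements (-10, -12, -14, -16, -18) and the `ndv` values 2, 2 and 1 also sum to five, so the check holds.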
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32199: [SPARK-35100][ML] Refactor AFT - support virtual centering
AmplabJenkins removed a comment on pull request #32199: URL: https://github.com/apache/spark/pull/32199#issuecomment-840287170 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138487/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #32199: [SPARK-35100][ML] Refactor AFT - support virtual centering
AmplabJenkins commented on pull request #32199: URL: https://github.com/apache/spark/pull/32199#issuecomment-840287170 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138487/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on pull request #32199: [SPARK-35100][ML] Refactor AFT - support virtual centering
SparkQA removed a comment on pull request #32199: URL: https://github.com/apache/spark/pull/32199#issuecomment-840264912 **[Test build #138487 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138487/testReport)** for PR 32199 at commit [`6ac4590`](https://github.com/apache/spark/commit/6ac459047c8168f750fe483606c62fc85b7effec). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
maropu commented on a change in pull request #32498:
URL: https://github.com/apache/spark/pull/32498#discussion_r631552121

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/BasicStatsEstimationSuite.scala

@@ -283,14 +326,17 @@ class BasicStatsEstimationSuite extends PlanTest with StatsEstimationTestBase {
   private def checkStats(
       plan: LogicalPlan,
       expectedStatsCboOn: Statistics,
-      expectedStatsCboOff: Statistics): Unit = {
-    withSQLConf(SQLConf.CBO_ENABLED.key -> "true") {
+      expectedStatsCboOff: Statistics,
+      extraConfigs: Map[String, String] = Map.empty): Unit = {
+

Review comment:
nit: remove this unnecessary change
[GitHub] [spark] SparkQA commented on pull request #32199: [SPARK-35100][ML] Refactor AFT - support virtual centering
SparkQA commented on pull request #32199: URL: https://github.com/apache/spark/pull/32199#issuecomment-840286890 **[Test build #138487 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138487/testReport)** for PR 32199 at commit [`6ac4590`](https://github.com/apache/spark/commit/6ac459047c8168f750fe483606c62fc85b7effec). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32292: [SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE
SparkQA commented on pull request #32292: URL: https://github.com/apache/spark/pull/32292#issuecomment-840286781 **[Test build #138490 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138490/testReport)** for PR 32292 at commit [`774bda1`](https://github.com/apache/spark/commit/774bda13487ab0823e20d0295c6e7108a5a62b83). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
maropu commented on a change in pull request #32498:
URL: https://github.com/apache/spark/pull/32498#discussion_r631552022

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/BasicStatsEstimationSuite.scala

@@ -97,12 +121,24 @@ class BasicStatsEstimationSuite extends PlanTest with StatsEstimationTestBase {
       max = Some(-10),
       nullCount = Some(0),
       maxLen = Some(LongType.defaultSize),
-      avgLen = Some(LongType.defaultSize))
-    checkStats(range, expectedStatsCboOn = rangeStats, expectedStatsCboOff = rangeStats)
+      avgLen = Some(LongType.defaultSize),
+      histogram = histogram)
+    val extraConfig = Map(SQLConf.HISTOGRAM_ENABLED.key -> "true",
+      SQLConf.HISTOGRAM_NUM_BINS.key -> "3")
+    checkStats(range, expectedStatsCboOn = rangeStats,
+      expectedStatsCboOff = rangeStats, extraConfig)
   }
 
   test("range with negative step where end minus start not divisible by step") {
+
     val range = Range(-10, -20, -3, None)
+
+    val histogramBins = new Array[HistogramBin](3)
+    histogramBins(0) = HistogramBin(-19.0, -16.0, 2)
+    histogramBins(1) = HistogramBin(-16.0, -13.0, 1)
+    histogramBins(2) = HistogramBin(-13.0, -10.0, 1)

Review comment:
ditto
[GitHub] [spark] maropu commented on a change in pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
maropu commented on a change in pull request #32498:
URL: https://github.com/apache/spark/pull/32498#discussion_r631551951

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/BasicStatsEstimationSuite.scala

@@ -77,12 +92,21 @@ class BasicStatsEstimationSuite extends PlanTest with StatsEstimationTestBase {
       max = Some(4),
       nullCount = Some(0),
       maxLen = Some(LongType.defaultSize),
-      avgLen = Some(LongType.defaultSize))
-    checkStats(range, expectedStatsCboOn = rangeStats, expectedStatsCboOff = rangeStats)
+      avgLen = Some(LongType.defaultSize),
+      histogram = histogram)
+    val extraConfig = Map(SQLConf.HISTOGRAM_ENABLED.key -> "true",
+      SQLConf.HISTOGRAM_NUM_BINS.key -> "3")
+    checkStats(range, expectedStatsCboOn = rangeStats,
+      expectedStatsCboOff = rangeStats, extraConfig)
   }
 
   test("range with negative step") {
     val range = Range(-10, -20, -2, None)
+    val histogramBins = new Array[HistogramBin](3)
+    histogramBins(0) = HistogramBin(-18.0, -16.0, 2)
+    histogramBins(1) = HistogramBin(-16.0, -12.0, 2)
+    histogramBins(2) = HistogramBin(-12.0, -10.0, 1)

Review comment:
nit:
```
val histogramBins = Array(
  HistogramBin(-18.0, -16.0, 2),
  HistogramBin(-16.0, -12.0, 2),
  HistogramBin(-12.0, -10.0, 1))
```
[GitHub] [spark] SparkQA commented on pull request #32515: [SPARK-35380][SQL] Loading SparkSessionExtensions from ServiceLoader
SparkQA commented on pull request #32515: URL: https://github.com/apache/spark/pull/32515#issuecomment-840286591 **[Test build #138489 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138489/testReport)** for PR 32515 at commit [`ad18acc`](https://github.com/apache/spark/commit/ad18acca9e991251fa44d33f53e8c4648fbcdbb7). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32516: [SPARK-35364][PYTHON] Renaming the existing Koalas related codes
SparkQA commented on pull request #32516: URL: https://github.com/apache/spark/pull/32516#issuecomment-840286547 **[Test build #138488 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/138488/testReport)** for PR 32516 at commit [`702629c`](https://github.com/apache/spark/commit/702629ccead13baba006eab8a6340b49722bf60a). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
AmplabJenkins removed a comment on pull request #32498: URL: https://github.com/apache/spark/pull/32498#issuecomment-840286024 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43005/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32527: [SPARK-35384][SQL] Improve performance for InvokeLike.invoke
AmplabJenkins removed a comment on pull request #32527: URL: https://github.com/apache/spark/pull/32527#issuecomment-840286021 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138475/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation
AmplabJenkins removed a comment on pull request #32494: URL: https://github.com/apache/spark/pull/32494#issuecomment-840286023 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43006/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32199: [SPARK-35100][ML] Refactor AFT - support virtual centering
AmplabJenkins removed a comment on pull request #32199: URL: https://github.com/apache/spark/pull/32199#issuecomment-840286022 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43007/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on pull request #32528: [SPARK-35350][SQL] Add code-gen for left semi sort merge join
AmplabJenkins removed a comment on pull request #32528: URL: https://github.com/apache/spark/pull/32528#issuecomment-840286026 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138476/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation
AmplabJenkins commented on pull request #32494: URL: https://github.com/apache/spark/pull/32494#issuecomment-840286023 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43006/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #32527: [SPARK-35384][SQL] Improve performance for InvokeLike.invoke
AmplabJenkins commented on pull request #32527: URL: https://github.com/apache/spark/pull/32527#issuecomment-840286021 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138475/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #32199: [SPARK-35100][ML] Refactor AFT - support virtual centering
AmplabJenkins commented on pull request #32199: URL: https://github.com/apache/spark/pull/32199#issuecomment-840286022 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43007/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #32528: [SPARK-35350][SQL] Add code-gen for left semi sort merge join
AmplabJenkins commented on pull request #32528: URL: https://github.com/apache/spark/pull/32528#issuecomment-840286026 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/138476/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
AmplabJenkins commented on pull request #32498: URL: https://github.com/apache/spark/pull/32498#issuecomment-840286024 Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/43005/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on a change in pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
maropu commented on a change in pull request #32498:
URL: https://github.com/apache/spark/pull/32498#discussion_r631551107

## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/BasicStatsEstimationSuite.scala

@@ -97,12 +121,24 @@ class BasicStatsEstimationSuite extends PlanTest with StatsEstimationTestBase {
       max = Some(-10),
       nullCount = Some(0),
       maxLen = Some(LongType.defaultSize),
-      avgLen = Some(LongType.defaultSize))
-    checkStats(range, expectedStatsCboOn = rangeStats, expectedStatsCboOff = rangeStats)
+      avgLen = Some(LongType.defaultSize),
+      histogram = histogram)
+    val extraConfig = Map(SQLConf.HISTOGRAM_ENABLED.key -> "true",
+      SQLConf.HISTOGRAM_NUM_BINS.key -> "3")
+    checkStats(range, expectedStatsCboOn = rangeStats,
+      expectedStatsCboOff = rangeStats, extraConfig)
   }
 
   test("range with negative step where end minus start not divisible by step") {
+

Review comment:
nit: please revert this change.
[GitHub] [spark] vinodkc commented on a change in pull request #32411: [SPARK-28551][SQL] CTAS with LOCATION should not allow to a non-empty directory.
vinodkc commented on a change in pull request #32411:
URL: https://github.com/apache/spark/pull/32411#discussion_r631550114

## File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala

@@ -598,6 +598,38 @@ abstract class SQLQuerySuiteBase extends QueryTest with SQLTestUtils with TestHi
     }
   }
 
+  test("SPARK-28551: CTAS Hive Table should be with non-existent or empty location") {
+    def executeCTASWithNonEmptyLocation(tempLocation: String) {
+      sql(s"CREATE TABLE ctas1(id string) stored as rcfile LOCATION " +
+        s"'file:$tempLocation/ctas1'")
+      sql("INSERT INTO TABLE ctas1 SELECT 'A' ")
+      sql(s"CREATE TABLE ctas_with_existing_location stored as rcfile " +
+        s"LOCATION 'file:$tempLocation' " +
+        s"AS SELECT key k, value FROM src ORDER BY k, value")
+    }
+
+    Seq("false", "true").foreach { convertCTASFlag =>
+      Seq("false", "true").foreach { allowNonEmptyDirFlag =>

Review comment:
`withSQLConf` accepts `(String, String)` pairs, so passing a `(String, Boolean)` pair will fail to compile.
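To illustrate the point about `(String, String)` pairs: if one prefers to iterate over `Boolean` flags, each flag has to be converted with `.toString` at the call site. The sketch below is standalone and makes assumptions: `withConf` is a hypothetical helper that only mirrors the varargs shape described in the comment, it is not Spark's `withSQLConf`, and the key names passed in are up to the caller.

```scala
object ConfFlags {
  // Hypothetical helper with the same parameter shape as described above:
  // a varargs list of (String, String) pairs, then a body to run.
  def withConf[T](pairs: (String, String)*)(body: Map[String, String] => T): T =
    body(pairs.toMap)

  // Iterates over Boolean flags; `.toString` turns each flag into the
  // String value that the (String, String) signature requires.
  def allCombinations(keyA: String, keyB: String): Seq[Map[String, String]] =
    for {
      a <- Seq(false, true)
      b <- Seq(false, true)
    } yield withConf(keyA -> a.toString, keyB -> b.toString)(conf => conf)
}
```

Iterating directly over `Seq("false", "true")`, as the test in the diff does, avoids the conversion altogether.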
[GitHub] [spark] SparkQA commented on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
SparkQA commented on pull request #32498: URL: https://github.com/apache/spark/pull/32498#issuecomment-840283366 Kubernetes integration test status failure URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43005/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32199: [SPARK-35100][ML] Refactor AFT - support virtual centering
SparkQA commented on pull request #32199: URL: https://github.com/apache/spark/pull/32199#issuecomment-840282546 Kubernetes integration test unable to build dist. exiting with code: 1 URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43007/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on pull request #32494: [SPARK-35362][SQL] Update null count in the column stats for UNION operator stats estimation
SparkQA commented on pull request #32494: URL: https://github.com/apache/spark/pull/32494#issuecomment-840282473 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] gengliangwang commented on pull request #32292: [SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE
gengliangwang commented on pull request #32292:
URL: https://github.com/apache/spark/pull/32292#issuecomment-840281783

> Just out of curiosity: any reason to pick try_add+try_divide instead of try_add+try_multiply?

IMO, divide-by-zero errors are more common in ETL/ML jobs than integral multiply overflow. I also talked to @bart-samwel; from his experience on BigQuery, the most desired functions are `try_cast` and `try_divide`.
We can add `TRY_SUBTRACT`/`TRY_MULTIPLY` if many users want them. Until then, let's be cautious about adding such new functions.
[GitHub] [spark] maropu commented on a change in pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
maropu commented on a change in pull request #32498:
URL: https://github.com/apache/spark/pull/32498#discussion_r631548474

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala

@@ -789,6 +797,38 @@ case class Range(
     }
   }
 
+  private def computeHistogramStatistics() = {
+    val numBins = conf.histogramNumBins
+    val height = numElements.toDouble / numBins
+    val percentileArray = (0 to numBins).map(i => i * height).toArray
+
+    val binArray = new Array[HistogramBin](numBins)
+    var lowerIndex = percentileArray.head
+    var lowerBinValue = getRangeValue(0)
+    percentileArray.tail.zipWithIndex.foreach { case (upperIndex, binId) =>
+      // Integer index for upper and lower values in the bin.
+      val upperIndexPos = math.ceil(upperIndex).toInt - 1
+      val lowerIndexPos = math.ceil(lowerIndex).toInt - 1
+
+      val upperBinValue = getRangeValue(math.max(upperIndexPos, 0))
+      val ndv = math.max(upperIndexPos - lowerIndexPos, 1)
+      binArray(binId) = HistogramBin(lowerBinValue, upperBinValue, ndv)
+
+      lowerBinValue = upperBinValue
+      lowerIndex = upperIndex
+    }
+    Histogram(height, binArray)
+  }
+
+  // Utility method to compute histogram
+  private def getRangeValue(index: Int): Long = {

Review comment:
```
private def getRangeValue(index: Int): Long = {
  assert(index >= 0, "index must be greater than or equal to 0")
  if (step < 0) {
    // Reverse the range values when the step is negative.
    start + (numElements.toLong - index - 1) * step
  } else {
    start + index * step
  }
}
```
?
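The equi-height bin computation in the diff above can be tried outside Spark with a small standalone sketch. Everything here is an assumption made for illustration: `HistBin` and `SimpleRange` are hypothetical stand-ins for Spark's `HistogramBin` and `Range`, the number of bins is passed in rather than read from the SQL configuration, and the reversed lookup is assumed to apply to negative steps, which is what the expected bins in the tests quoted in this thread imply.

```scala
// Hypothetical stand-in for Spark's HistogramBin(lo, hi, ndv).
final case class HistBin(lo: Double, hi: Double, ndv: Long)

// Hypothetical stand-in for the Range operator: [start, end) with a non-zero step.
final case class SimpleRange(start: Long, end: Long, step: Long) {
  require(step != 0, "step must be non-zero")

  val numElements: Int = start.until(end, step).size

  // Value at `index` in ascending order: for a negative step the range
  // is read from its last element backwards.
  def rangeValue(index: Int): Long = {
    assert(index >= 0, "index must be greater than or equal to 0")
    if (step < 0) start + (numElements.toLong - index - 1) * step
    else start + index * step
  }

  // Equi-height histogram: each bin covers numElements / numBins positions.
  def histogram(numBins: Int): List[HistBin] = {
    val height = numElements.toDouble / numBins
    val bounds = (0 to numBins).map(_ * height)
    bounds.sliding(2).map { case Seq(lower, upper) =>
      // Integer positions of the last element at or below each boundary.
      val upperPos = math.ceil(upper).toInt - 1
      val lowerPos = math.ceil(lower).toInt - 1
      HistBin(
        rangeValue(math.max(lowerPos, 0)).toDouble,
        rangeValue(math.max(upperPos, 0)).toDouble,
        math.max(upperPos - lowerPos, 1).toLong)
    }.toList
  }
}
```

Calling `SimpleRange(-10, -20, -2).histogram(3)` walks the five values in ascending order, so the first bin starts at -18 rather than at -10. The sketch recomputes each bin's lower value from its position instead of carrying it over in a `var`; for these inputs the two approaches give the same bins.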
[GitHub] [spark] SparkQA commented on pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
SparkQA commented on pull request #32498: URL: https://github.com/apache/spark/pull/32498#issuecomment-840281207 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/43005/
[GitHub] [spark] gengliangwang commented on a change in pull request #32292: [SPARK-35162][SQL] New SQL functions: TRY_ADD/TRY_DIVIDE
gengliangwang commented on a change in pull request #32292:
URL: https://github.com/apache/spark/pull/32292#discussion_r631546771

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala ##

@@ -320,6 +320,8 @@ object FunctionRegistry {
     expression[Stack]("stack"),
     expression[CaseWhen]("when"),
+    expression[TryAdd]("try_add"),
+    expression[TryDivide]("try_divide"),

Review comment:
Done

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TryEval.scala ##

@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, CodeGenerator, ExprCode}
+import org.apache.spark.sql.catalyst.expressions.codegen.Block._
+import org.apache.spark.sql.types.DataType
+
+private[catalyst] case class TryEval(child: Expression)

Review comment:
Done

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TryEval.scala ##

@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, CodeGenerator, ExprCode}
+import org.apache.spark.sql.catalyst.expressions.codegen.Block._
+import org.apache.spark.sql.types.DataType
+
+private[catalyst] case class TryEval(child: Expression)
+    extends UnaryExpression with NullIntolerant {
+  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+    val childGen = child.genCode(ctx)
+    ev.copy(code = code"""
+      boolean ${ev.isNull} = true;
+      ${CodeGenerator.javaType(dataType)} ${ev.value} = ${CodeGenerator.defaultValue(dataType)};
+      try {
+        ${childGen.code}
+        ${ev.isNull} = ${childGen.isNull};
+        ${ev.value} = ${childGen.value};
+      } catch (Exception e) {
+      }"""
+    )
+  }
+
+  override def eval(input: InternalRow): Any =
+    try {
+      child.eval(input)
+    } catch {
+      case _: Exception =>
+        null
+    }
+
+  override def dataType: DataType = child.dataType
+
+  override def nullable: Boolean = true
+
+  override protected def withNewChildInternal(newChild: Expression): Expression =
+    copy(child = newChild)
+}
+
+@ExpressionDescription(
+  usage = "_FUNC_(expr1, expr2) - Returns `expr1`+`expr2` and the result is null on overflow.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(1, 2);
+       3

Review comment:
Done

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TryEval.scala ##

@@ -0,0 +1,107 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License
[GitHub] [spark] maropu commented on a change in pull request #32498: [SPARK-35368][SQL] Update histogram statistics for RANGE operator for stats estimation
maropu commented on a change in pull request #32498:
URL: https://github.com/apache/spark/pull/32498#discussion_r631546439

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala ##

@@ -789,6 +797,38 @@ case class Range(
     }
   }
 
+  private def computeHistogramStatistics() = {
+    val numBins = conf.histogramNumBins
+    val height = numElements.toDouble / numBins
+    val percentileArray = (0 to numBins).map(i => i * height).toArray
+
+    val binArray = new Array[HistogramBin](numBins)
+    var lowerIndex = percentileArray.head
+    var lowerBinValue = getRangeValue(0)

Review comment:
It looks like we can remove the `var`s by using `foldLeft` instead of `foreach`. Could you?
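One way the requested change could look is sketched below: the two mutable variables become part of a `foldLeft` accumulator that carries the previous percentile index, the previous upper bin value, and the bins built so far. This is only an illustration of the technique, not the code that ended up in the PR. `Bin`, `RangeHistogramSketch`, `rangeValue`, and `bins` are hypothetical stand-ins (Spark's own `HistogramBin` and `getRangeValue` are not used), and the sketch assumes an ascending range.

```scala
// Stand-in for Spark's HistogramBin, defined here only to keep the sketch self-contained.
final case class Bin(lo: Double, hi: Double, ndv: Long)

object RangeHistogramSketch {
  // Hypothetical helper: value at `index` of an ascending range.
  def rangeValue(start: Long, step: Long, index: Int): Long = start + index * step

  def bins(start: Long, step: Long, numElements: Long, numBins: Int): Vector[Bin] = {
    val height = numElements.toDouble / numBins
    val percentiles = (0 to numBins).map(_ * height)

    // Accumulator: (previous percentile index, previous upper bin value, bins so far).
    val init = (percentiles.head, rangeValue(start, step, 0), Vector.empty[Bin])
    val (_, _, result) = percentiles.tail.foldLeft(init) {
      case ((lowerIndex, lowerValue, acc), upperIndex) =>
        // Integer positions of the upper and lower ends of this bin.
        val upperPos = math.ceil(upperIndex).toInt - 1
        val lowerPos = math.ceil(lowerIndex).toInt - 1
        val upperValue = rangeValue(start, step, math.max(upperPos, 0))
        val ndv = math.max(upperPos - lowerPos, 1)
        (upperIndex, upperValue, acc :+ Bin(lowerValue.toDouble, upperValue.toDouble, ndv.toLong))
    }
    result
  }
}
```

Each step returns a new accumulator instead of reassigning `lowerIndex` and `lowerBinValue`, so the loop body has no side effects and the pre-allocated array is no longer needed.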
[GitHub] [spark] dongjoon-hyun closed pull request #32527: [SPARK-35384][SQL] Improve performance for InvokeLike.invoke
dongjoon-hyun closed pull request #32527: URL: https://github.com/apache/spark/pull/32527
[GitHub] [spark] dongjoon-hyun commented on pull request #32527: [SPARK-35384][SQL] Improve performance for InvokeLike.invoke
dongjoon-hyun commented on pull request #32527: URL: https://github.com/apache/spark/pull/32527#issuecomment-840275942 Thank you, @sunchao and all! Merged to master for Apache Spark 3.2.0.