[GitHub] [spark] AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID
AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID URL: https://github.com/apache/spark/pull/26918#issuecomment-566426374 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID
AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID URL: https://github.com/apache/spark/pull/26918#issuecomment-566426374 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID
AmplabJenkins removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID URL: https://github.com/apache/spark/pull/26918#issuecomment-566426376 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115425/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID
AmplabJenkins commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID URL: https://github.com/apache/spark/pull/26918#issuecomment-566426376 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115425/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID
SparkQA commented on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID URL: https://github.com/apache/spark/pull/26918#issuecomment-566425781 **[Test build #115425 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115425/testReport)** for PR 26918 at commit [`327967e`](https://github.com/apache/spark/commit/327967e22a105386bf43e0fac4e9fa74ec70bd4e). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID
SparkQA removed a comment on issue #26918: [SPARK-30279][SQL] Support 32 or more grouping attributes for GROUPING_ID URL: https://github.com/apache/spark/pull/26918#issuecomment-566366273 **[Test build #115425 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115425/testReport)** for PR 26918 at commit [`327967e`](https://github.com/apache/spark/commit/327967e22a105386bf43e0fac4e9fa74ec70bd4e). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] steven-aerts commented on a change in pull request #26907: [SPARK-30267][SQL] Avro arrays can be of any List
steven-aerts commented on a change in pull request #26907: [SPARK-30267][SQL] Avro arrays can be of any List
URL: https://github.com/apache/spark/pull/26907#discussion_r358641861

## File path: external/avro/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala ##

@@ -167,7 +167,7 @@ class AvroDeserializer(rootAvroType: Schema, rootCatalystType: DataType) {
       case (ARRAY, ArrayType(elementType, containsNull)) =>
         val elementWriter = newWriter(avroType.getElementType, elementType, path)
         (updater, ordinal, value) =>
-          val array = value.asInstanceOf[GenericData.Array[Any]]
+          val array = value.asInstanceOf[java.util.List[Any]]

Review comment:
Added a test for an array containing structs, which was not yet covered.
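For readers following the diff above, here is a minimal sketch of why casting to the `java.util.List` interface is more permissive than casting to one concrete class. It does not use Avro at all: `java.util.ArrayList` stands in for `GenericData.Array`, `java.util.LinkedList` stands in for "some other List implementation", and the object and method names are hypothetical.

```scala
import java.util.{ArrayList => JArrayList, List => JList}
import scala.util.Try

object ListCastSketch {
  // Casting to the interface accepts any java.util.List implementation.
  def sumViaInterface(value: Any): Int = {
    val list = value.asInstanceOf[JList[Any]]
    var total = 0
    val it = list.iterator()
    while (it.hasNext) total += it.next().asInstanceOf[Int]
    total
  }

  // Casting to one concrete class (ArrayList here, as a stand-in for
  // GenericData.Array) fails with a ClassCastException for any other List.
  def sumViaConcreteClass(value: Any): Try[Int] = Try {
    val list = value.asInstanceOf[JArrayList[Any]]
    var total = 0
    val it = list.iterator()
    while (it.hasNext) total += it.next().asInstanceOf[Int]
    total
  }
}
```

With a `java.util.LinkedList` as input, `sumViaInterface` succeeds while `sumViaConcreteClass` returns a `Failure`.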
[GitHub] [spark] zhengruifeng commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures
zhengruifeng commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures
URL: https://github.com/apache/spark/pull/26803#issuecomment-566419509

Moreover, for the existing implementation, we can use `reduceByKey`/`aggregateByKey` with the feature index as the key to generate an `RDD[QuantileSummaries]`, and then compute and fetch the range/median from it instead of building the large `Array[QuantileSummaries]`, as @srowen suggested.
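The keyed aggregation described above can be sketched on plain Scala collections, with `groupBy` plus `reduce` standing in for `reduceByKey` on an RDD. This is only an illustration under stated assumptions: `Summary` is a hypothetical stand-in for `QuantileSummaries` that keeps all values exactly, and `range` is simplified to max minus min rather than a quantile range.

```scala
object PerFeatureSummarySketch {
  // Hypothetical stand-in for QuantileSummaries: keeps every value of one feature.
  final case class Summary(values: Vector[Double]) {
    def merge(other: Summary): Summary = Summary(values ++ other.values)

    def median: Double = {
      val s = values.sorted
      val n = s.length
      if (n % 2 == 1) s(n / 2) else (s(n / 2 - 1) + s(n / 2)) / 2.0
    }

    // Simplification: max minus min, not a quantile range.
    def range: Double = values.max - values.min
  }

  // Emit (featureIndex, summary) pairs per row, then merge per key.
  // On an RDD the last two steps would be a single reduceByKey(_ merge _).
  def summarize(rows: Seq[Array[Double]]): Map[Int, Summary] =
    rows
      .flatMap(row => row.toSeq.zipWithIndex.map { case (v, i) => (i, Summary(Vector(v))) })
      .groupBy(_._1)
      .map { case (i, pairs) => i -> pairs.map(_._2).reduce(_ merge _) }
}
```

The point of keying by feature index is that each feature's summary is merged independently, so no single task has to hold one summary per feature in a large array.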
[GitHub] [spark] AmplabJenkins removed a comment on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first
AmplabJenkins removed a comment on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first URL: https://github.com/apache/spark/pull/26921#issuecomment-566419176 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20240/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first
AmplabJenkins removed a comment on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first URL: https://github.com/apache/spark/pull/26921#issuecomment-566419171 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first
AmplabJenkins commented on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first URL: https://github.com/apache/spark/pull/26921#issuecomment-566419171 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first
AmplabJenkins commented on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first URL: https://github.com/apache/spark/pull/26921#issuecomment-566419176 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20240/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first
SparkQA commented on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first URL: https://github.com/apache/spark/pull/26921#issuecomment-566418536 **[Test build #115437 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115437/testReport)** for PR 26921 at commit [`db311fd`](https://github.com/apache/spark/commit/db311fd3dc95ac79413ced64d94d3caee97e423b). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] cloud-fan commented on a change in pull request #26811: [SPARK-29600][SQL] ArrayContains function may return incorrect result for DecimalType
cloud-fan commented on a change in pull request #26811: [SPARK-29600][SQL] ArrayContains function may return incorrect result for DecimalType
URL: https://github.com/apache/spark/pull/26811#discussion_r358633880

## File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ##

@@ -850,7 +850,7 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSparkSession {
     val errorMsg1 =
       s"""
          |Input to function array_contains should have been array followed by a
-         |value with same element type, but it's [array, decimal(29,29)].
+         |value with same element type, but it's [array, decimal(38,29)].

Review comment:
Yeah, I get that we can't cast here. My question is: since we can't cast, we should leave the expression untouched. But now we add a cast to one side and leave the expression unresolved. Where do we add that useless cast?
[GitHub] [spark] zhengruifeng commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures
zhengruifeng commented on issue #26803: [SPARK-30178][ML] RobustScaler support large numFeatures
URL: https://github.com/apache/spark/pull/26803#issuecomment-566414715

test code:

```scala
import org.apache.spark.ml.linalg._
import org.apache.spark.ml.feature._
import org.apache.spark.storage.StorageLevel

val rdd = sc.range(0, 1000, 1, 100)
val df = rdd.map(i => Tuple1.apply(Vectors.dense((i % 1000).toDouble / 1000))).toDF("features")
df.persist(StorageLevel.MEMORY_AND_DISK)
df.count

val scaler = new RobustScaler().setInputCol("features")
// Time 100 consecutive fits on the cached DataFrame.
val start = System.currentTimeMillis; Seq.range(0, 100).foreach{_ => val model = scaler.fit(df)}; val end = System.currentTimeMillis
end - start
```

Master: 243493 ms
This PR: 285341 ms

I tested an edge case with only numFeatures=1, and the existing implementation is faster: this PR takes about 17% longer. That is to say, this PR will support medium/large (>1000) numFeatures at the cost of some performance regression in low-dimensional cases.

Or should we check numFeatures first and decide which method to use?
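As a quick check of the "about 17%" figure, the two totals reported above can be compared in either direction. This is just arithmetic on the posted numbers; the object and value names are hypothetical.

```scala
object SlowdownSketch {
  // Totals reported above for 100 fits, in milliseconds (from System.currentTimeMillis).
  val masterMs = 243493L
  val prMs = 285341L

  // Extra time the PR needs, relative to master: roughly 17%.
  val relativeSlowdown: Double = (prMs - masterMs).toDouble / masterMs

  // Time master saves, relative to the PR: roughly 15%.
  val relativeSaving: Double = (prMs - masterMs).toDouble / prMs
}
```

The two ratios differ because they use different baselines, so "17% faster" is best read as "the PR takes about 17% longer than master".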
[GitHub] [spark] AmplabJenkins removed a comment on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first
AmplabJenkins removed a comment on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first URL: https://github.com/apache/spark/pull/26921#issuecomment-566413592 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20239/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first
AmplabJenkins removed a comment on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first URL: https://github.com/apache/spark/pull/26921#issuecomment-566413585 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first
AmplabJenkins commented on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first URL: https://github.com/apache/spark/pull/26921#issuecomment-566413585 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first
AmplabJenkins commented on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first URL: https://github.com/apache/spark/pull/26921#issuecomment-566413592 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20239/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first
SparkQA commented on issue #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first URL: https://github.com/apache/spark/pull/26921#issuecomment-566412969 **[Test build #115436 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115436/testReport)** for PR 26921 at commit [`5d8422d`](https://github.com/apache/spark/commit/5d8422d4dc09832c81117af6e7c12f67c4b04ad4). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] imback82 opened a new pull request #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first
imback82 opened a new pull request #26921: [SPARK-30282][SQL] UnresolvedV2Relation should be resolved to temp view first
URL: https://github.com/apache/spark/pull/26921

### What changes were proposed in this pull request?

This is part of the effort to make relation lookup behavior consistent: [SPARK-29900](https://issues.apache.org/jira/browse/SPARK-29900).

This PR specifically addresses the V2 commands whose logical plan contains `UnresolvedV2Relation`: if `UnresolvedV2Relation` is resolved to a temp view, those commands should error out with a message that V2 commands cannot handle temp views.

### Why are the changes needed?

For the following V2 commands, `Analyzer.ResolveTables` does not check against the temp views before resolving `UnresolvedV2Relation`, so it always resolves `UnresolvedV2Relation` to a table:

```
ALTER TABLE
DESCRIBE TABLE
SHOW TBLPROPERTIES
```

Thus, in the following example, `t` will be resolved to a table, not a temp view:

```
sql("CREATE TEMPORARY VIEW t AS SELECT 2 AS i")
sql("CREATE TABLE testcat.ns.t USING csv AS SELECT 1 AS i")
sql("USE testcat.ns")
sql("DESCRIBE t") // 't' is resolved to a table
```

This behavior is inconsistent with other commands, which look up temp views first.

### Does this PR introduce any user-facing change?

Yes, now the above example will fail as follows:

```
sql("DESCRIBE t") // 't' is now resolved to a temp view
org.apache.spark.sql.AnalysisException: A temp view 't' cannot be handled by V2 commands.;
  at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveTables$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveTables$$resolveV2Relation(Analyzer.scala:782)
```

### How was this patch tested?

Added new tests.
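The lookup order described in the PR above can be sketched without Spark. This is a toy model under stated assumptions, not the analyzer's code: the `resolve` and `resolveForV2Command` helpers, the `TempView`/`Table` types, and the "not found" message are all hypothetical; only the temp-view error text is taken from the PR description.

```scala
object TempViewFirstSketch {
  sealed trait Resolved
  final case class TempView(name: String) extends Resolved
  final case class Table(name: String) extends Resolved

  // Hypothetical lookup: a temp view shadows a catalog table with the same name.
  def resolve(name: String, tempViews: Set[String], tables: Set[String]): Option[Resolved] =
    if (tempViews.contains(name)) Some(TempView(name))
    else if (tables.contains(name)) Some(Table(name))
    else None

  // Hypothetical V2 command check: a temp view is rejected, a table is accepted.
  def resolveForV2Command(
      name: String,
      tempViews: Set[String],
      tables: Set[String]): Either[String, Table] =
    resolve(name, tempViews, tables) match {
      case Some(TempView(n)) => Left(s"A temp view '$n' cannot be handled by V2 commands.")
      case Some(t: Table)    => Right(t)
      case None              => Left(s"Table or view '$name' not found.")
    }
}
```

When both a temp view `t` and a table `t` exist, `resolveForV2Command("t", Set("t"), Set("t"))` yields the error, mirroring the `DESCRIBE t` example; without the temp view it yields the table.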
[GitHub] [spark] AmplabJenkins removed a comment on issue #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile
AmplabJenkins removed a comment on issue #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile URL: https://github.com/apache/spark/pull/26905#issuecomment-566411082 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile
AmplabJenkins removed a comment on issue #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile URL: https://github.com/apache/spark/pull/26905#issuecomment-566411091 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20238/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #25452: [SPARK-28710][SQL]to fix replace function, spark should call drop and create function
AmplabJenkins removed a comment on issue #25452: [SPARK-28710][SQL]to fix replace function, spark should call drop and create function URL: https://github.com/apache/spark/pull/25452#issuecomment-566410838 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115427/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #25452: [SPARK-28710][SQL]to fix replace function, spark should call drop and create function
AmplabJenkins removed a comment on issue #25452: [SPARK-28710][SQL]to fix replace function, spark should call drop and create function URL: https://github.com/apache/spark/pull/25452#issuecomment-566410827 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile
AmplabJenkins commented on issue #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile URL: https://github.com/apache/spark/pull/26905#issuecomment-566411082 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile
AmplabJenkins commented on issue #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile URL: https://github.com/apache/spark/pull/26905#issuecomment-566411091 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20238/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #25452: [SPARK-28710][SQL]to fix replace function, spark should call drop and create function
AmplabJenkins commented on issue #25452: [SPARK-28710][SQL]to fix replace function, spark should call drop and create function URL: https://github.com/apache/spark/pull/25452#issuecomment-566410827 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #25452: [SPARK-28710][SQL]to fix replace function, spark should call drop and create function
AmplabJenkins commented on issue #25452: [SPARK-28710][SQL]to fix replace function, spark should call drop and create function URL: https://github.com/apache/spark/pull/25452#issuecomment-566410838 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115427/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile
SparkQA commented on issue #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile URL: https://github.com/apache/spark/pull/26905#issuecomment-566410649 **[Test build #115435 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115435/testReport)** for PR 26905 at commit [`2bcea50`](https://github.com/apache/spark/commit/2bcea5066cd3d2db04f62980517ca6c14164bd7c). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #25452: [SPARK-28710][SQL]to fix replace function, spark should call drop and create function
SparkQA commented on issue #25452: [SPARK-28710][SQL]to fix replace function, spark should call drop and create function URL: https://github.com/apache/spark/pull/25452#issuecomment-566410409 **[Test build #115427 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115427/testReport)** for PR 25452 at commit [`95d1e23`](https://github.com/apache/spark/commit/95d1e2391b3c5d8daeb97c41e5eb9a13b73f3743). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #25452: [SPARK-28710][SQL]to fix replace function, spark should call drop and create function
SparkQA removed a comment on issue #25452: [SPARK-28710][SQL]to fix replace function, spark should call drop and create function URL: https://github.com/apache/spark/pull/25452#issuecomment-566379151 **[Test build #115427 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115427/testReport)** for PR 25452 at commit [`95d1e23`](https://github.com/apache/spark/commit/95d1e2391b3c5d8daeb97c41e5eb9a13b73f3743).
[GitHub] [spark] wypoon commented on a change in pull request #26895: [SPARK-17398][SQL] Fix ClassCastException when querying partitioned JSON table
wypoon commented on a change in pull request #26895: [SPARK-17398][SQL] Fix ClassCastException when querying partitioned JSON table
URL: https://github.com/apache/spark/pull/26895#discussion_r358628516

## File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ##

@@ -252,10 +254,14 @@ class HadoopTableReader(
           partProps.asScala.foreach {
             case (key, value) => props.setProperty(key, value)
           }
-          deserializer.initialize(hconf, props)
+          DeserializerLock.synchronized {
+            deserializer.initialize(hconf, props)
+          }
           // get the table deserializer
           val tableSerDe = localTableDesc.getDeserializerClass.getConstructor().newInstance()
-          tableSerDe.initialize(hconf, localTableDesc.getProperties)
+          DeserializerLock.synchronized {
+            tableSerDe.initialize(hconf, tableProperties)

Review comment:
Yes, I did find that to be the case in my repro, that the two were the same class (JsonSerDe). However, the initialize calls on deserializer and on tableSerDe are with potentially different properties (props and tableProperties could differ), so I think I should initialize tableSerDe even if it has the same class as deserializer.
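The point made in the review comment above can be illustrated with a small sketch. It is written in Python rather than Scala, and every name in it (`FakeSerDe`, `init_serdes`, `_DESERIALIZER_LOCK`) is hypothetical and only stands in for the objects in the diff; it is not the code from the pull request. It assumes a single shared lock that serializes each `initialize` call, and shows that two instances of the same class still need separate initialization when their properties differ.

```python
import threading

# Hypothetical stand-in for the shared DeserializerLock object in the diff.
_DESERIALIZER_LOCK = threading.Lock()


class FakeSerDe:
    """Hypothetical deserializer whose state depends on the properties it is given."""

    def __init__(self):
        self.props = None

    def initialize(self, props):
        # Copy the properties so later changes by the caller do not leak in.
        self.props = dict(props)


def init_serdes(part_props, table_props):
    """Initialize a partition-level and a table-level deserializer.

    Both are the same class, but each receives its own properties,
    so skipping the second initialize call would leave it unconfigured.
    """
    deserializer = FakeSerDe()
    with _DESERIALIZER_LOCK:  # only one initialize call runs at a time
        deserializer.initialize(part_props)

    table_serde = FakeSerDe()
    with _DESERIALIZER_LOCK:
        table_serde.initialize(table_props)

    return deserializer, table_serde
```

With different inputs, for example `init_serdes({"columns": "a,b,part"}, {"columns": "a,b"})`, the two returned objects hold different properties even though they share a class.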
[GitHub] [spark] yaooqinn commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres
yaooqinn commented on issue #26412: [SPARK-29774][SQL] Date and Timestamp type +/- null should be null as Postgres URL: https://github.com/apache/spark/pull/26412#issuecomment-566410341 Hi @gengliangwang, thanks for your suggestion. I have updated the description. Can you check whether it is clear enough?
[GitHub] [spark] AmplabJenkins removed a comment on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource
AmplabJenkins removed a comment on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource URL: https://github.com/apache/spark/pull/26920#issuecomment-566408882 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource
AmplabJenkins removed a comment on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource URL: https://github.com/apache/spark/pull/26920#issuecomment-566408889 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20237/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource
AmplabJenkins commented on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource URL: https://github.com/apache/spark/pull/26920#issuecomment-566408889 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20237/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource
AmplabJenkins commented on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource URL: https://github.com/apache/spark/pull/26920#issuecomment-566408882 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26869: [SPARK-30235][CORE] Switching off host local disk reading of shuffle blocks in case of useOldFetchProtocol
SparkQA commented on issue #26869: [SPARK-30235][CORE] Switching off host local disk reading of shuffle blocks in case of useOldFetchProtocol URL: https://github.com/apache/spark/pull/26869#issuecomment-566408400 **[Test build #115434 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115434/testReport)** for PR 26869 at commit [`6258ccd`](https://github.com/apache/spark/commit/6258ccd153fec744968b6a65f6f890dbfe013ad2).
[GitHub] [spark] SparkQA commented on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource
SparkQA commented on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource URL: https://github.com/apache/spark/pull/26920#issuecomment-566408383 **[Test build #115433 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115433/testReport)** for PR 26920 at commit [`e779edc`](https://github.com/apache/spark/commit/e779edcd3edec3608456d41caca98f5f7e5884a7).
[GitHub] [spark] JkSelf commented on a change in pull request #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
JkSelf commented on a change in pull request #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
URL: https://github.com/apache/spark/pull/26434#discussion_r358626209

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/ReduceNumShufflePartitionsSuite.scala ##

@@ -132,7 +133,8 @@ class ReduceNumShufflePartitionsSuite extends SparkFunSuite with BeforeAndAfterA
         Array(
           new MapOutputStatistics(0, bytesByPartitionId1),
           new MapOutputStatistics(1, bytesByPartitionId2))
-      intercept[AssertionError](rule.estimatePartitionStartIndices(mapOutputStatistics))
+      intercept[AssertionError](rule.estimatePartitionStartAndEndIndices(
+        mapOutputStatistics, (0 until bytesByPartitionId1.length).toSet))

Review comment:
yes, we can delete it now.
[GitHub] [spark] JkSelf commented on a change in pull request #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
JkSelf commented on a change in pull request #26434: [SPARK-29544] [SQL] optimize skewed partition based on data size
URL: https://github.com/apache/spark/pull/26434#discussion_r358626105

## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/ReduceNumShufflePartitionsSuite.scala ##

@@ -59,8 +59,9 @@ class ReduceNumShufflePartitionsSuite extends SparkFunSuite with BeforeAndAfterA
         case (bytesByPartitionId, index) =>
           new MapOutputStatistics(index, bytesByPartitionId)
       }
+    val length = mapOutputStatistics.map(_.bytesByPartitionId.length).head
     val estimatedPartitionStartIndices =
-      rule.estimatePartitionStartIndices(mapOutputStatistics)
+      rule.estimatePartitionStartAndEndIndices(mapOutputStatistics).unzip._1

Review comment:
Do you mean we need to check the case where there are excluded partitions? Currently the check already runs with no excluded partitions.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26869: [SPARK-30235][CORE] Switching off host local disk reading of shuffle blocks in case of useOldFetchProtocol
AmplabJenkins removed a comment on issue #26869: [SPARK-30235][CORE] Switching off host local disk reading of shuffle blocks in case of useOldFetchProtocol URL: https://github.com/apache/spark/pull/26869#issuecomment-566406639 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource
AmplabJenkins removed a comment on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource URL: https://github.com/apache/spark/pull/26920#issuecomment-566406677 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26869: [SPARK-30235][CORE] Switching off host local disk reading of shuffle blocks in case of useOldFetchProtocol
AmplabJenkins commented on issue #26869: [SPARK-30235][CORE] Switching off host local disk reading of shuffle blocks in case of useOldFetchProtocol URL: https://github.com/apache/spark/pull/26869#issuecomment-566406649 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20236/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26869: [SPARK-30235][CORE] Switching off host local disk reading of shuffle blocks in case of useOldFetchProtocol
AmplabJenkins removed a comment on issue #26869: [SPARK-30235][CORE] Switching off host local disk reading of shuffle blocks in case of useOldFetchProtocol URL: https://github.com/apache/spark/pull/26869#issuecomment-566406649 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20236/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource
AmplabJenkins commented on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource URL: https://github.com/apache/spark/pull/26920#issuecomment-566406685 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20235/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource
AmplabJenkins removed a comment on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource URL: https://github.com/apache/spark/pull/26920#issuecomment-566406685 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20235/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource
AmplabJenkins commented on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource URL: https://github.com/apache/spark/pull/26920#issuecomment-566406677 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26869: [SPARK-30235][CORE] Switching off host local disk reading of shuffle blocks in case of useOldFetchProtocol
AmplabJenkins commented on issue #26869: [SPARK-30235][CORE] Switching off host local disk reading of shuffle blocks in case of useOldFetchProtocol URL: https://github.com/apache/spark/pull/26869#issuecomment-566406639 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource
SparkQA commented on issue #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource URL: https://github.com/apache/spark/pull/26920#issuecomment-566406291 **[Test build #115432 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115432/testReport)** for PR 26920 at commit [`4b99e61`](https://github.com/apache/spark/commit/4b99e612e994073e995a337a94db123193af2b5f).
[GitHub] [spark] HeartSaVioR commented on a change in pull request #22952: [SPARK-20568][SS] Provide option to clean up completed files in streaming query
HeartSaVioR commented on a change in pull request #22952: [SPARK-20568][SS] Provide option to clean up completed files in streaming query
URL: https://github.com/apache/spark/pull/22952#discussion_r358624511

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala ##

@@ -330,4 +341,96 @@ object FileStreamSource {
     def size: Int = map.size()
   }
+
+  private[sql] trait FileStreamSourceCleaner {
+    def clean(entry: FileEntry): Unit
+  }
+
+  private[sql] object FileStreamSourceCleaner {
+    def apply(
+        fileSystem: FileSystem,
+        sourcePath: Path,
+        option: FileStreamOptions,
+        hadoopConf: Configuration): Option[FileStreamSourceCleaner] = option.cleanSource match {
+      case CleanSourceMode.ARCHIVE =>
+        require(option.sourceArchiveDir.isDefined)
+        val path = new Path(option.sourceArchiveDir.get)
+        val archiveFs = path.getFileSystem(hadoopConf)
+        val qualifiedArchivePath = archiveFs.makeQualified(path)
+        Some(new SourceFileArchiver(fileSystem, sourcePath, archiveFs, qualifiedArchivePath))
+
+      case CleanSourceMode.DELETE =>
+        Some(new SourceFileRemover(fileSystem))
+
+      case _ => None
+    }
+  }
+
+  private[sql] class SourceFileArchiver(
+      fileSystem: FileSystem,
+      sourcePath: Path,
+      baseArchiveFileSystem: FileSystem,
+      baseArchivePath: Path) extends FileStreamSourceCleaner with Logging {
+    assertParameters()
+
+    private def assertParameters(): Unit = {
+      require(fileSystem.getUri == baseArchiveFileSystem.getUri, "Base archive path is located " +
+        s"on a different file system than the source files. source path: $sourcePath" +
+        s" / base archive path: $baseArchivePath")
+
+      /**
+       * FileStreamSource reads the files which one of below conditions is met:
+       * 1) file itself is matched with source path
+       * 2) parent directory is matched with source path

Review comment:
FYI, just filed https://issues.apache.org/jira/browse/SPARK-30281 and raised a patch with picking the option 2. #26845
[GitHub] [spark] HeartSaVioR opened a new pull request #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource
HeartSaVioR opened a new pull request #26920: [SPARK-30281][SS] Consider partitioned/recursive option while verifying archive path on FileStreamSource
URL: https://github.com/apache/spark/pull/26920

### What changes were proposed in this pull request?

This patch renews the verification logic of the archive path for FileStreamSource, as we found the logic doesn't take partitioned/recursive options into account.

Before the patch, it only requires the archive path to have depth more than 2 (two subdirectories from root), leveraging the fact that FileStreamSource normally reads the files where the parent directory matches the pattern or the file itself matches the pattern. Given that the 'archive' operation moves the files to the base archive path while retaining the full path, the archive path tends to be safe if the depth is more than 2, meaning FileStreamSource doesn't re-read archived files as new source files.

With partitioned/recursive options, that fact no longer holds, as FileStreamSource can read any files at any depth of subdirectories for the source pattern. To deal with this correctly, we have to renew the verification logic, which may not be intuitive or simple but works for all cases.

The new verification logic prevents both cases:

1) archive path matches with source pattern as "prefix" (the depth of archive path > the depth of source pattern)

e.g.
* source pattern: `/hello*/spar?`
* archive path: `/hello/spark/structured/streaming`

Any files in the archive path will match with the source pattern when the recursive option is enabled.

2) source pattern matches with archive path as "prefix" (the depth of source pattern > the depth of archive path)

e.g.
* source pattern: `/hello*/spar?/structured/hello2*`
* archive path: `/hello/spark/structured`

Some archived files will not match with the source pattern, e.g. file path: `/hello/spark/structured/hello2`, then final archived path: `/hello/spark/structured/hello/spark/structured/hello2`.

But some other archived files will still match with the source pattern, e.g. file path: `/hello2/spark/structured/hello2`, then final archived path: `/hello/spark/structured/hello2/spark/structured/hello2`, which matches with the source pattern when recursive is enabled.

Implicitly it also prevents the archive path from matching the source pattern as a full match (same depth).

We want to prevent any source files from being archived and then added as new source files again, so the patch takes the most restrictive approach to prevent the possible cases.

### Why are the changes needed?

Without this patch, there's a chance archived files are included as new source files when the partitioned/recursive option is enabled, as the current condition doesn't take these options into account.

### Does this PR introduce any user-facing change?

Only for Spark 3.0.0-preview 1: end users are required to provide an archive path that satisfies somewhat more complicated conditions, instead of simply being deeper than 2 levels.

### How was this patch tested?

New UT.
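The two "prefix" cases described in the pull request above can be sketched as a single check: compare the archive path and the source pattern component by component, up to the depth of the shorter one, and reject the archive path if every compared component matches. The sketch below is in Python and is only an illustration under stated assumptions, not the logic from the patch: the function name `archive_path_conflicts` is hypothetical, and `fnmatch` globbing stands in for Hadoop glob matching, which has different syntax in some corners.

```python
from fnmatch import fnmatchcase


def _components(path):
    """Split an absolute path or glob pattern into its non-empty components."""
    return [part for part in path.split("/") if part]


def archive_path_conflicts(source_pattern, archive_path):
    """Return True if archived files could be picked up again as source files.

    Assumption: a conflict exists when, up to the depth of the shorter of the
    two, every archive path component matches the corresponding glob component
    of the source pattern. This covers both prefix cases and the same-depth
    full match.
    """
    pattern_parts = _components(source_pattern)
    archive_parts = _components(archive_path)
    depth = min(len(pattern_parts), len(archive_parts))
    return all(
        fnmatchcase(name, glob)
        for name, glob in zip(archive_parts[:depth], pattern_parts[:depth])
    )
```

Both examples from the description are rejected by this check, while an archive path under an unrelated top-level directory (a made-up case such as pattern `/data/in*` with archive path `/archive/done`) is accepted. An archive path at the root has no components to compare and is treated as a conflict in this sketch.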
[GitHub] [spark] AmplabJenkins removed a comment on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression
AmplabJenkins removed a comment on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression URL: https://github.com/apache/spark/pull/26656#issuecomment-566404516 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression
AmplabJenkins removed a comment on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression URL: https://github.com/apache/spark/pull/26656#issuecomment-566404525 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20234/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression
AmplabJenkins commented on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression URL: https://github.com/apache/spark/pull/26656#issuecomment-566404525 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20234/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression
AmplabJenkins commented on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression URL: https://github.com/apache/spark/pull/26656#issuecomment-566404516 Merged build finished. Test PASSed.
[GitHub] [spark] beliefer commented on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression
beliefer commented on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression URL: https://github.com/apache/spark/pull/26656#issuecomment-566404539 @cloud-fan @maropu @viirya @dongjoon-hyun To make review easier, the current PR only supports cases where DISTINCT and FILTER do not occur at the same time. I created another ticket, SPARK-30276, and will open another PR to support DISTINCT and FILTER occurring at the same time.
[GitHub] [spark] SparkQA commented on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression
SparkQA commented on issue #26656: [SPARK-27986][SQL] Support ANSI SQL filter clause for aggregate expression URL: https://github.com/apache/spark/pull/26656#issuecomment-566404107 **[Test build #115431 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115431/testReport)** for PR 26656 at commit [`b29ef0f`](https://github.com/apache/spark/commit/b29ef0fd901ccebafd176492a61f0b6d96e330ba).
[GitHub] [spark] zhengruifeng commented on issue #26882: [SPARK-30247][PySpark] GaussianMixtureModel in py side should expose gaussian
zhengruifeng commented on issue #26882: [SPARK-30247][PySpark] GaussianMixtureModel in py side should expose gaussian
URL: https://github.com/apache/spark/pull/26882#issuecomment-566404111

Sorry for the late reply. It seems we need to add a corresponding class `MultivariateGaussian`, containing a vector and a matrix, on the Python side; otherwise the `gaussian` cannot be used on the Python side.

```python
In [8]: model.gaussians
Out[8]:
[{'__class__': 'org.apache.spark.ml.stat.distribution.MultivariateGaussian'},
 {'__class__': 'org.apache.spark.ml.stat.distribution.MultivariateGaussian'},
 {'__class__': 'org.apache.spark.ml.stat.distribution.MultivariateGaussian'}]
```
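A class "containing a vector and a matrix", as suggested in the comment above, could look roughly like the following. This is a minimal sketch and not the class that was added to PySpark: the field names `mean` and `cov` are assumptions, and plain Python lists stand in for whatever vector and matrix types the real class would use.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MultivariateGaussian:
    """Hypothetical Python-side container for one Gaussian component."""

    mean: List[float]       # assumed name: the mean vector
    cov: List[List[float]]  # assumed name: the square covariance matrix

    def __post_init__(self):
        # Reject a covariance matrix whose shape does not match the mean vector.
        n = len(self.mean)
        if len(self.cov) != n or any(len(row) != n for row in self.cov):
            raise ValueError("cov must be an n x n matrix where n = len(mean)")
```

A value such as `MultivariateGaussian([0.0, 1.0], [[1.0, 0.0], [0.0, 1.0]])` then exposes its parameters as ordinary attributes instead of an opaque `{'__class__': ...}` dictionary.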
[GitHub] [spark] yaooqinn commented on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache
yaooqinn commented on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache URL: https://github.com/apache/spark/pull/26906#issuecomment-566403842 thanks for merging and reviewing. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache
maropu commented on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache URL: https://github.com/apache/spark/pull/26906#issuecomment-566403582 Thanks! Merged to master. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu closed pull request #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache
maropu closed pull request #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache URL: https://github.com/apache/spark/pull/26906 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache
AmplabJenkins commented on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache URL: https://github.com/apache/spark/pull/26906#issuecomment-566402107 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115420/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zhengruifeng commented on a change in pull request #26858: [SPARK-30120][ML] Use BoundedPriorityQueue for small dataset in LSH approxNearestNeighbors
zhengruifeng commented on a change in pull request #26858: [SPARK-30120][ML] Use BoundedPriorityQueue for small dataset in LSH approxNearestNeighbors URL: https://github.com/apache/spark/pull/26858#discussion_r358620968 ## File path: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ## @@ -138,21 +139,31 @@ private[ml] abstract class LSHModel[T <: LSHModel[T]] // Limit the use of hashDist since it's controversial val hashDistUDF = udf((x: Seq[Vector]) => hashDistance(x, keyHash), DataTypes.DoubleType) val hashDistCol = hashDistUDF(col($(outputCol))) - - // Compute threshold to get around k elements. - // To guarantee to have enough neighbors in one pass, we need (p - err) * N >= M - // so we pick quantile p = M / N + err - // M: the number of nearest neighbors; N: the number of elements in dataset - val relativeError = 0.05 - val approxQuantile = numNearestNeighbors.toDouble / count + relativeError val modelDatasetWithDist = modelDataset.withColumn(distCol, hashDistCol) - if (approxQuantile >= 1) { -modelDatasetWithDist + // for a small dataset, use BoundedPriorityQueue + if (count < 1000) { +val queue = new BoundedPriorityQueue[Double](count.toInt)(Ordering[Double]) Review comment: A slight performance gain may come from the fact that `BoundedPriorityQueue` does not need a `count` job to compute the variable `approxQuantile`.
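For reference, the quantile-based threshold removed in the diff above depends on the dataset size N, which is why it requires a `count` job. A minimal Python sketch of that arithmetic follows; the function names are hypothetical, and the default relative error of 0.05 is taken from the removed Scala lines.

```python
def approx_quantile(num_nearest_neighbors: int, count: int,
                    relative_error: float = 0.05) -> float:
    """Quantile p = M / N + err, chosen so that (p - err) * N >= M."""
    if count <= 0:
        raise ValueError("count must be positive")
    return num_nearest_neighbors / count + relative_error


def needs_whole_dataset(num_nearest_neighbors: int, count: int) -> bool:
    """When p >= 1 no threshold can filter anything, so every row is kept."""
    return approx_quantile(num_nearest_neighbors, count) >= 1.0
```

Both functions take `count` as an input, so the size of the dataset has to be known before the threshold can be computed.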
[GitHub] [spark] AmplabJenkins removed a comment on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache
AmplabJenkins removed a comment on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache URL: https://github.com/apache/spark/pull/26906#issuecomment-566402102 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache
AmplabJenkins removed a comment on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache URL: https://github.com/apache/spark/pull/26906#issuecomment-566402107 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115420/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache
AmplabJenkins commented on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache URL: https://github.com/apache/spark/pull/26906#issuecomment-566402102 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA removed a comment on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache
SparkQA removed a comment on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache URL: https://github.com/apache/spark/pull/26906#issuecomment-566350550 **[Test build #115420 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115420/testReport)** for PR 26906 at commit [`11b7f71`](https://github.com/apache/spark/commit/11b7f718e53c17e9a7c2946bddcaf8b860562d31). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache
SparkQA commented on issue #26906: [SPARK-30066][SQL][FOLLOWUP] Remove size field for interval column cache URL: https://github.com/apache/spark/pull/26906#issuecomment-566401668 **[Test build #115420 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115420/testReport)** for PR 26906 at commit [`11b7f71`](https://github.com/apache/spark/commit/11b7f718e53c17e9a7c2946bddcaf8b860562d31). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zhengruifeng commented on a change in pull request #26858: [SPARK-30120][ML] Use BoundedPriorityQueue for small dataset in LSH approxNearestNeighbors
zhengruifeng commented on a change in pull request #26858: [SPARK-30120][ML] Use BoundedPriorityQueue for small dataset in LSH approxNearestNeighbors URL: https://github.com/apache/spark/pull/26858#discussion_r358620329 ## File path: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ## @@ -138,21 +139,31 @@ private[ml] abstract class LSHModel[T <: LSHModel[T]] // Limit the use of hashDist since it's controversial val hashDistUDF = udf((x: Seq[Vector]) => hashDistance(x, keyHash), DataTypes.DoubleType) val hashDistCol = hashDistUDF(col($(outputCol))) - - // Compute threshold to get around k elements. - // To guarantee to have enough neighbors in one pass, we need (p - err) * N >= M - // so we pick quantile p = M / N + err - // M: the number of nearest neighbors; N: the number of elements in dataset - val relativeError = 0.05 - val approxQuantile = numNearestNeighbors.toDouble / count + relativeError val modelDatasetWithDist = modelDataset.withColumn(distCol, hashDistCol) - if (approxQuantile >= 1) { -modelDatasetWithDist + // for a small dataset, use BoundedPriorityQueue + if (count < 1000) { +val queue = new BoundedPriorityQueue[Double](count.toInt)(Ordering[Double]) Review comment: I wrongly thought that `approxNearestNeighbors` only returns an approximate threshold, so that we could use top-k to obtain an exact threshold. Since `approxNearestNeighbors` already guarantees a sufficient threshold, one that has already taken the relative error into account, **I guess we no longer need a top-k solution.** A `BoundedPriorityQueue` only maintains the top-k entries, so it should be much smaller than a `QuantileSummaries`; however, since there is only one column to process, there should be no performance gain.
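The top-k idea discussed in the review comment on #26858 can be illustrated with a bounded heap that never holds more than k entries. This is a sketch in Python rather than Spark's Scala `BoundedPriorityQueue`; the function name `k_smallest` is hypothetical, and it keeps the k smallest distances, which is the ordering assumed here for nearest neighbors.

```python
import heapq
from typing import Iterable, List


def k_smallest(values: Iterable[float], k: int) -> List[float]:
    """Return the k smallest values while storing at most k of them."""
    if k <= 0:
        return []
    heap: List[float] = []  # negated values, so -heap[0] is the largest kept
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, -v)
        elif v < -heap[0]:
            # v beats the current worst entry: swap it in, size stays k.
            heapq.heapreplace(heap, -v)
    return sorted(-x for x in heap)


distances = [5.0, 1.0, 4.0, 2.0, 3.0]
nearest = k_smallest(distances, 3)
threshold = nearest[-1]  # exact distance cut-off for the 3 nearest entries
```

The memory held is bounded by k regardless of input size, which matches the comment's point that such a queue stays small.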
[GitHub] [spark] zhengruifeng commented on a change in pull request #26858: [SPARK-30120][ML] Use BoundedPriorityQueue for small dataset in LSH approxNearestNeighbors
zhengruifeng commented on a change in pull request #26858: [SPARK-30120][ML] Use BoundedPriorityQueue for small dataset in LSH approxNearestNeighbors URL: https://github.com/apache/spark/pull/26858#discussion_r358620083 ## File path: mllib/src/main/scala/org/apache/spark/ml/feature/LSH.scala ## @@ -138,21 +139,31 @@ private[ml] abstract class LSHModel[T <: LSHModel[T]] // Limit the use of hashDist since it's controversial val hashDistUDF = udf((x: Seq[Vector]) => hashDistance(x, keyHash), DataTypes.DoubleType) val hashDistCol = hashDistUDF(col($(outputCol))) - - // Compute threshold to get around k elements. - // To guarantee to have enough neighbors in one pass, we need (p - err) * N >= M - // so we pick quantile p = M / N + err - // M: the number of nearest neighbors; N: the number of elements in dataset - val relativeError = 0.05 - val approxQuantile = numNearestNeighbors.toDouble / count + relativeError val modelDatasetWithDist = modelDataset.withColumn(distCol, hashDistCol) - if (approxQuantile >= 1) { -modelDatasetWithDist + // for a small dataset, use BoundedPriorityQueue + if (count < 1000) { +val queue = new BoundedPriorityQueue[Double](count.toInt)(Ordering[Double]) Review comment: I wrongly thought that `approxNearestNeighbors` only returns an approximate threshold, so that we could use top-k to obtain an exact threshold. Since `approxNearestNeighbors` already guarantees a sufficient threshold, one that has already taken the relative error into account, **I guess we no longer need a top-k solution.** A `BoundedPriorityQueue` only maintains the top-k entries, so it should be much smaller than a `QuantileSummaries`; however, since there is only one column to process, there should be no performance gain.
[GitHub] [spark] SparkQA commented on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized
SparkQA commented on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized URL: https://github.com/apache/spark/pull/26696#issuecomment-566399899 **[Test build #115430 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115430/testReport)** for PR 26696 at commit [`168ab30`](https://github.com/apache/spark/commit/168ab306c9df7568a1075ef3314a47dbec26ed95). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] bmarcott commented on a change in pull request #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized
bmarcott commented on a change in pull request #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized URL: https://github.com/apache/spark/pull/26696#discussion_r358617814 ## File path: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ## @@ -403,6 +406,9 @@ private[spark] class TaskSchedulerImpl( if (!executorIdToRunningTaskIds.contains(o.executorId)) { hostToExecutors(o.host) += o.executorId executorAdded(o.executorId, o.host) +// Assumes the first offer will include all cores (free cores == all cores) Review comment: can anyone confirm whether it is true that the first resource offer for an executor will include all cores? even if this is true it feels odd to rely on it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] amanomer commented on a change in pull request #26811: [SPARK-29600][SQL] ArrayContains function may return incorrect result for DecimalType
amanomer commented on a change in pull request #26811: [SPARK-29600][SQL] ArrayContains function may return incorrect result for DecimalType URL: https://github.com/apache/spark/pull/26811#discussion_r358618026 ## File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ## @@ -850,7 +850,7 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSparkSession { val errorMsg1 = s""" |Input to function array_contains should have been array followed by a - |value with same element type, but it's [array, decimal(29,29)]. + |value with same element type, but it's [array, decimal(38,29)]. Review comment: cc @maropu @cloud-fan This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] amanomer commented on a change in pull request #26811: [SPARK-29600][SQL] ArrayContains function may return incorrect result for DecimalType
amanomer commented on a change in pull request #26811: [SPARK-29600][SQL] ArrayContains function may return incorrect result for DecimalType URL: https://github.com/apache/spark/pull/26811#discussion_r358076319 ## File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ## @@ -850,7 +850,7 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSparkSession { val errorMsg1 = s""" |Input to function array_contains should have been array followed by a - |value with same element type, but it's [array, decimal(29,29)]. + |value with same element type, but it's [array, decimal(38,29)]. Review comment: cc @cloud-fan @maropu This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized
AmplabJenkins removed a comment on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized URL: https://github.com/apache/spark/pull/26696#issuecomment-566398230 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20233/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] amanomer commented on a change in pull request #26811: [SPARK-29600][SQL] ArrayContains function may return incorrect result for DecimalType
amanomer commented on a change in pull request #26811: [SPARK-29600][SQL] ArrayContains function may return incorrect result for DecimalType URL: https://github.com/apache/spark/pull/26811#discussion_r358617660 ## File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ## @@ -850,7 +850,7 @@ class DataFrameFunctionsSuite extends QueryTest with SharedSparkSession { val errorMsg1 = s""" |Input to function array_contains should have been array followed by a - |value with same element type, but it's [array, decimal(29,29)]. + |value with same element type, but it's [array, decimal(38,29)]. Review comment: Do you mean why, in the test case query above, `ArrayContains` throws an `AnalysisException` instead of casting the integer to Decimal? An integer cannot be cast to a decimal with scale > 28.
```
decimalWith28Zeroes = 1.
SELECT array_contains(array(1), decimalWith28Zeroes);
Result =>> true
```
```
decimalWith29Zeroes = 1.0
SELECT array_contains(array(1), decimalWith29Zeroes);
Result =>> AnalysisException
```
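The scale limit of 28 mentioned in the comment on #26811 follows from a digit budget: a 32-bit integer needs up to 10 integral digits, and a decimal with precision p and scale s leaves only p - s digits before the point. The sketch below works through that arithmetic in Python. It assumes a maximum precision of 38, and the helper name `int_fits_decimal` is hypothetical; it is an illustration of the reasoning, not Spark's actual type-coercion code.

```python
MAX_PRECISION = 38  # assumed upper bound on decimal precision
INT_DIGITS = 10     # digits in the largest 32-bit int, 2147483647


def int_fits_decimal(precision: int, scale: int) -> bool:
    """True if every 32-bit int is representable as decimal(precision, scale)."""
    if not 0 <= scale <= precision <= MAX_PRECISION:
        raise ValueError("expected 0 <= scale <= precision <= 38")
    # Digits left of the decimal point must cover the integer's digits.
    return precision - scale >= INT_DIGITS


# Largest scale that still leaves room for an int at maximum precision.
max_scale_for_int = MAX_PRECISION - INT_DIGITS
```

Under these assumptions `decimal(38,28)` can hold any integer, while `decimal(38,29)` and `decimal(29,29)` cannot, which is consistent with the two query results quoted above.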
[GitHub] [spark] AmplabJenkins removed a comment on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized
AmplabJenkins removed a comment on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized URL: https://github.com/apache/spark/pull/26696#issuecomment-566398221 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized
AmplabJenkins commented on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized URL: https://github.com/apache/spark/pull/26696#issuecomment-566398230 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20233/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized
AmplabJenkins commented on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized URL: https://github.com/apache/spark/pull/26696#issuecomment-566398221 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] bmarcott commented on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized
bmarcott commented on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized URL: https://github.com/apache/spark/pull/26696#issuecomment-566398156 back from vacation and updated the PR trying to address all the comments so far. let me know if I left one of your comments out. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] SparkQA commented on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams
SparkQA commented on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams URL: https://github.com/apache/spark/pull/26838#issuecomment-566395699 **[Test build #115429 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115429/testReport)** for PR 26838 at commit [`7a98ffb`](https://github.com/apache/spark/commit/7a98ffbc780770bfb6454ea72a597b1c0fb168d1). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams
AmplabJenkins commented on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams URL: https://github.com/apache/spark/pull/26838#issuecomment-566394210 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20232/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams
AmplabJenkins commented on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams URL: https://github.com/apache/spark/pull/26838#issuecomment-566394205 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams
AmplabJenkins removed a comment on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams URL: https://github.com/apache/spark/pull/26838#issuecomment-566394210 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20232/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams
AmplabJenkins removed a comment on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams URL: https://github.com/apache/spark/pull/26838#issuecomment-566394205 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] maropu commented on issue #26917: [SPARK-30278][SQL][DOC] Update Spark SQL document menu for new changes
maropu commented on issue #26917: [SPARK-30278][SQL][DOC] Update Spark SQL document menu for new changes URL: https://github.com/apache/spark/pull/26917#issuecomment-566394194 How about adding the docs about the SQL case, too? https://github.com/apache/spark/pull/25464 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] huaxingao commented on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams
huaxingao commented on issue #26838: [SPARK-30144][ML][PySpark] Make MultilayerPerceptronClassificationModel extend MultilayerPerceptronParams URL: https://github.com/apache/spark/pull/26838#issuecomment-566393965 retest this please This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] planga82 commented on a change in pull request #26890: [SPARK-30039][SQL] CREATE FUNCTION should do multi-catalog resolution
planga82 commented on a change in pull request #26890: [SPARK-30039][SQL] CREATE FUNCTION should do multi-catalog resolution
URL: https://github.com/apache/spark/pull/26890#discussion_r358613325

## File path: sql/core/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveSessionCatalog.scala ##
@@ -474,48 +474,48 @@ class ResolveSessionCatalog(
         tbl.asTableIdentifier,
         propertyKey)
 
-    case DescribeFunctionStatement(CatalogAndIdentifier(catalog, functionIdent), extended) =>
-      val functionIdentifier = if (isSessionCatalog(catalog)) {
-        functionIdent.asMultipartIdentifier match {
-          case Seq(db, fn) => FunctionIdentifier(fn, Some(db))
-          case Seq(fn) => FunctionIdentifier(fn, None)
-          case _ =>
-            throw new AnalysisException(s"Unsupported function name '${functionIdent.quoted}'")
-        }
-      } else {
-        throw new AnalysisException ("DESCRIBE FUNCTION is only supported in v1 catalog")
-      }
-      DescribeFunctionCommand(functionIdentifier, extended)
+    case DescribeFunctionStatement(CatalogAndIdentifier(catalog, ident), extended) =>
+      val functionIdent =
+        parseSessionCatalogFunctionIdentifier("DESCRIBE FUNCTION", catalog, ident)
+      DescribeFunctionCommand(functionIdent, extended)
 
     case ShowFunctionsStatement(userScope, systemScope, pattern, fun) =>
       val (database, function) = fun match {
-        case Some(CatalogAndIdentifier(catalog, functionIdent)) =>
-          if (isSessionCatalog(catalog)) {
-            functionIdent.asMultipartIdentifier match {
-              case Seq(db, fn) => (Some(db), Some(fn))
-              case Seq(fn) => (None, Some(fn))
-              case _ =>
-                throw new AnalysisException(s"Unsupported function name '${functionIdent.quoted}'")
-            }
-          } else {
-            throw new AnalysisException ("SHOW FUNCTIONS is only supported in v1 catalog")
-          }
+        case Some(CatalogAndIdentifier(catalog, ident)) =>
+          val FunctionIdentifier(fn, db) =
+            parseSessionCatalogFunctionIdentifier("SHOW FUNCTIONS", catalog, ident)
+          (db, Some(fn))
         case None => (None, pattern)
       }
       ShowFunctionsCommand(database, function, userScope, systemScope)
 
-    case DropFunctionStatement(CatalogAndIdentifier(catalog, functionIdent), ifExists, isTemp) =>
-      if (isSessionCatalog(catalog)) {
-        val (database, function) = functionIdent.asMultipartIdentifier match {
-          case Seq(db, fn) => (Some(db), fn)
-          case Seq(fn) => (None, fn)
-          case _ =>
-            throw new AnalysisException(s"Unsupported function name '${functionIdent.quoted}'")
-        }
-        DropFunctionCommand(database, function, ifExists, isTemp)
-      } else {
-        throw new AnalysisException("DROP FUNCTION is only supported in v1 catalog")
+    case DropFunctionStatement(CatalogAndIdentifier(catalog, ident), ifExists, isTemp) =>
+      val FunctionIdentifier(function, database) =
+        parseSessionCatalogFunctionIdentifier("DROP FUNCTION", catalog, ident)
+      DropFunctionCommand(database, function, ifExists, isTemp)
+
+    case CreateFunctionStatement(CatalogAndIdentifier(catalog, ident),
+      className, resources, isTemp, ignoreIfExists, replace) =>
+      val FunctionIdentifier(function, database) =
+        parseSessionCatalogFunctionIdentifier("CREATE FUNCTION", catalog, ident)
+      CreateFunctionCommand(database, function, className, resources, isTemp, ignoreIfExists,
+        replace)
+  }
+
+  private def parseSessionCatalogFunctionIdentifier(
+      sql: String,
+      catalog: CatalogPlugin,
+      functionIdent: Identifier): FunctionIdentifier = {
+    if (isSessionCatalog(catalog)) {
+      functionIdent.asMultipartIdentifier match {
+        case Seq(db, fn) => FunctionIdentifier(fn, Some(db))
+        case Seq(fn) => FunctionIdentifier(fn, None)
+        case _ =>
+          throw new AnalysisException(s"Unsupported function name '${functionIdent.quoted}'")

Review comment:
For the SHOW COLUMNS statement we settled on this message:
> Namespace name should have only one part if specified
How about using the same message, or a similar one, here? Either would be clearer.
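The helper under review maps a multipart name to a function identifier and rejects names with too many parts. A minimal, Spark-free sketch of that mapping, using an error message in the spirit of the SHOW COLUMNS wording quoted in the review comment, could look like the following. `FunctionIdent` and `parseFunctionName` are hypothetical stand-ins rather than Spark classes, `IllegalArgumentException` replaces `AnalysisException` to keep the example self-contained, and the exact message text is an assumption.

```scala
// Hypothetical stand-in for Spark's FunctionIdentifier; not the real class.
final case class FunctionIdent(funcName: String, database: Option[String])

object FunctionNameParser {
  // Maps a multipart name to (function, optional database).
  // Anything other than one or two parts is rejected with a message that
  // states what is wrong instead of a generic "unsupported name" error.
  def parseFunctionName(sql: String, parts: Seq[String]): FunctionIdent =
    parts match {
      case Seq(db, fn) => FunctionIdent(fn, Some(db))
      case Seq(fn)     => FunctionIdent(fn, None)
      case _ =>
        throw new IllegalArgumentException(
          s"$sql: namespace name should have only one part if specified: " +
            parts.mkString("."))
    }
}
```

For example, `FunctionNameParser.parseFunctionName("DROP FUNCTION", Seq("db", "fn"))` yields `FunctionIdent("fn", Some("db"))`, while a three-part name throws.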
[GitHub] [spark] cloud-fan commented on issue #26807: [SPARK-30181][SQL] Only add string or integral type column to metastore partition filter
cloud-fan commented on issue #26807: [SPARK-30181][SQL] Only add string or integral type column to metastore partition filter
URL: https://github.com/apache/spark/pull/26807#issuecomment-566391709

Did you use the latest branch-2.4? I'm a little confused about why the bug is still there. Whether or not we cast, we check the type of the column, not the type of the cast expression.
[GitHub] [spark] yaooqinn commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile
yaooqinn commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile
URL: https://github.com/apache/spark/pull/26905#discussion_r358607182

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala ##
@@ -83,32 +83,37 @@ case class ApproximatePercentile(
   }
 
   // Mark as lazy so that accuracyExpression is not evaluated during tree transformation.
-  private lazy val accuracy: Int = accuracyExpression.eval().asInstanceOf[Int]
-
-  override def inputTypes: Seq[AbstractDataType] = {
-    // Support NumericType, DateType and TimestampType since their internal types are all numeric,
-    // and can be easily cast to double for processing.
-    Seq(TypeCollection(NumericType, DateType, TimestampType),
-      TypeCollection(DoubleType, ArrayType(DoubleType)), IntegerType)
-  }
+  private lazy val accuracy: Long = accuracyExpression.eval().asInstanceOf[Number].longValue()
 
   // Mark as lazy so that percentageExpression is not evaluated during tree transformation.
   private lazy val (returnPercentileArray: Boolean, percentages: Array[Double]) =
-    percentageExpression.eval() match {
-      // Rule ImplicitTypeCasts can cast other numeric types to double
-      case num: Double => (false, Array(num))
-      case arrayData: ArrayData => (true, arrayData.toDoubleArray())
+    percentageExpression.dataType match {
+      case DoubleType => (false, Array(percentageExpression.eval().asInstanceOf[Double]))
+      case _: NumericType =>

Review comment:
OK, I'll check on this.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26919: [SPARK-30280][DOC] Update docs for make Hive 2.3 dependency by default
dongjoon-hyun commented on a change in pull request #26919: [SPARK-30280][DOC] Update docs for make Hive 2.3 dependency by default
URL: https://github.com/apache/spark/pull/26919#discussion_r358606356

## File path: docs/building-spark.md ##
@@ -83,13 +83,10 @@ Example:
 To enable Hive integration for Spark SQL along with its JDBC server and CLI, add the `-Phive` and `-Phive-thriftserver` profiles to your existing build options.
-By default, Spark will use Hive 1.2.1 with the `hadoop-2.7` profile, and Hive 2.3.6 with the `hadoop-3.2` profile.
-
-# With Hive 1.2.1 support
-./build/mvn -Pyarn -Phive -Phive-thriftserver -DskipTests clean package
+By default Spark will build with Hive 2.3.6.
 
 # With Hive 2.3.6 support
-./build/mvn -Pyarn -Phive -Phive-thriftserver -Phadoop-3.2 -DskipTests clean package
+./build/mvn -Pyarn -Phive -Phive-thriftserver -DskipTests clean package

Review comment:
Shall we add the Hive 1.2.1 example back after this for users?
```
# With Hive 1.2.1 support
./build/mvn -Pyarn -Phive -Phive-thriftserver -Phive-1.2 -DskipTests clean package
```
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26919: [SPARK-30280][DOC] Update docs for make Hive 2.3 dependency by default
dongjoon-hyun commented on a change in pull request #26919: [SPARK-30280][DOC] Update docs for make Hive 2.3 dependency by default
URL: https://github.com/apache/spark/pull/26919#discussion_r358606489

## File path: docs/building-spark.md ##
@@ -83,13 +83,10 @@ Example:
 To enable Hive integration for Spark SQL along with its JDBC server and CLI, add the `-Phive` and `-Phive-thriftserver` profiles to your existing build options.
-By default, Spark will use Hive 1.2.1 with the `hadoop-2.7` profile, and Hive 2.3.6 with the `hadoop-3.2` profile.
-
-# With Hive 1.2.1 support
-./build/mvn -Pyarn -Phive -Phive-thriftserver -DskipTests clean package
+By default Spark will build with Hive 2.3.6.
 
 # With Hive 2.3.6 support
-./build/mvn -Pyarn -Phive -Phive-thriftserver -Phadoop-3.2 -DskipTests clean package
+./build/mvn -Pyarn -Phive -Phive-thriftserver -DskipTests clean package

Review comment:
cc @gatorsmile and @srowen
[GitHub] [spark] cloud-fan commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile
cloud-fan commented on a change in pull request #26905: [SPARK-30266][SQL] Avoid overflow and match error in ApproximatePercentile
URL: https://github.com/apache/spark/pull/26905#discussion_r358606175

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproximatePercentile.scala ##
@@ -83,32 +83,37 @@ case class ApproximatePercentile(
   }
 
   // Mark as lazy so that accuracyExpression is not evaluated during tree transformation.
-  private lazy val accuracy: Int = accuracyExpression.eval().asInstanceOf[Int]
-
-  override def inputTypes: Seq[AbstractDataType] = {
-    // Support NumericType, DateType and TimestampType since their internal types are all numeric,
-    // and can be easily cast to double for processing.
-    Seq(TypeCollection(NumericType, DateType, TimestampType),
-      TypeCollection(DoubleType, ArrayType(DoubleType)), IntegerType)
-  }
+  private lazy val accuracy: Long = accuracyExpression.eval().asInstanceOf[Number].longValue()
 
   // Mark as lazy so that percentageExpression is not evaluated during tree transformation.
   private lazy val (returnPercentileArray: Boolean, percentages: Array[Double]) =
-    percentageExpression.eval() match {
-      // Rule ImplicitTypeCasts can cast other numeric types to double
-      case num: Double => (false, Array(num))
-      case arrayData: ArrayData => (true, arrayData.toDoubleArray())
+    percentageExpression.dataType match {
+      case DoubleType => (false, Array(percentageExpression.eval().asInstanceOf[Double]))
+      case _: NumericType =>

Review comment:
We can do custom type coercion (that is, not extend ImplicitCastInputTypes) and disallow implicitly casting long to int for this function.
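The overflow this pull request is concerned with can be reproduced outside Spark: narrowing a `Long` larger than `Int.MaxValue` to `Int` wraps around silently rather than failing. The sketch below shows the wraparound and one possible way to validate the value before narrowing. `AccuracyCheck` and `checkedAccuracy` are hypothetical names, and the bounds and message are assumptions, not what Spark actually enforces.

```scala
object AccuracyCheck {
  // Narrowing a Long to Int keeps only the low 32 bits, so a value just
  // above Int.MaxValue wraps around to Int.MinValue instead of failing.
  val wrapped: Int = (Int.MaxValue.toLong + 1L).toInt

  // Hypothetical validation: accept only positive values that fit in an Int,
  // and report everything else as an error instead of narrowing blindly.
  def checkedAccuracy(accuracy: Long): Either[String, Int] =
    if (accuracy <= 0 || accuracy > Int.MaxValue) {
      Left(s"accuracy must be positive and at most ${Int.MaxValue}, got $accuracy")
    } else {
      Right(accuracy.toInt)
    }
}
```

Evaluating the value as a `Long` first, as the diff does with `asInstanceOf[Number].longValue()`, is what makes a range check like this possible; once the value has been cast to `Int`, the out-of-range input can no longer be detected.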
[GitHub] [spark] AmplabJenkins removed a comment on issue #26919: [SPARK-30280][DOC] Update docs for make Hive 2.3 dependency by default
AmplabJenkins removed a comment on issue #26919: [SPARK-30280][DOC] Update docs for make Hive 2.3 dependency by default
URL: https://github.com/apache/spark/pull/26919#issuecomment-566388749

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115428/
Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #26919: [SPARK-30280][DOC] Update docs for make Hive 2.3 dependency by default
AmplabJenkins removed a comment on issue #26919: [SPARK-30280][DOC] Update docs for make Hive 2.3 dependency by default
URL: https://github.com/apache/spark/pull/26919#issuecomment-566388743

Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #26919: [SPARK-30280][DOC] Update docs for make Hive 2.3 dependency by default
AmplabJenkins commented on issue #26919: [SPARK-30280][DOC] Update docs for make Hive 2.3 dependency by default
URL: https://github.com/apache/spark/pull/26919#issuecomment-566388749

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115428/
Test PASSed.