[GitHub] [spark] zhengruifeng commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML
zhengruifeng commented on code in PR #40297:
URL: https://github.com/apache/spark/pull/40297#discussion_r1129079073

## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLUtils.scala:

@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.connect.ml
+
+import scala.reflect.ClassTag
+
+import org.apache.spark.connect.proto
+import org.apache.spark.ml.param.{ParamMap, Params}
+import org.apache.spark.sql.{DataFrame, Dataset}
+import org.apache.spark.sql.connect.planner.SparkConnectPlanner
+import org.apache.spark.sql.connect.service.SessionHolder
+
+object MLUtils {
+
+  def setInstanceParams(instance: Params, paramsProto: proto.Params): Unit = {
+    import scala.collection.JavaConverters._
+    paramsProto.getParamsMap.asScala.foreach { case (paramName, paramValueProto) =>
+      val paramDef = instance.getParam(paramName)
+      val paramValue = parseParamValue(paramDef.paramValueClassTag.runtimeClass, paramValueProto)
+      instance.set(paramDef, paramValue)
+    }
+    paramsProto.getDefaultParamsMap.asScala.foreach { case (paramName, paramValueProto) =>
+      val paramDef = instance.getParam(paramName)
+      val paramValue = parseParamValue(paramDef.paramValueClassTag.runtimeClass, paramValueProto)
+      instance._setDefault(paramDef -> paramValue)
+    }
+  }
+
+  def parseParamValue(paramType: Class[_], paramValueProto: proto.Expression.Literal): Any = {

Review Comment:
   I see, we don't have `IntParam` in PySpark

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML
WeichenXu123 commented on code in PR #40297:
URL: https://github.com/apache/spark/pull/40297#discussion_r1127834899

## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/Serializer.scala:

@@ -0,0 +1,87 @@
+/* ... Apache License header ... */
+
+package org.apache.spark.sql.connect.ml
+
+import org.apache.spark.connect.proto
+import org.apache.spark.ml.linalg.{Matrix, Vector}
+
+object Serializer {

Review Comment:
   ~~Good idea.~~ We need to address this issue first: https://github.com/apache/spark/pull/40297#discussion_r1129072923
[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML
WeichenXu123 commented on code in PR #40297:
URL: https://github.com/apache/spark/pull/40297#discussion_r1129073881

## connector/connect/common/src/main/protobuf/spark/connect/ml.proto:

@@ -0,0 +1,135 @@
+/* ... Apache License header ... */
+
+syntax = 'proto3';
+
+package spark.connect;
+
+import "spark/connect/expressions.proto";
+import "spark/connect/relations.proto";
+import "spark/connect/ml_common.proto";
+
+option java_multiple_files = true;
+option java_package = "org.apache.spark.connect.proto";
+
+message Evaluator {
+  string name = 1;
+  Params params = 2;
+  string uid = 3;
+}
+
+message MlCommand {
+  oneof ml_command_type {
+    Fit fit = 1;
+    ModelAttr model_attr = 2;
+    ModelSummaryAttr model_summary_attr = 3;
+    LoadModel load_model = 4;
+    SaveModel save_model = 5;
+    Evaluate evaluate = 6;
+  }
+
+  message Fit {
+    Stage estimator = 1;
+    Relation dataset = 2;
+  }
+
+  message Evaluate {
+    Evaluator evaluator = 1;
+  }
+
+  message LoadModel {

Review Comment:
   For estimators, we need to define a new LoadEstimator command.
[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML
WeichenXu123 commented on code in PR #40297:
URL: https://github.com/apache/spark/pull/40297#discussion_r1129073675

## connector/connect/common/src/main/protobuf/spark/connect/ml.proto:
(same hunk as above, on `message LoadModel`)

Review Comment:
   I will rename it to `LoadStage` to support estimators too.
[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML
WeichenXu123 commented on code in PR #40297:
URL: https://github.com/apache/spark/pull/40297#discussion_r1129073121

## connector/connect/common/src/main/protobuf/spark/connect/ml.proto:
(same hunk as above, on `message LoadModel`)

Review Comment:
   Sure.
[GitHub] [spark] WeichenXu123 commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML
WeichenXu123 commented on code in PR #40297:
URL: https://github.com/apache/spark/pull/40297#discussion_r1129072923

## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLUtils.scala:
(same hunk as above, on `def parseParamValue`)

Review Comment:
   > for the int/long case, we can specify the DataType in the Python client

   This does not help. Because PySpark estimator Params do not record a type attribute, we cannot distinguish int/long or double/float for Python-side param values.
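The typing gap WeichenXu123 describes can be sketched outside of Spark. This is a hypothetical illustration, not the actual Spark Connect client code: Python has a single arbitrary-precision `int` and a single `float`, so a client inferring a proto literal type from a plain Python param value has no way to recover a Scala-side Int/Long or Float/Double distinction.

```python
# Hypothetical sketch: guess a proto literal type from a Python param value.
# The function name and the width heuristic are illustrative assumptions.

def infer_literal_type(value):
    """Guess a literal type tag from a plain Python value."""
    if isinstance(value, bool):   # bool is a subclass of int -- check it first
        return "boolean"
    if isinstance(value, int):
        # Scala Int and Long params both arrive here as the same Python type;
        # the best the client can do is pick a width from the value's range.
        return "long" if value > 2**31 - 1 or value < -(2**31) else "integer"
    if isinstance(value, float):
        return "double"           # Python floats are doubles; no float32 marker
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported param value: {value!r}")
```

A value like `7` is indistinguishable from a Long param value of 7, which is why the server-side `parseParamValue` has to coerce based on the param's declared runtime class instead.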
[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1459668923

> You first defined a case-sensitive data set, then queried in a case-insensitive way, I guess the error is expected.

In the physical plan, both the `id` and `ID` columns are projections of the same column (`_1#6`) in the dataframe:

    _1#6 AS id#17, _1#6 AS ID#17

So there is no ambiguity. The matched attributes are also the same:

    attributes: Vector(id#17, id#17)

It is being considered ambiguous only because the matched result contains duplicates. If the matched attribute result were Vector(id#17, ID#17), then the error would have been valid.
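The distinction shrprasa draws can be sketched in a few lines. This is an illustrative model of the resolution logic under discussion, not Spark's actual analyzer: two projections backed by the same underlying attribute share an expression id, so case-insensitive matching can return duplicates that are not truly ambiguous, and deduplicating by expression id separates the two cases.

```python
from collections import namedtuple

# Toy attribute: a name plus the expression id of the column backing it.
Attribute = namedtuple("Attribute", ["name", "expr_id"])

def resolve(name, attributes, case_sensitive=False):
    """Resolve a column reference against a list of output attributes."""
    if case_sensitive:
        matches = [a for a in attributes if a.name == name]
    else:
        matches = [a for a in attributes if a.name.lower() == name.lower()]
    # Deduplicate by expression id: matches that alias the same underlying
    # column refer to one value and should not count as ambiguous.
    if len({a.expr_id for a in matches}) > 1:
        raise ValueError(f"ambiguous reference: {name}")
    return matches[0] if matches else None

# 'id' and 'ID' both project _1#6, i.e. they share one expression id:
attrs = [Attribute("id", 6), Attribute("ID", 6)]
```

With `attrs` as above, `resolve("id", attrs)` succeeds, while a list where `id` and `ID` carry different expression ids raises the ambiguity error.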
[GitHub] [spark] yaooqinn commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error
yaooqinn commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1459654900

You first defined a case-sensitive data set, then queried in a case-insensitive way; I guess the error is expected.
[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error
shrprasa commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1459652942

> Can you try `set spark.sql.caseSensitive=true`?

Yes, I have tried it. With caseSensitive set to true, it works, as `id` and `ID` are then treated as separate columns. The issue is when column names are supposed to be considered case insensitive.
[GitHub] [spark] mridulm commented on pull request #40307: [DRAFT][SPARK-42689][CORE][SHUFFLE]: Allow ShuffleDriverComponent to declare if shuffle data is reliably stored
mridulm commented on PR #40307:
URL: https://github.com/apache/spark/pull/40307#issuecomment-1459652457

Sure! Please go ahead :-)
[GitHub] [spark] yaooqinn commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error
yaooqinn commented on PR #40258:
URL: https://github.com/apache/spark/pull/40258#issuecomment-1459648316

Can you try `set spark.sql.caseSensitive=true`?
[GitHub] [spark] MaxGekk commented on a diff in pull request #40126: [SPARK-40822][SQL] Stable derived column aliases
MaxGekk commented on code in PR #40126:
URL: https://github.com/apache/spark/pull/40126#discussion_r1129047990

## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:

@@ -465,7 +465,20 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
   }

   /**
-   * Replaces [[UnresolvedAlias]]s with concrete aliases.
+   * Replaces [[UnresolvedAlias]]s with concrete aliases by applying the following rules:
+   * 1. Use the specified name of named expressions;
+   * 2. Derive a stable alias from the original normalized SQL text except of:
+   *   2.1. A multipart identifier (column, field, map key, ...) -> the rightmost identifier.
+   *        For example: a.b.c => c
+   *   2.2. A CAST, or try_cast -> the argument of the cast.
+   *        For example: CAST(c1 AS INT) => c1
+   *   2.3. A map lookup with a literal -> the map key.
+   *        For example: map[5] => 5
+   *   2.4. Squeezing out unwanted whitespace or comments.
+   *        For example: T.c1 + /* test */ foo( 5 ) => T.c1+foo(5)
+   *   2.5. Normalize SQL string literals when ANSI mode is enabled and the SQL config
+   *        `spark.sql.ansi.doubleQuotedIdentifiers` is set to `true`.
+   *        For example: "abc" => 'abc'

Review Comment:
   Updated the comment.
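A toy normalizer makes the quoted alias-derivation rules concrete. This is a sketch of a subset of the rules (2.1, 2.2, 2.4) using regular expressions, not the Analyzer's real implementation, and it deliberately ignores nesting, quoting, and the map-lookup and string-literal rules.

```python
import re

def derive_alias(expr):
    """Derive a stable alias from a SQL expression string (toy subset)."""
    expr = re.sub(r"/\*.*?\*/", "", expr)   # rule 2.4: drop /* ... */ comments
    expr = re.sub(r"\s+", "", expr)         # rule 2.4: squeeze out whitespace
    m = re.fullmatch(r"(?i)(?:TRY_)?CAST\((.+)AS\w+\)", expr)
    if m:                                    # rule 2.2: CAST(c1 AS INT) => c1
        expr = m.group(1)
    m = re.fullmatch(r"(\w+\.)+(\w+)", expr)
    if m:                                    # rule 2.1: a.b.c => c
        expr = m.group(2)
    return expr
```

For instance, `derive_alias("T.c1 + /* test */ foo( 5 )")` reproduces the `T.c1+foo(5)` example from the quoted comment.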
[GitHub] [spark] LuciferYang commented on pull request #40317: [SPARK-42700][BUILD] Add `h2` as test dependency of connect-server module
LuciferYang commented on PR #40317:
URL: https://github.com/apache/spark/pull/40317#issuecomment-1459646333

Thanks @HyukjinKwon @hvanhovell @dongjoon-hyun @beliefer
[GitHub] [spark] zhengruifeng opened a new pull request, #40331: [SPARK-42713][PYTHON][DOCS] Add '__getattr__' and '__getitem__' of DataFrame and Column to API reference
zhengruifeng opened a new pull request, #40331:
URL: https://github.com/apache/spark/pull/40331

### What changes were proposed in this pull request?
Add `__getattr__` and `__getitem__` of DataFrame and Column to the API reference.

### Why are the changes needed?
`__getattr__` and `__getitem__` are widely used, but we did not document them.

### Does this PR introduce _any_ user-facing change?
Yes, new documentation.

### How was this patch tested?
Added doctests.
[GitHub] [spark] xinrong-meng commented on pull request #40329: [SPARK-42710][CONNECT][PYTHON] Rename FrameMap proto to MapPartitions
xinrong-meng commented on PR #40329:
URL: https://github.com/apache/spark/pull/40329#issuecomment-1459636445

CC @HyukjinKwon @hvanhovell
[GitHub] [spark] xinrong-meng opened a new pull request, #40330: [SPARK-42712][PYTHON][DOC] Improve docstring of mapInPandas and mapInArrow
xinrong-meng opened a new pull request, #40330:
URL: https://github.com/apache/spark/pull/40330

### What changes were proposed in this pull request?
Improve the docstrings of mapInPandas and mapInArrow.

### Why are the changes needed?
For readability. We call out that they are not scalar: the input and output of the function may be of different sizes.

### Does this PR introduce _any_ user-facing change?
No. Doc change only.

### How was this patch tested?
Existing tests.
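The "not scalar" point in the description can be shown with a plain-Python analogue of MapPartitions (no Spark or pandas required; the helper names are illustrative): the function receives an iterator over a partition's rows and may yield fewer or more rows than it consumes.

```python
# Plain-Python analogue of MapPartitions: `func` maps an iterator of rows to
# an iterator of rows, with no one-in-one-out constraint.

def map_partitions(partitions, func):
    """Apply `func` to each partition's iterator of rows."""
    return [list(func(iter(part))) for part in partitions]

def keep_evens(rows):
    # Filters rows, so the output is smaller than the input -- not a
    # row-for-row ("scalar") transformation.
    for r in rows:
        if r % 2 == 0:
            yield r

parts = [[1, 2, 3], [4, 5, 6, 7]]
result = map_partitions(parts, keep_evens)   # [[2], [4, 6]]
```

In mapInPandas/mapInArrow the rows are batched into pandas DataFrames or Arrow record batches, but the same size-changing contract applies.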
[GitHub] [spark] xinrong-meng commented on pull request #40330: [SPARK-42712][PYTHON][DOC] Improve docstring of mapInPandas and mapInArrow
xinrong-meng commented on PR #40330:
URL: https://github.com/apache/spark/pull/40330#issuecomment-1459635902

CC @HyukjinKwon @hvanhovell
[GitHub] [spark] jerqi commented on pull request #40307: [DRAFT][SPARK-42689][CORE][SHUFFLE]: Allow ShuffleDriverComponent to declare if shuffle data is reliably stored
jerqi commented on PR #40307:
URL: https://github.com/apache/spark/pull/40307#issuecomment-1459630906

> @jerqi the basic issue here is, `getPreferredLocations` in `ShuffledRowRDD` should return `Nil` at the very beginning in case `spark.shuffle.reduceLocality.enabled = false` (conceptually).
>
> This logic is pushed into MapOutputTracker though - and `getPreferredLocationsForShuffle` honors `spark.shuffle.reduceLocality.enabled` - but `getMapLocation` does not.
>
> So the fix would be to fix `getMapLocation` to honor the parameter.
>
> We should fix this in a different PR though.

Could I raise another PR to fix this issue?
[GitHub] [spark] jerqi commented on pull request #40307: [DRAFT][SPARK-42689][CORE][SHUFFLE]: Allow ShuffleDriverComponent to declare if shuffle data is reliably stored
jerqi commented on PR #40307:
URL: https://github.com/apache/spark/pull/40307#issuecomment-1459630519

Could I raise another PR to fix this issue?
[GitHub] [spark] zhengruifeng commented on pull request #40322: [SPARK-41775][PYTHON][FOLLOW-UP] Updating error message for training using PyTorch functions
zhengruifeng commented on PR #40322:
URL: https://github.com/apache/spark/pull/40322#issuecomment-1459625070

Merged into master/branch-3.4.
[GitHub] [spark] zhengruifeng closed pull request #40322: [SPARK-41775][PYTHON][FOLLOW-UP] Updating error message for training using PyTorch functions
zhengruifeng closed pull request #40322: [SPARK-41775][PYTHON][FOLLOW-UP] Updating error message for training using PyTorch functions
URL: https://github.com/apache/spark/pull/40322
[GitHub] [spark] mridulm commented on pull request #40307: [DRAFT][SPARK-42689][CORE][SHUFFLE]: Allow ShuffleDriverComponent to declare if shuffle data is reliably stored
mridulm commented on PR #40307:
URL: https://github.com/apache/spark/pull/40307#issuecomment-1459618430

@jerqi the basic issue here is, `getPreferredLocations` in `ShuffledRowRDD` should return `Nil` at the very beginning in case `spark.shuffle.reduceLocality.enabled = false`.

We should fix this in a different PR though.
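The fix mridulm describes in the related thread is to make `getMapLocation` honor `spark.shuffle.reduceLocality.enabled` the way `getPreferredLocationsForShuffle` already does. A hypothetical sketch (the class and method names mirror Spark's but this is not its actual MapOutputTracker code):

```python
# Illustrative model: both preferred-location lookups should short-circuit to
# "no preference" when reduce locality is disabled.

class MapOutputTracker:
    def __init__(self, reduce_locality_enabled, map_locations):
        self.reduce_locality_enabled = reduce_locality_enabled
        self.map_locations = map_locations  # map index -> host

    def get_preferred_locations_for_shuffle(self, partition):
        if not self.reduce_locality_enabled:   # already honored here
            return []
        return list(self.map_locations)

    def get_map_location(self, start_map_index, end_map_index):
        if not self.reduce_locality_enabled:   # the proposed fix: honor it here too
            return []
        return self.map_locations[start_map_index:end_map_index]

# With locality disabled, both paths now return no preference:
tracker = MapOutputTracker(False, ["hostA", "hostB"])
assert tracker.get_map_location(0, 2) == []
```

With no preferred locations returned, `getPreferredLocations` in `ShuffledRowRDD` effectively yields `Nil`, which is the behavior the config promises.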
[GitHub] [spark] grundprinzip commented on a diff in pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command
grundprinzip commented on code in PR #40323:
URL: https://github.com/apache/spark/pull/40323#discussion_r1129021709

## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:

@@ -1508,8 +1508,10 @@ class SparkConnectPlanner(val session: SparkSession) {
         maxRecordsPerBatch,
         maxBatchSize,
         timeZoneId)
-      assert(batches.size == 1)
-      batches.next()
+      assert(batches.hasNext)

Review Comment:
   I see, thank you. My understanding was that non-rewindable iterators simply do not have a `size` method. But something new learned.
[GitHub] [spark] ueshin commented on a diff in pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command
ueshin commented on code in PR #40323:
URL: https://github.com/apache/spark/pull/40323#discussion_r1129020413

## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
(same diff as above, replacing `assert(batches.size == 1)` with `assert(batches.hasNext)`)

Review Comment:
   `batches` is an iterator, so `batches.size` consumes all the data in the iterator to calculate the size. Then `batches.next()` would return empty data? Usually it should throw an exception, though.
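The pitfall ueshin describes is not specific to Scala; the same thing happens with Python iterators, which makes the bug easy to demonstrate without Spark: counting an iterator's elements consumes it, so a later `next()` finds it exhausted.

```python
# Counting an iterator consumes it -- analogous to Scala's Iterator.size.
batches = iter(["batch-1"])

size = sum(1 for _ in batches)   # walks the iterator to count it
assert size == 1

try:
    next(batches)                # the data is already gone
    consumed = False
except StopIteration:
    consumed = True

# The pattern from the fixed diff: check hasNext-style, take the single
# element, and only then assert the iterator is empty.
batches = iter(["batch-1"])
first = next(batches)                 # hasNext + next()
assert next(batches, None) is None    # equivalent of assert(!batches.hasNext)
```

This is why the diff replaces `assert(batches.size == 1)` followed by `batches.next()` with `assert(batches.hasNext)` followed by `batches.next()`.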
[GitHub] [spark] xinrong-meng commented on pull request #40244: [SPARK-42643][CONNECT][PYTHON] Register Java (aggregate) user-defined functions
xinrong-meng commented on PR #40244: URL: https://github.com/apache/spark/pull/40244#issuecomment-1459608526 Merged to master and branch-3.4, thanks all!
[GitHub] [spark] xinrong-meng closed pull request #40244: [SPARK-42643][CONNECT][PYTHON] Register Java (aggregate) user-defined functions
xinrong-meng closed pull request #40244: [SPARK-42643][CONNECT][PYTHON] Register Java (aggregate) user-defined functions URL: https://github.com/apache/spark/pull/40244
[GitHub] [spark] xinrong-meng opened a new pull request, #40329: Rename FrameMap proto to MapPartitions
xinrong-meng opened a new pull request, #40329: URL: https://github.com/apache/spark/pull/40329 ### What changes were proposed in this pull request? Rename FrameMap proto to MapPartitions. ### Why are the changes needed? For readability. Frame Map API refers to mapInPandas and mapInArrow, which are equivalent to MapPartitions. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests.
[GitHub] [spark] zhengruifeng commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML
zhengruifeng commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1129010791 ## connector/connect/common/src/main/protobuf/spark/connect/ml.proto: ## @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +syntax = 'proto3'; + +package spark.connect; + +import "spark/connect/expressions.proto"; +import "spark/connect/relations.proto"; +import "spark/connect/ml_common.proto"; + +option java_multiple_files = true; +option java_package = "org.apache.spark.connect.proto"; + + +message Evaluator { + string name = 1; + Params params = 2; + string uid = 3; +} + + +message MlCommand { + oneof ml_command_type { +Fit fit = 1; +ModelAttr model_attr = 2; +ModelSummaryAttr model_summary_attr = 3; +LoadModel load_model = 4; +SaveModel save_model = 5; +Evaluate evaluate = 6; + } + + message Fit { +Stage estimator = 1; +Relation dataset = 2; + } + + message Evaluate { +Evaluator evaluator = 1; + } + + message LoadModel { Review Comment: will `LoadModel` and `SaveModel` also support load/save estimator/transformer?
[GitHub] [spark] zhengruifeng commented on pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names
zhengruifeng commented on PR #40287: URL: https://github.com/apache/spark/pull/40287#issuecomment-1459594072 If the nested lambda issue also exists in the Scala Client, do we need to fix it in the same way?
[GitHub] [spark] grundprinzip commented on a diff in pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command
grundprinzip commented on code in PR #40323: URL: https://github.com/apache/spark/pull/40323#discussion_r1129008040 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala: ## @@ -1508,8 +1508,10 @@ class SparkConnectPlanner(val session: SparkSession) { maxRecordsPerBatch, maxBatchSize, timeZoneId) - assert(batches.size == 1) - batches.next() + assert(batches.hasNext) Review Comment: Sorry for the late reply, but how is this code different from the existing one? `val bytes = batches.next(); bytes` is the same as `batches.next()`. The asserts in between don't count, as they don't have side effects.
[GitHub] [spark] zhengruifeng commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML
zhengruifeng commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1128994906 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/AlgorithmRegisty.scala: ## @@ -0,0 +1,104 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.connect.ml + +import org.apache.spark.connect.proto +import org.apache.spark.ml +import org.apache.spark.ml.{Estimator, Model} +import org.apache.spark.ml.classification.TrainingSummary +import org.apache.spark.sql.DataFrame + + +object AlgorithmRegistry { + + def get(name: String): Algorithm = { +name match { + case "LogisticRegression" => new LogisticRegressionAlgorithm + case _ => +throw new IllegalArgumentException() +} + } + +} + + +abstract class Algorithm { + + def initiateEstimator(uid: String): Estimator[_] + + def getModelAttr(model: Model[_], name: String): Either[proto.MlCommandResponse, DataFrame] Review Comment: ok. I am neutral on it.
[GitHub] [spark] shrprasa commented on pull request #40258: [SPARK-42655][SQL] Incorrect ambiguous column reference error
shrprasa commented on PR #40258: URL: https://github.com/apache/spark/pull/40258#issuecomment-1459535564 Gentle ping @dongjoon-hyun @mridulm @HyukjinKwon @yaooqinn Can you please review this PR?
[GitHub] [spark] HyukjinKwon opened a new pull request, #40328: [SPARK-42709][PYTHON] Remove the assumption of `__file__` being available
HyukjinKwon opened a new pull request, #40328: URL: https://github.com/apache/spark/pull/40328 ### What changes were proposed in this pull request? This PR proposes to add a check for `__file__` attributes. ### Why are the changes needed? `__file__` might not be available everywhere. See also https://github.com/scikit-learn/scikit-learn/issues/20081 ### Does this PR introduce _any_ user-facing change? If users' Python environment does not have `__file__`, now users can use PySpark in their environment too. ### How was this patch tested? Manually tested.
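The kind of guard this PR describes can be sketched as follows (a minimal illustration, not PySpark's actual helper; the module name is made up):

```python
import types

# Modules are not guaranteed to define __file__ (e.g. frozen or embedded
# environments), so look it up defensively instead of assuming it exists.
mod = types.ModuleType("example_mod")  # a fresh module object has no __file__
path = getattr(mod, "__file__", None)
assert path is None

# Call sites can then branch on None rather than crashing with AttributeError.
location = path if path is not None else "<unknown location>"
assert location == "<unknown location>"
```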
[GitHub] [spark] shrprasa commented on a diff in pull request #40128: [SPARK-42466][K8S]: Cleanup k8s upload directory when job terminates
shrprasa commented on code in PR #40128: URL: https://github.com/apache/spark/pull/40128#discussion_r1128983339 ## resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/submit/KubernetesClientApplication.scala: ## @@ -143,6 +144,9 @@ private[spark] class Client( logError("Please check \"kubectl auth can-i create [resource]\" first." + " It should be yes. And please also check your feature step implementation.") kubernetesClient.resourceList(preKubernetesResources: _*).delete() +// register shutdownhook for cleaning up the upload dir only Review Comment: @holdenk do you have any suggestion to handle this in a better way?
[GitHub] [spark] zhengruifeng commented on pull request #40233: [WIP][SPARK-42630][CONNECT][PYTHON] Make `parse_data_type` use new proto message `DDLParse`
zhengruifeng commented on PR #40233: URL: https://github.com/apache/spark/pull/40233#issuecomment-1459503928 close in favor of https://github.com/apache/spark/pull/40260
[GitHub] [spark] zhengruifeng closed pull request #40233: [WIP][SPARK-42630][CONNECT][PYTHON] Make `parse_data_type` use new proto message `DDLParse`
zhengruifeng closed pull request #40233: [WIP][SPARK-42630][CONNECT][PYTHON] Make `parse_data_type` use new proto message `DDLParse` URL: https://github.com/apache/spark/pull/40233
[GitHub] [spark] shrprasa commented on pull request #37880: [SPARK-39399] [CORE] [K8S]: Fix proxy-user authentication for Spark on k8s in cluster deploy mode
shrprasa commented on PR #37880: URL: https://github.com/apache/spark/pull/37880#issuecomment-1459499404 Thanks @yaooqinn for merging the PR.
[GitHub] [spark] dtenedor commented on a diff in pull request #40299: [SPARK-42684][SQL] v2 catalog should not allow column default value by default
dtenedor commented on code in PR #40299: URL: https://github.com/apache/spark/pull/40299#discussion_r1128979585 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ResolveDefaultColumnsUtil.scala: ## @@ -137,6 +113,49 @@ object ResolveDefaultColumns { } } + + // Fails if the given catalog does not support column default value. + def validateCatalogForDefaultValue( + schema: StructType, + catalog: TableCatalog, + ident: Identifier): Unit = { +if (SQLConf.get.enableDefaultColumns && + schema.exists(_.metadata.contains(CURRENT_DEFAULT_COLUMN_METADATA_KEY))) { + if (!catalog.capabilities().contains( Review Comment: We can combine with the previous if statement using `&&`?
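The simplification dtenedor suggests, folding the nested `if` into the outer condition with a logical AND, preserves behavior for every input. A quick Python illustration (the boolean arguments are stand-ins for the Scala conditions):

```python
from itertools import product

def validate_nested(defaults_enabled, has_default_column, catalog_supports_defaults):
    # Shape of the original code: an if nested inside another if.
    if defaults_enabled and has_default_column:
        if not catalog_supports_defaults:
            return "error"
    return "ok"

def validate_combined(defaults_enabled, has_default_column, catalog_supports_defaults):
    # Shape after the suggestion: one condition joined with AND.
    if defaults_enabled and has_default_column and not catalog_supports_defaults:
        return "error"
    return "ok"

# The two forms agree on all 8 combinations of inputs.
for args in product([True, False], repeat=3):
    assert validate_nested(*args) == validate_combined(*args)
```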
[GitHub] [spark] zhengruifeng commented on pull request #40325: [SPARK-42707][CONNECT][DOCS] Update developer documentation about API stability warning
zhengruifeng commented on PR #40325: URL: https://github.com/apache/spark/pull/40325#issuecomment-1459464317 merged into master/3.4
[GitHub] [spark] zhengruifeng closed pull request #40325: [SPARK-42707][CONNECT][DOCS] Update developer documentation about API stability warning
zhengruifeng closed pull request #40325: [SPARK-42707][CONNECT][DOCS] Update developer documentation about API stability warning URL: https://github.com/apache/spark/pull/40325
[GitHub] [spark] HyukjinKwon closed pull request #40317: [SPARK-42700][BUILD] Add `h2` as test dependency of connect-server module
HyukjinKwon closed pull request #40317: [SPARK-42700][BUILD] Add `h2` as test dependency of connect-server module URL: https://github.com/apache/spark/pull/40317
[GitHub] [spark] HyukjinKwon commented on pull request #40317: [SPARK-42700][BUILD] Add `h2` as test dependency of connect-server module
HyukjinKwon commented on PR #40317: URL: https://github.com/apache/spark/pull/40317#issuecomment-1459462492 Merged to master and branch-3.4.
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40325: [SPARK-42707][CONNECT][DOCS] Update developer documentation about API stability warning
HyukjinKwon commented on code in PR #40325: URL: https://github.com/apache/spark/pull/40325#discussion_r1128968097 ## python/pyspark/sql/connect/__init__.py: ## @@ -15,5 +15,4 @@ # limitations under the License. # -"""Currently Spark Connect is very experimental and the APIs to interact with -Spark through this API are can be changed at any time without warning.""" +"""Spark Connect cleint""" Review Comment: ```suggestion """Spark Connect client""" ```
[GitHub] [spark] HyukjinKwon commented on pull request #40327: [SPARK-42266][PYTHON] Remove the parent directory in shell.py execution when IPython is used
HyukjinKwon commented on PR #40327: URL: https://github.com/apache/spark/pull/40327#issuecomment-1459455421 cc @zhengruifeng @ueshin @grundprinzip FYI
[GitHub] [spark] HyukjinKwon opened a new pull request, #40327: [SPARK-42266][PYTHON] Remove the parent directory in shell.py execution when IPython is used
HyukjinKwon opened a new pull request, #40327: URL: https://github.com/apache/spark/pull/40327 ### What changes were proposed in this pull request? This PR proposes to remove the parent directory in `shell.py` execution when IPython is used. This is a general issue for PySpark shell specifically with IPython - IPython temporarily adds the parent directory of the script into the Python path (`sys.path`), which results in searching packages under the `pyspark` directory. For example, `import pandas` attempts to import `pyspark.pandas`. So far, we haven't had such cases within PySpark's own import code path, but Spark Connect now has one via its dependency check (which attempts to import pandas), which exposes the actual problem. Running it with IPython can easily reproduce the error: ```bash PYSPARK_PYTHON=ipython bin/pyspark --remote "local[*]" ``` ### Why are the changes needed? To make PySpark shell properly import other packages even when the names conflict with subpackages (e.g., `pyspark.pandas` vs `pandas`) ### Does this PR introduce _any_ user-facing change? No to the end users: - Because this path is only inserted for `shell.py` execution, and thankfully we didn't have such a relative import case so far. - It fixes the issue in the unreleased Spark Connect. ### How was this patch tested? Manually tested. ```bash PYSPARK_PYTHON=ipython bin/pyspark --remote "local[*]" ``` **Before:** ``` Python 3.9.16 | packaged by conda-forge | (main, Feb 1 2023, 21:42:20) Type 'copyright', 'credits' or 'license' for more information IPython 8.10.0 -- An enhanced Interactive Python. Type '?' for help. /.../spark/python/pyspark/shell.py:45: UserWarning: Failed to initialize Spark session.
warnings.warn("Failed to initialize Spark session.") Traceback (most recent call last): File "/.../spark/python/pyspark/shell.py", line 40, in spark = SparkSession.builder.getOrCreate() File "/.../spark/python/pyspark/sql/session.py", line 437, in getOrCreate from pyspark.sql.connect.session import SparkSession as RemoteSparkSession File "/.../spark/python/pyspark/sql/connect/session.py", line 19, in check_dependencies(__name__, __file__) File "/.../spark/python/pyspark/sql/connect/utils.py", line 33, in check_dependencies require_minimum_pandas_version() File "/.../spark/python/pyspark/sql/pandas/utils.py", line 27, in require_minimum_pandas_version import pandas File "/.../spark/python/pyspark/pandas/__init__.py", line 29, in from pyspark.pandas.missing.general_functions import MissingPandasLikeGeneralFunctions File "/.../spark/python/pyspark/pandas/__init__.py", line 34, in require_minimum_pandas_version() File "/.../spark/python/pyspark/sql/pandas/utils.py", line 37, in require_minimum_pandas_version if LooseVersion(pandas.__version__) < LooseVersion(minimum_pandas_version): AttributeError: partially initialized module 'pandas' has no attribute '__version__' (most likely due to a circular import) ... ``` **After:** ``` Python 3.9.16 | packaged by conda-forge | (main, Feb 1 2023, 21:42:20) Type 'copyright', 'credits' or 'license' for more information IPython 8.10.0 -- An enhanced Interactive Python. Type '?' for help. Setting default log level to "WARN". To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel). 23/03/08 13:30:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable Welcome to __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ /__ / .__/\_,_/_/ /_/\_\ version 3.5.0.dev0 /_/ Using Python version 3.9.16 (main, Feb 1 2023 21:42:20) Client connected to the Spark Connect server at localhost SparkSession available as 'spark'. 
In [1]: ```
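The fix described in the PR body boils down to dropping the script's parent directory that IPython prepends to `sys.path`. A rough sketch of the mechanism (the directory name is hypothetical, not the real PySpark install path):

```python
import sys

script_parent = "/opt/spark/python/pyspark"  # hypothetical parent dir of shell.py

# IPython inserts the launched script's parent directory at the front of
# sys.path, which makes `import pandas` resolve to pyspark/pandas instead.
sys.path.insert(0, script_parent)
assert sys.path[0] == script_parent

# The workaround removes that entry again before third-party imports run.
if script_parent in sys.path:
    sys.path.remove(script_parent)
assert script_parent not in sys.path
```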
[GitHub] [spark] cloud-fan commented on pull request #40299: [SPARK-42684][SQL] v2 catalog should not allow column default value by default
cloud-fan commented on PR #40299: URL: https://github.com/apache/spark/pull/40299#issuecomment-1459426907 cc @gengliangwang @dtenedor
[GitHub] [spark] zhengruifeng commented on a diff in pull request #40297: [SPARK-42412][WIP] Initial PR of Spark connect ML
zhengruifeng commented on code in PR #40297: URL: https://github.com/apache/spark/pull/40297#discussion_r1128955568 ## connector/connect/server/src/main/scala/org/apache/spark/sql/connect/ml/MLUtils.scala: ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.connect.ml + +import scala.reflect.ClassTag + +import org.apache.spark.connect.proto +import org.apache.spark.ml.param.{ParamMap, Params} +import org.apache.spark.sql.{DataFrame, Dataset} +import org.apache.spark.sql.connect.planner.SparkConnectPlanner +import org.apache.spark.sql.connect.service.SessionHolder + +object MLUtils { + + def setInstanceParams(instance: Params, paramsProto: proto.Params): Unit = { +import scala.collection.JavaConverters._ +paramsProto.getParamsMap.asScala.foreach { case (paramName, paramValueProto) => + val paramDef = instance.getParam(paramName) + val paramValue = parseParamValue(paramDef.paramValueClassTag.runtimeClass, paramValueProto) + instance.set(paramDef, paramValue) +} +paramsProto.getDefaultParamsMap.asScala.foreach { case (paramName, paramValueProto) => + val paramDef = instance.getParam(paramName) + val paramValue = parseParamValue(paramDef.paramValueClassTag.runtimeClass, paramValueProto) + instance._setDefault(paramDef -> paramValue) +} + } + + def parseParamValue(paramType: Class[_], paramValueProto: proto.Expression.Literal): Any = { Review Comment: for the int/long case, we can specify the DataType in the Python client: https://github.com/apache/spark/blob/c99a632fea74136964b27b28563115fe2d7667b3/python/pyspark/sql/connect/expressions.py#L165-L171
[GitHub] [spark] Hisoka-X opened a new pull request, #40326: [SPARK-42708] [Docs] Improve doc about protobuf java file can't be indexed.
Hisoka-X opened a new pull request, #40326: URL: https://github.com/apache/spark/pull/40326 ### What changes were proposed in this pull request? Improve the README doc for developers about protobuf Java files that can't be indexed. ### Why are the changes needed? To make it easier for developers to get started with Spark. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? No.
[GitHub] [spark] gengliangwang closed pull request #40295: [SPARK-42681][SQL] Relax ordering constraint for ALTER TABLE ADD|REPLACE column descriptor
gengliangwang closed pull request #40295: [SPARK-42681][SQL] Relax ordering constraint for ALTER TABLE ADD|REPLACE column descriptor URL: https://github.com/apache/spark/pull/40295
[GitHub] [spark] gengliangwang commented on pull request #40295: [SPARK-42681][SQL] Relax ordering constraint for ALTER TABLE ADD|REPLACE column descriptor
gengliangwang commented on PR #40295: URL: https://github.com/apache/spark/pull/40295#issuecomment-1459387912 Thanks, merging to master
[GitHub] [spark] beliefer commented on pull request #40317: [SPARK-42700][BUILD] Add `h2` as test dependency of connect-server module
beliefer commented on PR #40317: URL: https://github.com/apache/spark/pull/40317#issuecomment-1459375825 @LuciferYang Thank you for the job. https://github.com/apache/spark/pull/40291 need this one.
[GitHub] [spark] aokolnychyi commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes
aokolnychyi commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1128943046 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -3344,43 +3345,6 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor } else { v2Write } - - case u: UpdateTable if !u.skipSchemaResolution && u.resolved => Review Comment: One notable difference is not using `AssertNotNull` and relying on the same framework we have for V2 tables. In particular, we check the assignment mode and whether in and out attributes are compatible. As a consequence, we get analysis errors and not runtime errors. However, the main contribution of this PR is `AssignmentUtils` that aligns assignments with table attributes. It is a prerequisite for rewriting UPDATEs. Right now, we only rewrite DELETEs. The new utility also handles casts and char/varchar types, which we already did here.
[GitHub] [spark] aokolnychyi commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes
aokolnychyi commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1128945996 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AlignRowLevelCommandAssignments.scala: ## @@ -0,0 +1,55 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, UpdateTable} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.catalyst.util.CharVarcharUtils +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.execution.datasources.v2.DataSourceV2Relation +import org.apache.spark.sql.internal.SQLConf.StoreAssignmentPolicy + +/** + * A rule that aligns assignments in row-level operations. + * + * Note that this rule must be run after resolving default values but before rewriting row-level + * commands into executable plans. This rule does not apply to tables that accept any schema. + * Such tables must inject their own rules to align assignments. 
+ */ +object AlignRowLevelCommandAssignments extends Rule[LogicalPlan] { + + override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators { +case u: UpdateTable if !u.skipSchemaResolution && u.resolved && !u.aligned => Review Comment: We can apply this rule only when `table` implements `SupportsRowLevelOperations`, if that feels safer?
[GitHub] [spark] aokolnychyi commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes
aokolnychyi commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1128944608 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AssignmentUtils.scala: ## @@ -0,0 +1,275 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.catalyst.analysis + +import scala.collection.mutable + +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.SQLConfHelper +import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeReference, CreateNamedStruct, Expression, ExtractValue, GetStructField, Literal, NamedExpression} +import org.apache.spark.sql.catalyst.plans.logical.Assignment +import org.apache.spark.sql.catalyst.util.CharVarcharUtils +import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._ +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf.StoreAssignmentPolicy +import org.apache.spark.sql.types.{ArrayType, DataType, MapType, StructType} + +object AssignmentUtils extends SQLConfHelper with CastSupport { + + private case class ColumnUpdate(ref: Seq[String], expr: Expression) + + /** Review Comment: @cloud-fan, this doc gives a bit more details about why this PR is a prerequisite for rewriting UPDATEs. Let me know if this makes sense!
[GitHub] [spark] aokolnychyi commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes
aokolnychyi commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1128943046 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -3344,43 +3345,6 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor } else { v2Write } - - case u: UpdateTable if !u.skipSchemaResolution && u.resolved => Review Comment: One notable difference is not using `AssertNotNull` and relying on the same framework we have for V2 tables. In particular, we check the assignment mode and whether in and out attributes are compatible. As a consequence, we get analysis errors and not runtime errors. However, the main contribution of this PR is `AssignmentUtils` that aligns assignments with table attributes. It is a prerequisite for rewriting UPDATEs. Right now, we only rewrite DELETEs. The new utility also handles casts and char/varchar types.
[GitHub] [spark] aokolnychyi commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes
aokolnychyi commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1128943386 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AssignmentUtils.scala: ## @@ -0,0 +1,275 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.analysis + +import scala.collection.mutable + +import org.apache.spark.sql.AnalysisException +import org.apache.spark.sql.catalyst.SQLConfHelper +import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeReference, CreateNamedStruct, Expression, ExtractValue, GetStructField, Literal, NamedExpression} +import org.apache.spark.sql.catalyst.plans.logical.Assignment +import org.apache.spark.sql.catalyst.util.CharVarcharUtils +import org.apache.spark.sql.connector.catalog.CatalogV2Implicits._ +import org.apache.spark.sql.errors.QueryCompilationErrors +import org.apache.spark.sql.internal.SQLConf.StoreAssignmentPolicy +import org.apache.spark.sql.types.{ArrayType, DataType, MapType, StructType} + +object AssignmentUtils extends SQLConfHelper with CastSupport { + + private case class ColumnUpdate(ref: Seq[String], expr: Expression) + + /** + * Aligns assignments to match table columns. 
+ * + * This method processes and reorders given assignments so that each target column gets + * an expression it should be set to. If a column does not have a matching assignment, + * it will be set to its current value. For example, if one passes table attributes c1, c2 + * and an assignment c2 = 1, this method will return c1 = c1, c2 = 1. + * + * This method also handles updates to nested columns. If there is an assignment to a particular + * nested field, this method will construct a new struct with one field updated preserving other + * fields that have not been modified. For example, if one passes table attributes c1, c2 + * where c2 is a struct with fields n1 and n2 and an assignment c2.n2 = 1, this method will + * return c1 = c1, c2 = struct(c2.n1, 1). + * + * @param attrs table attributes + * @param assignments assignments to align + * @return aligned assignments that match table columns + */ + def alignAssignments( + attrs: Seq[Attribute], + assignments: Seq[Assignment]): Seq[Assignment] = { + +val errors = new mutable.ArrayBuffer[String]() + +val output = applyUpdates( + updates = assignments.map(toColumnUpdate), + cols = attrs.map(restoreActualType), + colExprs = attrs, + addError = err => errors += err) + +if (errors.nonEmpty) { + throw QueryCompilationErrors.invalidRowLevelOperationAssignments(assignments, errors.toSeq) +} + +attrs.zip(output).map { case (attr, expr) => Assignment(attr, expr) } + } + + private def toColumnUpdate(assignment: Assignment): ColumnUpdate = { +ColumnUpdate(toRef(assignment.key), assignment.value) + } + + private def restoreActualType(attr: Attribute): Attribute = { + attr.withDataType(CharVarcharUtils.getRawType(attr.metadata).getOrElse(attr.dataType)) + } + + private def applyUpdates( + updates: Seq[ColumnUpdate], + cols: Seq[Attribute], + colExprs: Seq[Expression], + addError: String => Unit, + colPath: Seq[String] = Nil): Seq[Expression] = { + +// iterate through columns at the current level and find matching updates 
+cols.zip(colExprs).map { case (col, colExpr) => + // find matches for this column or any of its children + val prefixMatchedUpdates = updates.filter(update => conf.resolver(update.ref.head, col.name)) + prefixMatchedUpdates match { +// if there is no exact match and no match for children, return the column expr as is +case matchedUpdates if matchedUpdates.isEmpty => + colExpr Review Comment: Note: I am skipping char/varchar length checks when a field is assigned to itself (i.e. did not change).
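The alignment behavior described in the `alignAssignments` doc above (unassigned columns keep their current value; a nested-field assignment rebuilds the struct, updating only the touched field) can be sketched in plain Python. This is a hypothetical simplification for illustration only — the real `AssignmentUtils` operates on Catalyst attributes and expressions, and additionally handles casts, char/varchar types, and error reporting:

```python
def align_assignments(attrs, assignments):
    """Align assignments so every table column gets an expression.

    attrs: list of (name, children) pairs; children is None for a leaf column,
           or a list of (name, children) pairs for a struct column.
    assignments: dict mapping a dotted column path to an expression string.
    Columns without a matching assignment are set to their current value.
    """
    def align(cols, updates, prefix):
        out = []
        for name, children in cols:
            dotted = ".".join(prefix + (name,))
            if dotted in updates:
                # exact match: use the assigned expression
                out.append((dotted, updates[dotted]))
            elif children is None:
                # no match for a leaf column: keep the current value
                out.append((dotted, dotted))
            else:
                # match inside a struct: rebuild it, updating only touched fields
                nested = align(children, updates, prefix + (name,))
                out.append((dotted, "struct(" + ", ".join(e for _, e in nested) + ")"))
        return out
    return align(attrs, assignments, ())
```

With flat attributes `c1, c2` and the assignment `c2 = 1`, this yields `c1 = c1, c2 = 1`; with `c2` a struct of `n1, n2` and the assignment `c2.n2 = 1`, it yields `c1 = c1, c2 = struct(c2.n1, 1)`, matching the examples in the doc comment.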
[GitHub] [spark] aokolnychyi commented on a diff in pull request #40308: [SPARK-42151][SQL] Align UPDATE assignments with table attributes
aokolnychyi commented on code in PR #40308: URL: https://github.com/apache/spark/pull/40308#discussion_r1128943046 ## sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala: ## @@ -3344,43 +3345,6 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor } else { v2Write } - - case u: UpdateTable if !u.skipSchemaResolution && u.resolved => Review Comment: One notable difference is not using `AssertNotNull` and relying on the same framework we have for V2 tables. In particular, we check the assignment mode and whether in and out attributes are compatible. The main contribution of this PR is `AssignmentUtils` that aligns assignments with table attributes. It is a prerequisite for rewriting UPDATEs. Right now, we only rewrite DELETEs. It also handles casts and char/varchar types.
[GitHub] [spark] yaooqinn commented on a diff in pull request #40325: [SPARK-42707][CONNECT][DOCS] Update developer documentation about API stability warning
yaooqinn commented on code in PR #40325: URL: https://github.com/apache/spark/pull/40325#discussion_r1128941874 ## python/pyspark/sql/connect/__init__.py: ## @@ -15,5 +15,4 @@ # limitations under the License. # -"""Currently Spark Connect is very experimental and the APIs to interact with -Spark through this API are can be changed at any time without warning.""" +"""Spark Connect cleint""" Review Comment: nit: client
[GitHub] [spark] yaooqinn commented on pull request #37880: [SPARK-39399] [CORE] [K8S]: Fix proxy-user authentication for Spark on k8s in cluster deploy mode
yaooqinn commented on PR #37880: URL: https://github.com/apache/spark/pull/37880#issuecomment-1459335005 thanks @shrprasa @holdenk, merged to master and branch-3.4/3.3/3.2
[GitHub] [spark] beliefer commented on pull request #40287: [SPARK-42562][CONNECT] UnresolvedNamedLambdaVariable in python do not need unique names
beliefer commented on PR #40287: URL: https://github.com/apache/spark/pull/40287#issuecomment-1459329942 @hvanhovell Do we still need this change?
[GitHub] [spark] yaooqinn closed pull request #37880: [SPARK-39399] [CORE] [K8S]: Fix proxy-user authentication for Spark on k8s in cluster deploy mode
yaooqinn closed pull request #37880: [SPARK-39399] [CORE] [K8S]: Fix proxy-user authentication for Spark on k8s in cluster deploy mode URL: https://github.com/apache/spark/pull/37880
[GitHub] [spark] HyukjinKwon opened a new pull request, #40325: [SPARK-42707][CONNECT][DOCS] Update developer documentation about API stability warning
HyukjinKwon opened a new pull request, #40325: URL: https://github.com/apache/spark/pull/40325 ### What changes were proposed in this pull request? This PR updates the developer documentation by removing the warnings about API compatibility. ### Why are the changes needed? We actually are going to keep the compatibility there. Such warnings cause a misconception that we can just break the compatibility there. ### Does this PR introduce _any_ user-facing change? No, it's developer-only docs. ### How was this patch tested? Linters in CI should test it out.
[GitHub] [spark] zhengruifeng commented on pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command
zhengruifeng commented on PR #40323: URL: https://github.com/apache/spark/pull/40323#issuecomment-1459254286 thank you all, merged into master/branch-3.4
[GitHub] [spark] LuciferYang commented on pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command
LuciferYang commented on PR #40323: URL: https://github.com/apache/spark/pull/40323#issuecomment-1459252795 > @LuciferYang This PR fix it in the connect planner, so should also works for the Scala Client. OK, got it
[GitHub] [spark] zhengruifeng closed pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command
zhengruifeng closed pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command URL: https://github.com/apache/spark/pull/40323
[GitHub] [spark] zhengruifeng commented on pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command
zhengruifeng commented on PR #40323: URL: https://github.com/apache/spark/pull/40323#issuecomment-1459251014 @LuciferYang This PR fixes it in the Connect planner, so it should also work for the Scala client.
[GitHub] [spark] LuciferYang commented on pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command
LuciferYang commented on PR #40323: URL: https://github.com/apache/spark/pull/40323#issuecomment-1459249689 Is there a chance to add a similar case in `ClientE2ETestSuite`?
[GitHub] [spark] AngersZhuuuu commented on pull request #40314: [SPARK-42698][CORE] SparkSubmit should pass exitCode to AM side
AngersZh commented on PR #40314: URL: https://github.com/apache/spark/pull/40314#issuecomment-1459223471 > Hi, @AngersZh . > This PR seems to have insufficient information. Could you provide more details about how to validate this in what environment? We run a client mode SparkSubmit job and throw below exception ``` 23/03/07 18:34:50 INFO YarnClientSchedulerBackend: Shutting down all executors 23/03/07 18:34:50 INFO YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down 23/03/07 18:34:50 INFO YarnClientSchedulerBackend: YARN client scheduler backend Stopped 23/03/07 18:34:50 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 23/03/07 18:34:50 INFO BlockManager: BlockManager stopped 23/03/07 18:34:50 INFO BlockManagerMaster: BlockManagerMaster stopped 23/03/07 18:34:50 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 23/03/07 18:34:50 INFO SparkContext: Successfully stopped SparkContext Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or view not found: xxx.xxx; line 1 pos 14; 'GlobalLimit 1 +- 'LocalLimit 1 +- 'Project [*] +- 'UnresolvedRelation [xxx, xxx], [], false at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1(CheckAnalysis.scala:115) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.$anonfun$checkAnalysis$1$adapted(CheckAnalysis.scala:95) at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:184) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:183) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:183) at scala.collection.immutable.List.foreach(List.scala:392) at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:183) at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:183) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:183) at scala.collection.immutable.List.foreach(List.scala:392) at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:183) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1(TreeNode.scala:183) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$foreachUp$1$adapted(TreeNode.scala:183) at scala.collection.immutable.List.foreach(List.scala:392) at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:183) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis(CheckAnalysis.scala:95) at org.apache.spark.sql.catalyst.analysis.CheckAnalysis.checkAnalysis$(CheckAnalysis.scala:92) at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:155) at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:178) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:228) at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:175) at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:73) at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111) at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:143) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:778) at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:143) at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:73) at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:71) at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:63) at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:98) at 
org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:778) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96) at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:621) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:778) at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:616) at org.apache.spark.sql.auth.QueryAuthChecker$.main(QueryAuthChecker.scala:33) at org.apache.spark.sql.auth.QueryAuthChecker.main(QueryAuthChecker.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[GitHub] [spark] itholic commented on a diff in pull request #40316: [SPARK-42679][CONNECT] createDataFrame doesn't work with non-nullable schema
itholic commented on code in PR #40316: URL: https://github.com/apache/spark/pull/40316#discussion_r1128906598 ## python/pyspark/sql/tests/connect/test_connect_basic.py: ## @@ -2876,6 +2876,13 @@ def test_unsupported_io_functions(self): with self.assertRaises(NotImplementedError): getattr(df.write, f)() +def test_inferred_schema(self): Review Comment: Can we enrich the test by also including the test for schema_true and pandas DataFrame? such as: ```python schema_true = StructType([StructField("id", IntegerType(), True)]) cdf2 = self.connect.createDataFrame([[1]], schema=schema_true) sdf2 = self.spark.createDataFrame([[1]], schema=schema_true) self.assertEqual(cdf2.schema, sdf2.schema) self.assertEqual(cdf2.collect(), sdf2.collect()) pdf1 = cdf1.toPandas() cdf3 = self.connect.createDataFrame(pdf1, cdf1.schema) sdf3 = self.spark.createDataFrame(pdf1, cdf1.schema) self.assertEqual(cdf3.schema, sdf3.schema) self.assertEqual(cdf3.collect(), sdf3.collect()) pdf2 = cdf2.toPandas() cdf4 = self.connect.createDataFrame(pdf2, cdf2.schema) sdf4 = self.spark.createDataFrame(pdf2, cdf2.schema) self.assertEqual(cdf4.schema, sdf4.schema) self.assertEqual(cdf4.collect(), sdf4.collect()) ```
[GitHub] [spark] allanf-db opened a new pull request, #40324: [WIP][SPARK-42496][CONNECT][DOCS] Adding Spark Connect to the Spark 3.4 documentation
allanf-db opened a new pull request, #40324: URL: https://github.com/apache/spark/pull/40324 ### What changes were proposed in this pull request? Adding a Spark Connect overview page to the Spark 3.4 documentation. ### Why are the changes needed? The first version of Spark Connect is released as part of Spark 3.4.0 and this adds an overview for it to the documentation. ### Does this PR introduce _any_ user-facing change? Yes, the user facing documentation is updated. ### How was this patch tested? Built the doc website locally and tested the pages. SKIP_SCALADOC=1 SKIP_RDOC=1 bundle exec jekyll build
[GitHub] [spark] AngersZhuuuu commented on a diff in pull request #40314: [SPARK-42698][CORE] SparkSubmit should pass exitCode to AM side
AngersZh commented on code in PR #40314: URL: https://github.com/apache/spark/pull/40314#discussion_r1128902772 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -1005,17 +1005,20 @@ private[spark] class SparkSubmit extends Logging { e } +var exitCode = 0 + try { app.start(childArgs.toArray, sparkConf) } catch { case t: Throwable => +exitCode = -1 throw findCause(t) } finally { if (args.master.startsWith("k8s") && !isShell(args.primaryResource) && !isSqlShell(args.mainClass) && !isThriftServer(args.mainClass) && !isConnectServer(args.mainClass)) { try { - SparkContext.getActive.foreach(_.stop()) + SparkContext.getActive.foreach(_.stop(exitCode)) Review Comment: I don't know too much about the K8S scheduler, but for YARN client mode we also need to keep the same exit code.
[GitHub] [spark] AngersZhuuuu commented on a diff in pull request #40314: [SPARK-42698][CORE] SparkSubmit should pass exitCode to AM side
AngersZhuuuu commented on code in PR #40314: URL: https://github.com/apache/spark/pull/40314#discussion_r1128899260 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -1005,17 +1005,20 @@ private[spark] class SparkSubmit extends Logging { e } +var exitCode = 0 + try { app.start(childArgs.toArray, sparkConf) } catch { case t: Throwable => +exitCode = -1 throw findCause(t) } finally { if (args.master.startsWith("k8s") && !isShell(args.primaryResource) && !isSqlShell(args.mainClass) && !isThriftServer(args.mainClass) && !isConnectServer(args.mainClass)) { try { - SparkContext.getActive.foreach(_.stop()) + SparkContext.getActive.foreach(_.stop(exitCode)) Review Comment: The code in spark-3.1.2 is

```scala
if (!isShell(args.primaryResource) && !isSqlShell(args.mainClass) &&
    !isThriftServer(args.mainClass)) {
```

Seems https://github.com/apache/spark/pull/33403 changed the behavior.
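The pattern in the diff above — record a non-zero exit code when the application throws, then hand that code to the shutdown path in `finally` — is easy to get subtly wrong. A minimal sketch in plain Python, assuming hypothetical `start_app` and `stop_context` callables (these are stand-ins for illustration, not Spark APIs):

```python
def run(start_app, stop_context):
    """Run the app, always stopping the context with the observed exit code.

    start_app: zero-arg callable that may raise.
    stop_context: callable taking the exit code (mirrors SparkContext.stop(exitCode)).
    """
    exit_code = 0
    try:
        start_app()
    except Exception:
        # Mirror the SparkSubmit change: remember the failure before
        # re-raising, so the finally block reports it on shutdown.
        exit_code = -1
        raise
    finally:
        stop_context(exit_code)
```

The key point is that `finally` runs on both the success and failure paths, so `stop_context` always sees the exit code that matches the outcome.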
[GitHub] [spark] ueshin commented on pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command
ueshin commented on PR #40323: URL: https://github.com/apache/spark/pull/40323#issuecomment-1459184767 > Is there a similar case on Scala connect client ? I haven't tried the Scala client, but yes, it would happen there too, and this fix covers both.
[GitHub] [spark] panbingkun commented on pull request #40316: [SPARK-42679][CONNECT] createDataFrame doesn't work with non-nullable schema
panbingkun commented on PR #40316: URL: https://github.com/apache/spark/pull/40316#issuecomment-1459182974 cc @itholic
[GitHub] [spark] AngersZhuuuu commented on a diff in pull request #40315: [SPARK-42699][CONNECT] SparkConnectServer should make client and AM same exit code
AngersZhuuuu commented on code in PR #40315: URL: https://github.com/apache/spark/pull/40315#discussion_r112640 ## sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -736,13 +736,15 @@ class SparkSession private( } // scalastyle:on + def stop(): Unit = stop(0) + /** * Stop the underlying `SparkContext`. * * @since 2.0.0 Review Comment: How about the current version?
[GitHub] [spark] yaooqinn commented on a diff in pull request #40313: [SPARK-42697][WEBUI] Fix /api/v1/applications to return total uptime instead of 0 for the duration field
yaooqinn commented on code in PR #40313: URL: https://github.com/apache/spark/pull/40313#discussion_r1128885696 ## core/src/main/scala/org/apache/spark/ui/SparkUI.scala: ## @@ -167,7 +167,7 @@ private[spark] class SparkUI private ( attemptId = None, startTime = new Date(startTime), endTime = new Date(-1), -duration = 0, +duration = System.currentTimeMillis() - startTime, Review Comment: This is SparkUI only, not used by HistoryServer
[GitHub] [spark] jerqi commented on pull request #40307: [DRAFT][SPARK-42689][CORE][SHUFFLE]: Allow ShuffleDriverComponent to declare if shuffle data is reliably stored
jerqi commented on PR #40307: URL: https://github.com/apache/spark/pull/40307#issuecomment-1459167441 > @jerqi Agree that we should have a way to specify locality preference for disaggregated shuffle implementations to spark scheduler - so that shuffle tasks are closer to the data. > > @otterc is relooking at [SPARK-25299](https://issues.apache.org/jira/browse/SPARK-25299), given the platform changes that have gone in since it was proposed - and propose changes in the upcoming months. We can add this to that discussion/jira. > > Hope that sounds fine. For now, as @pan3793 suggested, turning off reduce locality for Uniffle should remove the need for the code patch (once this PR is merged) - if there are gaps, we can address those specifically to enforce the behavior of `spark.shuffle.reduceLocality.enabled`.

There are some gaps. If we store data in DFS or in a disaggregated shuffle cluster, we should return `Nil` from `ShuffledRowRDD#getPreferredLocations`. For `CoalescedPartitionSpec`, `spark.shuffle.reduceLocality.enabled` will help solve this problem. For `PartialReducerPartitionSpec` and `PartialMapperPartitionSpec`, `spark.shuffle.reduceLocality.enabled` can't help us, because we call the method `MapOutputTracker#getMapLocation`. Could we modify that method in this PR like this?

```scala
def getMapLocation(
    dep: ShuffleDependency[_, _, _],
    startMapIndex: Int,
    endMapIndex: Int): Seq[String] = {
  val shuffleStatus = shuffleStatuses.get(dep.shuffleId).orNull
  // && rather than ||: with ||, withMapStatuses would NPE when shuffleStatus is null
  if (shuffleStatus != null && shuffleLocalityEnabled) {
    shuffleStatus.withMapStatuses { statuses =>
      if (startMapIndex < endMapIndex &&
          (startMapIndex >= 0 && endMapIndex <= statuses.length)) {
        val statusesPicked = statuses.slice(startMapIndex, endMapIndex).filter(_ != null)
        statusesPicked.map(_.location.host).toSeq
      } else {
        Nil
      }
    }
  } else {
    Nil
  }
}
```

Or should we raise another PR to fix this issue?
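The index-bounds and fallback logic in the `getMapLocation` suggestion above is the part that is easiest to get wrong. The same logic can be sketched in Python (names here are hypothetical stand-ins for illustration: `statuses` is a list of host strings or `None` entries, and `locality_enabled` mirrors `spark.shuffle.reduceLocality.enabled`):

```python
def get_map_location(statuses, start, end, locality_enabled):
    """Return preferred hosts for map outputs in [start, end), or [] as fallback.

    Mirrors the proposed guard: missing statuses or disabled locality
    means "no preference", so the scheduler places tasks freely.
    """
    if statuses is None or not locality_enabled:
        return []
    # Same bounds check as the Scala snippet: a half-open, in-range window.
    if not (0 <= start < end <= len(statuses)):
        return []
    # Drop missing map statuses, keep only the known hosts.
    return [host for host in statuses[start:end] if host is not None]
```

Returning an empty preference list (rather than raising on bad indices or missing state) is what lets disaggregated shuffle deployments opt out of locality cleanly.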
[GitHub] [spark] LuciferYang commented on pull request #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command
LuciferYang commented on PR #40323: URL: https://github.com/apache/spark/pull/40323#issuecomment-1459158853 Is there a similar case on the Scala connect client?
[GitHub] [spark] LuciferYang commented on pull request #40317: [SPARK-42700][BUILD] Add `h2` as test dependency of connect-server module
LuciferYang commented on PR #40317: URL: https://github.com/apache/spark/pull/40317#issuecomment-1459157408 re-triggered GA
[GitHub] [spark] jerqi commented on pull request #40307: [DRAFT][SPARK-42689][CORE][SHUFFLE]: Allow ShuffleDriverComponent to declare if shuffle data is reliably stored
jerqi commented on PR #40307: URL: https://github.com/apache/spark/pull/40307#issuecomment-1459156913 > > @jerqi locality may still have benefits when RSS works in hybrid deployments, besides, there is a dedicated configuration for that `spark.shuffle.reduceLocality.enabled` > > `ShuffledRDD#getPreferredLocations` will call the methods `MapOutputTracker#getPreferredLocationsForShuffle` and `MapOutputTracker#getMapLocation`. For `MapOutputTracker#getMapLocation`, `spark.shuffle.reduceLocality.enabled` is useless. My mistake. I meant `ShuffledRowRDD#getPreferredLocations` instead of `ShuffledRDD#getPreferredLocations`.
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40315: [SPARK-42699][CONNECT] SparkConnectServer should make client and AM same exit code
HyukjinKwon commented on code in PR #40315: URL: https://github.com/apache/spark/pull/40315#discussion_r1128823249 ## sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -736,13 +736,15 @@ class SparkSession private( } // scalastyle:on + def stop(): Unit = stop(0) + /** * Stop the underlying `SparkContext`. * * @since 2.0.0 Review Comment: You should add new docs for the `exitCode` variant, with a new `@since`.
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40315: [SPARK-42699][CONNECT] SparkConnectServer should make client and AM same exit code
HyukjinKwon commented on code in PR #40315: URL: https://github.com/apache/spark/pull/40315#discussion_r1128822955 ## sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala: ## @@ -736,13 +736,15 @@ class SparkSession private( } // scalastyle:on + def stop(): Unit = stop(0) + /** * Stop the underlying `SparkContext`. * * @since 2.0.0 */ - def stop(): Unit = { -sparkContext.stop() + def stop(exitCode: Int = 0): Unit = { Review Comment: Can you remove the default value here?
[GitHub] [spark] itholic commented on pull request #40282: [SPARK-42672][PYTHON][DOCS] Document error class list
itholic commented on PR #40282: URL: https://github.com/apache/spark/pull/40282#issuecomment-1459096241 Just created a ticket for the SQL side: https://issues.apache.org/jira/browse/SPARK-42706.
[GitHub] [spark] mridulm commented on a diff in pull request #40307: [DRAFT][SPARK-42689][CORE][SHUFFLE]: Allow ShuffleDriverComponent to declare if shuffle data is reliably stored
mridulm commented on code in PR #40307: URL: https://github.com/apache/spark/pull/40307#discussion_r1128807716 ## core/src/test/scala/org/apache/spark/scheduler/SparkListenerSuite.scala: ## @@ -456,7 +456,9 @@ class SparkListenerSuite extends SparkFunSuite with LocalSparkContext with Match val conf = new SparkConf().setMaster("local").setAppName("test") .set(EXTRA_LISTENERS, listeners.map(_.getName)) sc = new SparkContext(conf) -sc.listenerBus.listeners.asScala.count(_.isInstanceOf[BasicJobCounter]) should be (1) +// sc.listenerBus.listeners.asScala.count(_.isInstanceOf[BasicJobCounter]) should be (1) + assert(sc.listenerBus.listeners.asScala.count(_.isInstanceOf[BasicJobCounter]) == 1, + s"listeners = ${sc.listenerBus.listeners.asScala.mkString(", ")}") Review Comment: This change, and the change to LogUrlsStandaloneSuite, was added to debug why these assertions failed. But it looks like it was probably just a flaky test (on my local box), so I'm removing them.
[GitHub] [spark] ivoson commented on a diff in pull request #40286: [SPARK-42577][CORE] Add max attempts limitation for stages to avoid potential infinite retry
ivoson commented on code in PR #40286: URL: https://github.com/apache/spark/pull/40286#discussion_r1128798692 ## core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala: ## @@ -4572,6 +4572,48 @@ class DAGSchedulerSuite extends SparkFunSuite with TempLocalSparkContext with Ti } } + test("SPARK-42577: fail the job if a shuffle map stage attempts beyond the limitation") { +setupStageAbortTest(sc) +doAnswer(_ => 1).when(scheduler).maxStageAttempts Review Comment: Thanks, will make the change.
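Stubbing the attempt limit down to 1, as the test above does with Mockito's `doAnswer`, lets a single retry exceed the cap without looping many times. A rough Python equivalent using `unittest.mock` (the scheduler object and `max_stage_attempts` attribute here are hypothetical stand-ins for illustration, not Spark's API):

```python
from unittest.mock import MagicMock

def should_abort(scheduler, attempt_number):
    # Abort the job once a stage has been attempted more times than the cap.
    return attempt_number > scheduler.max_stage_attempts

# A stand-in scheduler whose attempt cap is stubbed to 1, so the
# second attempt already exceeds the limit.
scheduler = MagicMock()
scheduler.max_stage_attempts = 1
```

With the cap stubbed to 1, `should_abort(scheduler, 2)` is true, which is exactly the condition the DAGSchedulerSuite test wants to trigger cheaply.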
[GitHub] [spark] HyukjinKwon closed pull request #40310: [SPARK-42022][CONNECT][PYTHON] Fix createDataFrame to autogenerate missing column names
HyukjinKwon closed pull request #40310: [SPARK-42022][CONNECT][PYTHON] Fix createDataFrame to autogenerate missing column names URL: https://github.com/apache/spark/pull/40310
[GitHub] [spark] HyukjinKwon commented on pull request #40310: [SPARK-42022][CONNECT][PYTHON] Fix createDataFrame to autogenerate missing column names
HyukjinKwon commented on PR #40310: URL: https://github.com/apache/spark/pull/40310#issuecomment-1459084648 Merged to master and branch-3.4.
[GitHub] [spark] github-actions[bot] commented on pull request #38740: [SQL] Add product encoders for local classes
github-actions[bot] commented on PR #38740: URL: https://github.com/apache/spark/pull/38740#issuecomment-1459074618 We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!
[GitHub] [spark] ueshin opened a new pull request, #40323: [SPARK-42705][CONNECT] Fix spark.sql to return values from the command
ueshin opened a new pull request, #40323: URL: https://github.com/apache/spark/pull/40323 ### What changes were proposed in this pull request? Fixes `spark.sql` to return values from the command. ### Why are the changes needed? Currently `spark.sql` doesn't return the result from commands:

```py
>>> spark.sql("show functions").show()
+--------+
|function|
+--------+
+--------+
```

### Does this PR introduce _any_ user-facing change? `spark.sql` with commands will return the values. ### How was this patch tested? Added a test.
[GitHub] [spark] rithwik-db opened a new pull request, #40322: Added small fix
rithwik-db opened a new pull request, #40322: URL: https://github.com/apache/spark/pull/40322 ### What changes were proposed in this pull request? I added a better way to show the error instead of having it be confusing for the reader. ### Why are the changes needed? User experience. ### Does this PR introduce _any_ user-facing change? Just the error that will be shown to the user. ### How was this patch tested? Tested it out locally.
[GitHub] [spark] itholic closed pull request #40288: [SPARK-42496][CONNECT][DOCS] Introduction Spark Connect at main page.
itholic closed pull request #40288: [SPARK-42496][CONNECT][DOCS] Introduction Spark Connect at main page. URL: https://github.com/apache/spark/pull/40288
[GitHub] [spark] itholic commented on pull request #40288: [SPARK-42496][CONNECT][DOCS] Introduction Spark Connect at main page.
itholic commented on PR #40288: URL: https://github.com/apache/spark/pull/40288#issuecomment-1459047783 Let me close this for now, since the contents of this PR will be included in the upcoming Spark Connect documentation. cc @allanf-db FYI
[GitHub] [spark] ryan-johnson-databricks commented on pull request #40321: [SPARK-42704] SubqueryAlias propagates metadata columns that child outputs
ryan-johnson-databricks commented on PR #40321: URL: https://github.com/apache/spark/pull/40321#issuecomment-1459045889 Something went wrong with [Run spark on kubernetes integration test](https://github.com/ryan-johnson-databricks/spark/actions/runs/4358877500/jobs/7620022040):

```
[info] *** Test still running after 2 minutes, 57 seconds: suite name: KubernetesSuite, test name: Test decommissioning with dynamic allocation & shuffle cleanups.
[info] - Test decommissioning with dynamic allocation & shuffle cleanups (3 minutes, 3 seconds)
[info] - Test decommissioning timeouts (1 minute)
[info] - SPARK-37576: Rolling decommissioning (1 minute, 11 seconds)
[info] org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite *** ABORTED *** (25 minutes, 32 seconds)
[info]   io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://192.168.49.2:8443/api/v1/namespaces. Message: object is being deleted: namespaces "spark-6bff7607e9884740a4bac53b1fb655ae" already exists. Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=null, kind=namespaces, name=spark-6bff7607e9884740a4bac53b1fb655ae, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=object is being deleted: namespaces "spark-6bff7607e9884740a4bac53b1fb655ae" already exists, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=AlreadyExists, status=Failure, additionalProperties={}).
[info]   at io.fabric8.kubernetes.client.KubernetesClientException.copyAsCause(KubernetesClientException.java:238)
[info]   at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:538)
...
[info]   at io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:42)
[info]   at org.apache.spark.deploy.k8s.integrationtest.KubernetesTestComponents.createNamespace(KubernetesTestComponents.scala:51)
[info]   at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.setUpTest(KubernetesSuite.scala:202)
...
[info]   at org.apache.spark.deploy.k8s.integrationtest.KubernetesSuite.runTest(KubernetesSuite.scala:45)
...
[info] Cause: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://192.168.49.2:8443/api/v1/namespaces. Message: object is being deleted: namespaces "spark-6bff7607e9884740a4bac53b1fb655ae" already exists. Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=null, kind=namespaces, name=spark-6bff7607e9884740a4bac53b1fb655ae, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=object is being deleted: namespaces "spark-6bff7607e9884740a4bac53b1fb655ae" already exists, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=AlreadyExists, status=Failure, additionalProperties={}).
```

(not sure how that could be related to this PR?)
[GitHub] [spark] eric-maynard closed pull request #39519: [SPARK-41995][SQL] Accept non-foldable expressions in schema_of_json
eric-maynard closed pull request #39519: [SPARK-41995][SQL] Accept non-foldable expressions in schema_of_json URL: https://github.com/apache/spark/pull/39519
[GitHub] [spark] eric-maynard commented on pull request #39519: [SPARK-41995][SQL] Accept non-foldable expressions in schema_of_json
eric-maynard commented on PR #39519: URL: https://github.com/apache/spark/pull/39519#issuecomment-1458985753 Closing for inactivity
[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #40314: [SPARK-42698][CORE] SparkSubmit should pass exitCode to AM side
dongjoon-hyun commented on code in PR #40314: URL: https://github.com/apache/spark/pull/40314#discussion_r1128691347 ## core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala: ## @@ -1005,17 +1005,20 @@ private[spark] class SparkSubmit extends Logging { e } +var exitCode = 0 + try { app.start(childArgs.toArray, sparkConf) } catch { case t: Throwable => +exitCode = -1 throw findCause(t) } finally { if (args.master.startsWith("k8s") && !isShell(args.primaryResource) && !isSqlShell(args.mainClass) && !isThriftServer(args.mainClass) && !isConnectServer(args.mainClass)) { try { - SparkContext.getActive.foreach(_.stop()) + SparkContext.getActive.foreach(_.stop(exitCode)) Review Comment: I don't think this is related to the YARN AM, because this is guarded by `if (args.master.startsWith("k8s")`. Is this a K8s patch instead of a YARN AM one?
[GitHub] [spark] vitaliili-db commented on pull request #40295: [SPARK-42681] Relax ordering constraint for ALTER TABLE ADD|REPLACE column options
vitaliili-db commented on PR #40295: URL: https://github.com/apache/spark/pull/40295#issuecomment-1458880038 @gengliangwang great catch, yes, we should follow the standard. Renamed.
[GitHub] [spark] HeartSaVioR closed pull request #40215: [SPARK-42591][SS][DOCS] Add examples of unblocked workloads after SPARK-42376
HeartSaVioR closed pull request #40215: [SPARK-42591][SS][DOCS] Add examples of unblocked workloads after SPARK-42376 URL: https://github.com/apache/spark/pull/40215
[GitHub] [spark] HeartSaVioR commented on pull request #40215: [SPARK-42591][SS][DOCS] Add examples of unblocked workloads after SPARK-42376
HeartSaVioR commented on PR #40215: URL: https://github.com/apache/spark/pull/40215#issuecomment-1458815185 Thanks for reviewing! Merging to master.