[jira] [Commented] (SPARK-42849) Session variables
[ https://issues.apache.org/jira/browse/SPARK-42849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702069#comment-17702069 ] Apache Spark commented on SPARK-42849:

User 'srielau' has created a pull request for this issue: https://github.com/apache/spark/pull/40474

> Session variables
> -----------------
>
> Key: SPARK-42849
> URL: https://issues.apache.org/jira/browse/SPARK-42849
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Affects Versions: 3.5.0
> Reporter: Serge Rielau
> Priority: Major
>
> Provide a type-safe, engine-controlled session variable:
> CREATE [ OR REPLACE ] TEMPORARY VARIABLE [ IF NOT EXISTS ] var_name [ type ] [ DEFAULT expression ]
> SET { variable = expression | ( variable [, ...] ) = ( subquery | expression [, ...] ) }
> DROP VARIABLE [ IF EXISTS ] variable_name

--
This message was sent by Atlassian Jira (v8.20.10#820010)
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
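The proposed grammar could be exercised from a Spark session roughly as below. This is a hypothetical sketch based only on the syntax quoted in the issue; the statement spellings, supported types, and `spark.sql` behavior are assumptions, not the final implementation landed by the PR.

```scala
// Hypothetical usage sketch of the proposed session-variable syntax,
// issued through spark.sql. Names and exact syntax are assumptions.
import org.apache.spark.sql.SparkSession

object SessionVariableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").getOrCreate()

    // CREATE [ OR REPLACE ] TEMPORARY VARIABLE [ IF NOT EXISTS ] var_name [ type ] [ DEFAULT expression ]
    spark.sql("CREATE TEMPORARY VARIABLE var1 INT DEFAULT 5")

    // SET { variable = expression | ... }
    spark.sql("SET var1 = 7")

    // A session variable would presumably be referenced like any expression.
    spark.sql("SELECT var1 + 1").show()

    // DROP VARIABLE [ IF EXISTS ] variable_name
    spark.sql("DROP VARIABLE IF EXISTS var1")

    spark.stop()
  }
}
```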
[jira] [Assigned] (SPARK-42849) Session variables
[ https://issues.apache.org/jira/browse/SPARK-42849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42849: Assignee: Apache Spark
[jira] [Commented] (SPARK-42849) Session variables
[ https://issues.apache.org/jira/browse/SPARK-42849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702068#comment-17702068 ] Apache Spark commented on SPARK-42849:

User 'srielau' has created a pull request for this issue: https://github.com/apache/spark/pull/40474
[jira] [Assigned] (SPARK-42849) Session variables
[ https://issues.apache.org/jira/browse/SPARK-42849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42849: Assignee: (was: Apache Spark)
[jira] [Resolved] (SPARK-42834) Divided by zero occurs in PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse
[ https://issues.apache.org/jira/browse/SPARK-42834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Ying resolved SPARK-42834.

Resolution: Won't Do

> Divided by zero occurs in PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse
>
> Key: SPARK-42834
> URL: https://issues.apache.org/jira/browse/SPARK-42834
> Project: Spark
> Issue Type: Bug
> Components: Shuffle
> Affects Versions: 3.2.0
> Reporter: Li Ying
> Priority: Major
>
> Sometimes, when running a SQL job with push-based shuffle, the exception below occurs. It seems there is no element in the bitmaps array which stores the merged-chunk meta.
> Is this a bug? Should we avoid calling createChunkBlockInfos when bitmaps is empty, or should the bitmaps never be empty here?
>
> {code:java}
> Caused by: java.lang.ArithmeticException: / by zero
>   at org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse(PushBasedFetchHelper.scala:117)
>   at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:980)
>   at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:84)
> {code}
> Related code:
> {code:java}
> def createChunkBlockInfosFromMetaResponse(
>     shuffleId: Int,
>     shuffleMergeId: Int,
>     reduceId: Int,
>     blockSize: Long,
>     bitmaps: Array[RoaringBitmap]): ArrayBuffer[(BlockId, Long, Int)] = {
>   val approxChunkSize = blockSize / bitmaps.length
>   val blocksToFetch = new ArrayBuffer[(BlockId, Long, Int)]()
>   for (i <- bitmaps.indices) {
>     val blockChunkId = ShuffleBlockChunkId(shuffleId, shuffleMergeId, reduceId, i)
>     chunksMetaMap.put(blockChunkId, bitmaps(i))
>     logDebug(s"adding block chunk $blockChunkId of size $approxChunkSize")
>     blocksToFetch += ((blockChunkId, approxChunkSize, SHUFFLE_PUSH_MAP_ID))
>   }
>   blocksToFetch
> }
> {code}
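One possible hardening for the question raised in the report is to guard the division on an empty bitmaps array. The sketch below shows that guard in isolation with simplified stand-in types; it is an illustration of the idea only, not the actual Spark fix (the issue was resolved as Won't Do).

```scala
import scala.collection.mutable.ArrayBuffer

// Simplified stand-in for org.apache.spark.storage.ShuffleBlockChunkId,
// so the guard logic can be shown without Spark's internals.
case class ChunkId(shuffleId: Int, shuffleMergeId: Int, reduceId: Int, chunkId: Int)

// Hypothetical guarded variant: returns an empty buffer instead of
// throwing ArithmeticException when the meta response carries no bitmaps.
def createChunkBlockInfos(
    shuffleId: Int,
    shuffleMergeId: Int,
    reduceId: Int,
    blockSize: Long,
    bitmapCount: Int): ArrayBuffer[(ChunkId, Long)] = {
  val blocksToFetch = new ArrayBuffer[(ChunkId, Long)]()
  // Guard: an empty meta response would otherwise trigger "/ by zero"
  // in blockSize / bitmapCount.
  if (bitmapCount == 0) return blocksToFetch
  val approxChunkSize = blockSize / bitmapCount
  for (i <- 0 until bitmapCount) {
    blocksToFetch += ((ChunkId(shuffleId, shuffleMergeId, reduceId, i), approxChunkSize))
  }
  blocksToFetch
}
```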
[jira] [Closed] (SPARK-42834) Divided by zero occurs in PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse
[ https://issues.apache.org/jira/browse/SPARK-42834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Ying closed SPARK-42834. --- > Divided by zero occurs in > PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse > > > Key: SPARK-42834 > URL: https://issues.apache.org/jira/browse/SPARK-42834 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.2.0 >Reporter: Li Ying >Priority: Major > > {color:#22}Sometimes when run a SQL job with push based shuffle, > exception occurs as below. It seems that there’s no element in the bitmaps > which stores merge chunk meta. {color} > {color:#22}Is it a bug that we should not createChunkBlockInfos when > bitmaps is empty or the bitmaps should never be empty here ?{color} > > {code:java} > Caused by: java.lang.ArithmeticException: / by zero > at > org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse(PushBasedFetchHelper.scala:117) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:980) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:84) > {code} > related code: > {code:java} > def createChunkBlockInfosFromMetaResponse( > shuffleId: Int, > shuffleMergeId: Int, > reduceId: Int, > blockSize: Long, > bitmaps: Array[RoaringBitmap]): ArrayBuffer[(BlockId, Long, Int)] = { > val approxChunkSize = blockSize / bitmaps.length > val blocksToFetch = new ArrayBuffer[(BlockId, Long, Int)]() > for (i <- bitmaps.indices) { > val blockChunkId = ShuffleBlockChunkId(shuffleId, shuffleMergeId, > reduceId, i) > chunksMetaMap.put(blockChunkId, bitmaps(i)) > logDebug(s"adding block chunk $blockChunkId of size $approxChunkSize") > blocksToFetch += ((blockChunkId, approxChunkSize, SHUFFLE_PUSH_MAP_ID)) > } > blocksToFetch > } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, 
e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42834) Divided by zero occurs in PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse
[ https://issues.apache.org/jira/browse/SPARK-42834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702050#comment-17702050 ] Li Ying commented on SPARK-42834:

[~csingh] Thanks for help. I would take this fix :)
[jira] [Resolved] (SPARK-42803) Use getParameterCount function instead of getParameterTypes.length
[ https://issues.apache.org/jira/browse/SPARK-42803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-42803.

Fix Version/s: 3.5.0 (was: 3.3.2)
Assignee: Narek Karapetian
Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/40422

> Use getParameterCount function instead of getParameterTypes.length
> ------------------------------------------------------------------
>
> Key: SPARK-42803
> URL: https://issues.apache.org/jira/browse/SPARK-42803
> Project: Spark
> Issue Type: Improvement
> Components: ML, Spark Core, SQL
> Affects Versions: 3.3.3
> Reporter: Narek Karapetian
> Assignee: Narek Karapetian
> Priority: Minor
> Fix For: 3.5.0
>
> Since JDK 1.8 the reflection API provides an additional function, {{getParameterCount}}. It is better to use it instead of {{getParameterTypes.length}}, because {{getParameterTypes}} makes a copy of the parameter-types array on every invocation. Using {{getParameterCount}} avoids creating those redundant arrays.
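The difference is easy to demonstrate with plain Java reflection from Scala: `getParameterTypes` returns a fresh defensive copy of the `Class[]` array on each call, while `getParameterCount` (available since JDK 1.8) returns just the count. The method chosen below is arbitrary; any multi-parameter method shows the same thing.

```scala
import java.lang.reflect.Method

object ParameterCountDemo {
  def main(args: Array[String]): Unit = {
    // String.regionMatches(boolean, int, String, int, int) has 5 parameters.
    val m: Method = classOf[String].getMethod(
      "regionMatches",
      classOf[Boolean], classOf[Int], classOf[String], classOf[Int], classOf[Int])

    // Allocates and copies a Class[] on every invocation, then discards it.
    val viaTypes: Int = m.getParameterTypes.length

    // Reads the count directly; no array allocation.
    val viaCount: Int = m.getParameterCount

    assert(viaTypes == viaCount)

    // Each call returns a distinct array instance, confirming the copy.
    assert(!(m.getParameterTypes eq m.getParameterTypes))
  }
}
```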
[jira] [Assigned] (SPARK-42851) EquivalentExpressions methods need to be consistently guarded by supportedExpression
[ https://issues.apache.org/jira/browse/SPARK-42851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42851: Assignee: Apache Spark

> EquivalentExpressions methods need to be consistently guarded by supportedExpression
>
> Key: SPARK-42851
> URL: https://issues.apache.org/jira/browse/SPARK-42851
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.2, 3.4.0
> Reporter: Kris Mok
> Assignee: Apache Spark
> Priority: Major
>
> SPARK-41468 tried to fix a bug but introduced a new regression. Its change to {{EquivalentExpressions}} added a {{supportedExpression()}} guard to the {{addExprTree()}} and {{getExprState()}} methods, but did not add the same guard to the other "add" entry point, {{addExpr()}}.
> As such, callers that add single expressions to CSE via {{addExpr()}} may succeed, but upon retrieval via {{getExprState()}} they would inconsistently get {{None}} due to failing the guard.
> We need to make sure the "add" and "get" methods are consistent. This could be done by one of:
> 1. Adding the same {{supportedExpression()}} guard to {{addExpr()}}, or
> 2. Removing the guard from {{getExprState()}}, relying solely on the guard on the "add" path to make sure only intended state is added.
> (or other alternative refactorings that fuse the guard into the various methods to make it more efficient)
> There are pros and cons to the two directions above: because {{addExpr()}} used to allow more (potentially incorrect) expressions to get CSE'd, making it more restrictive may cause performance regressions for the cases that happened to work.
> Example: > {code:sql} > select max(transform(array(id), x -> x)), max(transform(array(id), x -> x)) > from range(2) > {code} > Running this query on Spark 3.2 branch returns the correct value: > {code} > scala> spark.sql("select max(transform(array(id), x -> x)), > max(transform(array(id), x -> x)) from range(2)").collect > res0: Array[org.apache.spark.sql.Row] = > Array([WrappedArray(1),WrappedArray(1)]) > {code} > Here, {{transform(array(id), x -> x)}} is an {{AggregateExpression}} that was > (potentially unsafely) recognized by {{addExpr()}} as a common subexpression, > and {{getExprState()}} doesn't do extra guarding, so during physical > planning, in {{PhysicalAggregation}} this expression gets CSE'd in both the > aggregation expression list and the result expressions list. > {code} > AdaptiveSparkPlan isFinalPlan=false > +- SortAggregate(key=[], functions=[max(transform(array(id#0L), > lambdafunction(lambda x#1L, lambda x#1L, false)))]) >+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=11] > +- SortAggregate(key=[], functions=[partial_max(transform(array(id#0L), > lambdafunction(lambda x#1L, lambda x#1L, false)))]) > +- Range (0, 2, step=1, splits=16) > {code} > Running the same query on current master triggers an error when binding the > result expression to the aggregate expression in the Aggregate operators (for > a WSCG-enabled operator like {{HashAggregateExec}}, the same error would show > up during codegen): > {code} > ERROR TaskSetManager: Task 0 in stage 2.0 failed 1 times; aborting job > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in > stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 > (TID 16) (ip-10-110-16-93.us-west-2.compute.internal executor driver): > java.lang.IllegalStateException: Couldn't find max(transform(array(id#0L), > lambdafunction(lambda x#2L, lambda x#2L, false)))#4 in > [max(transform(array(id#0L), lambdafunction(lambda x#1L, lambda x#1L, > false)))#3] > at > 
org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:517) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1249) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1248) > at > org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:532) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode
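The invariant discussed above can be illustrated abstractly: whatever predicate gates one "add" entry point must gate every "add" entry point, otherwise state added through an unguarded path becomes invisible to a guarded getter. The sketch below uses illustrative names, not Spark's actual `EquivalentExpressions` API, and shows option 1 (guarding all adds), which in turn makes option 2 (an unguarded getter) safe.

```scala
import scala.collection.mutable

// Illustrative model of the consistency requirement. `supported` plays the
// role of supportedExpression(); addOne/addTree play addExpr()/addExprTree().
class GuardedRegistry[K, V](supported: K => Boolean) {
  private val state = mutable.Map.empty[K, V]

  // Option 1: every "add" entry point applies the same guard, so only
  // supported keys can ever enter the state map.
  def addOne(key: K, value: V): Boolean =
    supported(key) && { state.put(key, value); true }

  def addTree(key: K, value: V): Boolean =
    addOne(key, value)

  // With the add paths uniformly guarded, the getter needs no guard of its
  // own (option 2): a successful add is always visible on retrieval.
  def get(key: K): Option[V] = state.get(key)
}
```

The regression described in the issue corresponds to guarding `addTree` and `get` but not `addOne`: an unsupported key would then be accepted by `addOne` yet return `None` from a guarded `get`, which is exactly the add/get inconsistency being reported.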
[jira] [Assigned] (SPARK-42851) EquivalentExpressions methods need to be consistently guarded by supportedExpression
[ https://issues.apache.org/jira/browse/SPARK-42851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42851: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-42851) EquivalentExpressions methods need to be consistently guarded by supportedExpression
[ https://issues.apache.org/jira/browse/SPARK-42851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702033#comment-17702033 ] Apache Spark commented on SPARK-42851:

User 'rednaxelafx' has created a pull request for this issue: https://github.com/apache/spark/pull/40473
[jira] [Created] (SPARK-42851) EquivalentExpressions methods need to be consistently guarded by supportedExpression
Kris Mok created SPARK-42851:

Summary: EquivalentExpressions methods need to be consistently guarded by supportedExpression
Key: SPARK-42851
URL: https://issues.apache.org/jira/browse/SPARK-42851
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.3.2, 3.4.0
Reporter: Kris Mok
[jira] [Assigned] (SPARK-42247) Standardize `returnType` property of UserDefinedFunction
[ https://issues.apache.org/jira/browse/SPARK-42247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42247: Assignee: (was: Apache Spark)

> Standardize `returnType` property of UserDefinedFunction
>
> Key: SPARK-42247
> URL: https://issues.apache.org/jira/browse/SPARK-42247
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark
> Affects Versions: 3.4.0
> Reporter: Xinrong Meng
> Priority: Major
>
> There are checks
[jira] [Commented] (SPARK-42247) Standardize `returnType` property of UserDefinedFunction
[ https://issues.apache.org/jira/browse/SPARK-42247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702022#comment-17702022 ] Apache Spark commented on SPARK-42247:

User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40472
[jira] [Assigned] (SPARK-42247) Standardize `returnType` property of UserDefinedFunction
[ https://issues.apache.org/jira/browse/SPARK-42247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42247: Assignee: Apache Spark > Standardize `returnType` property of UserDefinedFunction > > > Key: SPARK-42247 > URL: https://issues.apache.org/jira/browse/SPARK-42247 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Apache Spark >Priority: Major > > There are checks -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42850) Remove duplicated rule CombineFilters in Optimizer
[ https://issues.apache.org/jira/browse/SPARK-42850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42850: Assignee: Apache Spark (was: Gengliang Wang) > Remove duplicated rule CombineFilters in Optimizer > -- > > Key: SPARK-42850 > URL: https://issues.apache.org/jira/browse/SPARK-42850 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.4.1 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42850) Remove duplicated rule CombineFilters in Optimizer
[ https://issues.apache.org/jira/browse/SPARK-42850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42850: Assignee: Gengliang Wang (was: Apache Spark) > Remove duplicated rule CombineFilters in Optimizer > -- > > Key: SPARK-42850 > URL: https://issues.apache.org/jira/browse/SPARK-42850 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.4.1 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42850) Remove duplicated rule CombineFilters in Optimizer
[ https://issues.apache.org/jira/browse/SPARK-42850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702021#comment-17702021 ] Apache Spark commented on SPARK-42850: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/40471 > Remove duplicated rule CombineFilters in Optimizer > -- > > Key: SPARK-42850 > URL: https://issues.apache.org/jira/browse/SPARK-42850 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.4.1 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42850) Remove duplicated rule CombineFilters in Optimizer
Gengliang Wang created SPARK-42850: -- Summary: Remove duplicated rule CombineFilters in Optimizer Key: SPARK-42850 URL: https://issues.apache.org/jira/browse/SPARK-42850 Project: Spark Issue Type: Task Components: SQL Affects Versions: 3.4.1 Reporter: Gengliang Wang Assignee: Gengliang Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41843) Implement SparkSession.udf
[ https://issues.apache.org/jira/browse/SPARK-41843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702006#comment-17702006 ] Apache Spark commented on SPARK-41843: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40470 > Implement SparkSession.udf > -- > > Key: SPARK-41843 > URL: https://issues.apache.org/jira/browse/SPARK-41843 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > Fix For: 3.4.0 > > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 2331, in pyspark.sql.connect.functions.call_udf > Failed example: > _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType()) > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", line 1, in > > _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType()) > AttributeError: 'SparkSession' object has no attribute 'udf'{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41818) Support DataFrameWriter.saveAsTable
[ https://issues.apache.org/jira/browse/SPARK-41818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702003#comment-17702003 ] Apache Spark commented on SPARK-41818: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40470 > Support DataFrameWriter.saveAsTable > --- > > Key: SPARK-41818 > URL: https://issues.apache.org/jira/browse/SPARK-41818 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Takuya Ueshin >Priority: Major > Fix For: 3.4.0 > > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", > line 369, in pyspark.sql.connect.readwriter.DataFrameWriter.insertInto > Failed example: > df.write.saveAsTable("tblA") > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File " pyspark.sql.connect.readwriter.DataFrameWriter.insertInto[2]>", line 1, in > > df.write.saveAsTable("tblA") > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", > line 350, in saveAsTable > > self._spark.client.execute_command(self._write.command(self._spark.client)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 459, in execute_command > self._execute(req) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 547, in _execute > self._handle_error(rpc_error) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 623, in _handle_error > raise SparkConnectException(status.message, info.reason) from None > pyspark.sql.connect.client.SparkConnectException: > (java.lang.ClassNotFoundException) .DefaultSource{code} -- This message was sent by Atlassian Jira 
(v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41843) Implement SparkSession.udf
[ https://issues.apache.org/jira/browse/SPARK-41843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702004#comment-17702004 ] Apache Spark commented on SPARK-41843: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40470 > Implement SparkSession.udf > -- > > Key: SPARK-41843 > URL: https://issues.apache.org/jira/browse/SPARK-41843 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > Fix For: 3.4.0 > > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 2331, in pyspark.sql.connect.functions.call_udf > Failed example: > _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType()) > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", line 1, in > > _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType()) > AttributeError: 'SparkSession' object has no attribute 'udf'{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41843) Implement SparkSession.udf
[ https://issues.apache.org/jira/browse/SPARK-41843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702005#comment-17702005 ] Apache Spark commented on SPARK-41843: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40470 > Implement SparkSession.udf > -- > > Key: SPARK-41843 > URL: https://issues.apache.org/jira/browse/SPARK-41843 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > Fix For: 3.4.0 > > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", > line 2331, in pyspark.sql.connect.functions.call_udf > Failed example: > _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType()) > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File "", line 1, in > > _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType()) > AttributeError: 'SparkSession' object has no attribute 'udf'{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41818) Support DataFrameWriter.saveAsTable
[ https://issues.apache.org/jira/browse/SPARK-41818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702002#comment-17702002 ] Apache Spark commented on SPARK-41818: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40470 > Support DataFrameWriter.saveAsTable > --- > > Key: SPARK-41818 > URL: https://issues.apache.org/jira/browse/SPARK-41818 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Assignee: Takuya Ueshin >Priority: Major > Fix For: 3.4.0 > > > {code:java} > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", > line 369, in pyspark.sql.connect.readwriter.DataFrameWriter.insertInto > Failed example: > df.write.saveAsTable("tblA") > Exception raised: > Traceback (most recent call last): > File > "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py", > line 1350, in __run > exec(compile(example.source, filename, "single", > File " pyspark.sql.connect.readwriter.DataFrameWriter.insertInto[2]>", line 1, in > > df.write.saveAsTable("tblA") > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", > line 350, in saveAsTable > > self._spark.client.execute_command(self._write.command(self._spark.client)) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 459, in execute_command > self._execute(req) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 547, in _execute > self._handle_error(rpc_error) > File > "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", > line 623, in _handle_error > raise SparkConnectException(status.message, info.reason) from None > pyspark.sql.connect.client.SparkConnectException: > (java.lang.ClassNotFoundException) .DefaultSource{code} -- This message was sent by Atlassian Jira 
(v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42849) Session variables
Serge Rielau created SPARK-42849: Summary: Session variables Key: SPARK-42849 URL: https://issues.apache.org/jira/browse/SPARK-42849 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 3.5.0 Reporter: Serge Rielau Provide a type-safe, engine-controlled session variable: CREATE [ OR REPLACE ] TEMPORARY VARIABLE [ IF NOT EXISTS ] var_name [ type ] [ DEFAULT expression ] SET { variable = expression | ( variable [, ...] ) = ( subquery | expression [, ...] ) } DROP VARIABLE [ IF EXISTS ] variable_name -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
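The ticket only sketches syntax, so the intended semantics can be illustrated with a toy model. The class below is purely an assumption-laden sketch of the described behavior (type-checked assignment, OR REPLACE / IF NOT EXISTS / IF EXISTS modifiers), not Spark's implementation; all names are hypothetical.

```python
class SessionVariableStore:
    """Toy model of the proposed CREATE / SET / DROP VARIABLE semantics.

    Illustrative only -- not Spark's implementation; behavior is inferred
    from the syntax sketched in SPARK-42849.
    """

    def __init__(self):
        self._vars = {}  # name -> (declared_type, value)

    def create(self, name, var_type, default=None,
               or_replace=False, if_not_exists=False):
        if name in self._vars:
            if or_replace:
                pass          # CREATE OR REPLACE overwrites the variable
            elif if_not_exists:
                return        # IF NOT EXISTS: silently keep the existing one
            else:
                raise ValueError(f"variable {name} already exists")
        self._vars[name] = (var_type, default)

    def set(self, name, value):
        declared_type, _ = self._vars[name]  # KeyError if never created
        if value is not None and not isinstance(value, declared_type):
            raise TypeError(f"{name} expects {declared_type.__name__}")
        self._vars[name] = (declared_type, value)

    def get(self, name):
        return self._vars[name][1]

    def drop(self, name, if_exists=False):
        if name not in self._vars and if_exists:
            return            # DROP ... IF EXISTS is a no-op when missing
        del self._vars[name]  # KeyError if missing without IF EXISTS
```

The "type-safe" aspect is the rejection of a SET whose value does not match the declared type; the "engine-controlled" aspect is that the store, not the user, owns the lifecycle.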
[jira] [Commented] (SPARK-42546) SPARK-42045 is incomplete in supporting ANSI_MODE for round() and bround()
[ https://issues.apache.org/jira/browse/SPARK-42546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701987#comment-17701987 ] Daniel Davies commented on SPARK-42546: --- Can I take this? > SPARK-42045 is incomplete in supporting ANSI_MODE for round() and bround() > -- > > Key: SPARK-42546 > URL: https://issues.apache.org/jira/browse/SPARK-42546 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.2, 3.4.0 >Reporter: Serge Rielau >Priority: Major > > Under ANSI mode, SPARK-42045 added error conditions instead of silent > overflows for edge cases in round() and bround(). > However it appears this fix works only for the INT data type. Trying it on e.g. a > SMALLINT, the function still returns wrong results: > {code:java} > spark-sql> select round(2147483647, -1); > [ARITHMETIC_OVERFLOW] Overflow. If necessary set "spark.sql.ansi.enabled" to > "false" to bypass this error.{code} > {code:java} > spark-sql> select round(127y, -1); > -126 {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
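The wrong result in the report is consistent with an unguarded two's-complement wraparound: `127y` is a TINYINT (signed 8-bit) literal, rounding 127 to the nearest 10 gives 130, and 130 does not fit in a signed byte. A small sketch of the arithmetic:

```python
def to_int8(n: int) -> int:
    """Wrap an integer into the signed 8-bit (two's-complement) range,
    as a silent TINYINT overflow would."""
    return ((n + 128) % 256) - 128

# Rounding the TINYINT literal 127y to the nearest 10 yields 130, which
# overflows a signed byte and silently wraps to -126.
rounded = round(127, -1)
assert rounded == 130
assert to_int8(rounded) == -126  # matches the wrong result reported above
```

This is why the reported value is exactly -126 rather than an arbitrary number: 130 - 256 = -126.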
[jira] [Resolved] (SPARK-42833) Refactor `applyExtensions` in `SparkSession`
[ https://issues.apache.org/jira/browse/SPARK-42833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-42833. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40465 [https://github.com/apache/spark/pull/40465] > Refactor `applyExtensions` in `SparkSession` > > > Key: SPARK-42833 > URL: https://issues.apache.org/jira/browse/SPARK-42833 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Kazuyuki Tanimura >Assignee: Kazuyuki Tanimura >Priority: Minor > Fix For: 3.5.0 > > > Refactor `applyExtensions` in `SparkSession` in order to reduce the > duplicated codes -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42833) Refactor `applyExtensions` in `SparkSession`
[ https://issues.apache.org/jira/browse/SPARK-42833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-42833: - Assignee: Kazuyuki Tanimura > Refactor `applyExtensions` in `SparkSession` > > > Key: SPARK-42833 > URL: https://issues.apache.org/jira/browse/SPARK-42833 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Kazuyuki Tanimura >Assignee: Kazuyuki Tanimura >Priority: Minor > > Refactor `applyExtensions` in `SparkSession` in order to reduce the > duplicated codes -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42779) Allow V2 writes to indicate advisory partition size
[ https://issues.apache.org/jira/browse/SPARK-42779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-42779: - Assignee: Anton Okolnychyi > Allow V2 writes to indicate advisory partition size > --- > > Key: SPARK-42779 > URL: https://issues.apache.org/jira/browse/SPARK-42779 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Anton Okolnychyi >Assignee: Anton Okolnychyi >Priority: Major > > Data sources may request a particular distribution and ordering of data for > V2 writes. If AQE is enabled, the default session advisory partition size > (64MB) will be used as guidance. Unfortunately, this default value can still > lead to small files because the written data can be compressed nicely using > columnar file formats. Spark should allow data sources to indicate the > advisory shuffle partition size, just like it lets data sources request a > particular number of partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42779) Allow V2 writes to indicate advisory partition size
[ https://issues.apache.org/jira/browse/SPARK-42779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-42779. --- Fix Version/s: 3.5.0 Resolution: Fixed Issue resolved by pull request 40421 [https://github.com/apache/spark/pull/40421] > Allow V2 writes to indicate advisory partition size > --- > > Key: SPARK-42779 > URL: https://issues.apache.org/jira/browse/SPARK-42779 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Anton Okolnychyi >Assignee: Anton Okolnychyi >Priority: Major > Fix For: 3.5.0 > > > Data sources may request a particular distribution and ordering of data for > V2 writes. If AQE is enabled, the default session advisory partition size > (64MB) will be used as guidance. Unfortunately, this default value can still > lead to small files because the written data can be compressed nicely using > columnar file formats. Spark should allow data sources to indicate the > advisory shuffle partition size, just like it lets data sources request a > particular number of partitions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
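The small-files problem described above follows from simple arithmetic: the advisory size is applied to in-memory shuffle bytes, but columnar formats compress heavily on write. The sketch below illustrates the effect with assumed numbers (the 64 MiB default is from the ticket; the 10x compression ratio and 512 MiB source-advised size are hypothetical):

```python
ADVISORY_PARTITION_BYTES = 64 * 1024 * 1024  # Spark's default advisory size

def estimated_files(shuffle_bytes: int, advisory_bytes: int) -> int:
    """Number of shuffle partitions (and thus output files) AQE would aim
    for, given the in-memory shuffle size and an advisory partition size."""
    return max(1, -(-shuffle_bytes // advisory_bytes))  # ceiling division

# 1 GiB of shuffle data split at the 64 MiB default -> 16 output files.
# With ~10x columnar compression on write, each file lands at ~6.4 MiB,
# i.e. small files on disk.
files = estimated_files(1 << 30, ADVISORY_PARTITION_BYTES)

# If the data source could advise a larger size (say 512 MiB), the same
# data would produce ~2 files of healthier on-disk size.
fewer = estimated_files(1 << 30, 512 * 1024 * 1024)
```

This is the motivation for letting V2 writes indicate their own advisory partition size instead of inheriting the session default.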
[jira] [Updated] (SPARK-42437) Pyspark catalog.cacheTable allow to specify storage level Connect add support Storagelevel
[ https://issues.apache.org/jira/browse/SPARK-42437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Khalid Mammadov updated SPARK-42437: Target Version/s: (was: 3.5.0) Affects Version/s: 3.4.0 (was: 3.5.0) Summary: Pyspark catalog.cacheTable allow to specify storage level Connect add support Storagelevel (was: Pyspark catalog.cacheTable allow to specify storage level) > Pyspark catalog.cacheTable allow to specify storage level Connect add support > Storagelevel > -- > > Key: SPARK-42437 > URL: https://issues.apache.org/jira/browse/SPARK-42437 > Project: Spark > Issue Type: Improvement > Components: Connect, PySpark >Affects Versions: 3.4.0 >Reporter: Khalid Mammadov >Priority: Major > > Currently PySpark version of catalog.cacheTable function does not support to > specify storage level. This is to add that. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42848) Implement DataFrame.registerTempTable
[ https://issues.apache.org/jira/browse/SPARK-42848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701934#comment-17701934 ] Apache Spark commented on SPARK-42848: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40469 > Implement DataFrame.registerTempTable > - > > Key: SPARK-42848 > URL: https://issues.apache.org/jira/browse/SPARK-42848 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42848) Implement DataFrame.registerTempTable
[ https://issues.apache.org/jira/browse/SPARK-42848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701933#comment-17701933 ] Apache Spark commented on SPARK-42848: -- User 'ueshin' has created a pull request for this issue: https://github.com/apache/spark/pull/40469 > Implement DataFrame.registerTempTable > - > > Key: SPARK-42848 > URL: https://issues.apache.org/jira/browse/SPARK-42848 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42848) Implement DataFrame.registerTempTable
[ https://issues.apache.org/jira/browse/SPARK-42848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42848: Assignee: Apache Spark > Implement DataFrame.registerTempTable > - > > Key: SPARK-42848 > URL: https://issues.apache.org/jira/browse/SPARK-42848 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42848) Implement DataFrame.registerTempTable
[ https://issues.apache.org/jira/browse/SPARK-42848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42848: Assignee: (was: Apache Spark) > Implement DataFrame.registerTempTable > - > > Key: SPARK-42848 > URL: https://issues.apache.org/jira/browse/SPARK-42848 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42848) Implement DataFrame.registerTempTable
Takuya Ueshin created SPARK-42848: - Summary: Implement DataFrame.registerTempTable Key: SPARK-42848 URL: https://issues.apache.org/jira/browse/SPARK-42848 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Takuya Ueshin -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-41922) Implement DataFrame `semanticHash`
[ https://issues.apache.org/jira/browse/SPARK-41922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-41922. --- Resolution: Duplicate > Implement DataFrame `semanticHash` > -- > > Key: SPARK-41922 > URL: https://issues.apache.org/jira/browse/SPARK-41922 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Sandeep Singh >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42834) Divided by zero occurs in PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse
[ https://issues.apache.org/jira/browse/SPARK-42834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701908#comment-17701908 ] Chandni Singh commented on SPARK-42834: --- We don't expect the `numChunks` to be zero or `bitmaps` to be empty. There was a bug in 3.2.0 which was fixed with https://issues.apache.org/jira/browse/SPARK-37675 Can you please check if you have this fix? > Divided by zero occurs in > PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse > > > Key: SPARK-42834 > URL: https://issues.apache.org/jira/browse/SPARK-42834 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.2.0 >Reporter: Li Ying >Priority: Major > > {color:#22}Sometimes when run a SQL job with push based shuffle, > exception occurs as below. It seems that there’s no element in the bitmaps > which stores merge chunk meta. {color} > {color:#22}Is it a bug that we should not createChunkBlockInfos when > bitmaps is empty or the bitmaps should never be empty here ?{color} > > {code:java} > Caused by: java.lang.ArithmeticException: / by zero > at > org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse(PushBasedFetchHelper.scala:117) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:980) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:84) > {code} > related code: > {code:java} > def createChunkBlockInfosFromMetaResponse( > shuffleId: Int, > shuffleMergeId: Int, > reduceId: Int, > blockSize: Long, > bitmaps: Array[RoaringBitmap]): ArrayBuffer[(BlockId, Long, Int)] = { > val approxChunkSize = blockSize / bitmaps.length > val blocksToFetch = new ArrayBuffer[(BlockId, Long, Int)]() > for (i <- bitmaps.indices) { > val blockChunkId = ShuffleBlockChunkId(shuffleId, shuffleMergeId, > reduceId, i) > chunksMetaMap.put(blockChunkId, bitmaps(i)) > logDebug(s"adding block chunk $blockChunkId of size 
$approxChunkSize") > blocksToFetch += ((blockChunkId, approxChunkSize, SHUFFLE_PUSH_MAP_ID)) > } > blocksToFetch > } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
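The failing expression in the quoted Scala is `blockSize / bitmaps.length`, which throws when the bitmap array is empty. A Python rendering of that logic shows the crash point and one possible guard; this port is illustrative only, and whether an empty-bitmap response should fall back to an un-merged fetch (per the SPARK-37675 discussion above) is a design question, not shown here:

```python
def approx_chunk_sizes(block_size: int, num_chunks: int) -> list:
    """Mirror of the quoted Scala: approxChunkSize = blockSize / bitmaps.length.

    With num_chunks == 0 the original raises ArithmeticException (/ by zero);
    guarding first avoids the crash."""
    if num_chunks == 0:
        return []  # hypothetical guard: no merged chunks to fetch
    approx = block_size // num_chunks
    # Each chunk of the merged block gets the same approximate size,
    # as in the Scala loop over bitmaps.indices.
    return [approx] * num_chunks
```

The comment thread's point stands either way: per SPARK-37675 the bitmaps are never expected to be empty, so hitting this path suggests a missing upstream fix rather than a gap in this function alone.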
[jira] [Commented] (SPARK-42833) Refactor `applyExtensions` in `SparkSession`
[ https://issues.apache.org/jira/browse/SPARK-42833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701903#comment-17701903 ] Apache Spark commented on SPARK-42833: -- User 'kazuyukitanimura' has created a pull request for this issue: https://github.com/apache/spark/pull/40465 > Refactor `applyExtensions` in `SparkSession` > > > Key: SPARK-42833 > URL: https://issues.apache.org/jira/browse/SPARK-42833 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Kazuyuki Tanimura >Priority: Minor > > Refactor `applyExtensions` in `SparkSession` in order to reduce the > duplicated codes -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42760) The partition of result data frame of join is always 1
[ https://issues.apache.org/jira/browse/SPARK-42760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701742#comment-17701742 ] binyang commented on SPARK-42760: - Disabling AQE solved my problem. Thank you! > The partition of result data frame of join is always 1 > -- > > Key: SPARK-42760 > URL: https://issues.apache.org/jira/browse/SPARK-42760 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 3.3.2 > Environment: standard spark 3.0.3/3.3.2, using in jupyter notebook, > local mode >Reporter: binyang >Priority: Major > > I am using pyspark. The partition of result data frame of join is always 1. > Here is my code from > https://stackoverflow.com/questions/51876281/is-partitioning-retained-after-a-spark-sql-join > > print(spark.version) > def example_shuffle_partitions(data_partitions=10, shuffle_partitions=4): > spark.conf.set("spark.sql.shuffle.partitions", shuffle_partitions) > spark.sql("SET spark.sql.autoBroadcastJoinThreshold=-1") > df1 = spark.range(1, 1000).repartition(data_partitions) > df2 = spark.range(1, 2000).repartition(data_partitions) > df3 = spark.range(1, 3000).repartition(data_partitions) > print("Data partitions is: {}. Shuffle partitions is > {}".format(data_partitions, shuffle_partitions)) > print("Data partitions before join: > {}".format(df1.rdd.getNumPartitions())) > df = (df1.join(df2, df1.id == df2.id) > .join(df3, df1.id == df3.id)) > print("Data partitions after join : {}".format(df.rdd.getNumPartitions())) > example_shuffle_partitions() > > In Spark 3.0.3, it prints out: > 3.0.3 > Data partitions is: 10. Shuffle partitions is 4 > Data partitions before join: 10 > Data partitions after join : 4 > However, it prints out the following in the latest 3.3.2 > 3.3.2 > Data partitions is: 10. 
Shuffle partitions is 4 > Data partitions before join: 10 > Data partitions after join : 1 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
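The behavior change reported above is explained by AQE coalescing small shuffle partitions at runtime. A rough sketch of the idea behind Spark's CoalesceShufflePartitions rule (the real rule has more heuristics; the greedy merge below is an assumption for illustration):

```python
def coalesce_partitions(sizes, target_bytes):
    """Greedily merge consecutive shuffle partitions until each coalesced
    partition reaches the target size -- the gist of how AQE reduces the
    partition count when shuffle data is small."""
    groups, current, current_bytes = [], [], 0
    for size in sizes:
        current.append(size)
        current_bytes += size
        if current_bytes >= target_bytes:
            groups.append(current)
            current, current_bytes = [], 0
    if current:
        groups.append(current)  # leftover partitions form the last group
    return groups

# Four shuffle partitions of a few KB each are far below the default
# 64 MiB advisory size, so they collapse into a single partition --
# matching the "Data partitions after join: 1" seen in the report.
tiny = coalesce_partitions([8_000, 8_000, 8_000, 8_000], 64 * 1024 * 1024)
```

With the tiny `spark.range` inputs in the reproduction, every shuffle partition is small, so AQE collapses them all into one; disabling AQE (as the commenter confirmed) restores the fixed `spark.sql.shuffle.partitions` count.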
[jira] [Updated] (SPARK-42847) Assign a name to the error class _LEGACY_ERROR_TEMP_2013
[ https://issues.apache.org/jira/browse/SPARK-42847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk updated SPARK-42847:
-----------------------------
Description:
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2013* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using {*}checkError(){*}. That function checks only the valuable error fields and avoids depending on the error message text, so tech editors can modify the error format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace the error with an internal error; see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current one is unclear, and propose a solution that helps users avoid and fix such errors.

Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]

was: the same text, for the error class *_LEGACY_ERROR_TEMP_2011*.

> Assign a name to the error class _LEGACY_ERROR_TEMP_2013
> ---------------------------------------------------------
>
>          Key: SPARK-42847
>          URL: https://issues.apache.org/jira/browse/SPARK-42847
>      Project: Spark
>   Issue Type: Sub-task
>   Components: SQL
> Affects Versions: 3.5.0
>     Reporter: Max Gekk
>     Priority: Minor
>       Labels: starter

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
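The checkError() workflow these tickets describe can be sketched as follows. This is an illustrative sketch, not code from any of the tickets: it assumes Spark's QueryTest/SharedSparkSession test helpers, and the query, error class name, and parameters are hypothetical placeholders for whatever the renamed error actually carries.

```scala
// Illustrative sketch of the checkError() pattern: assert on the structured
// error fields (class name, message parameters) instead of the message text,
// so editing the format in error-classes.json does not break the test.
// Assumes Spark's test helpers (QueryTest / SharedSparkSession); the query,
// error class name, and parameters below are hypothetical placeholders.
import org.apache.spark.sql.{AnalysisException, QueryTest}
import org.apache.spark.sql.test.SharedSparkSession

class RenamedErrorSuite extends QueryTest with SharedSparkSession {
  test("the renamed error class carries the expected fields") {
    checkError(
      exception = intercept[AnalysisException] {
        sql("SELECT * FROM nonexistent_table")
      },
      // The name chosen to replace a _LEGACY_ERROR_TEMP_xxxx class.
      errorClass = "TABLE_OR_VIEW_NOT_FOUND",
      // Only the valuable fields are checked, never the rendered message.
      parameters = Map("relationName" -> "`nonexistent_table`"))
  }
}
```

Because only the error class and its parameters are asserted, a tech editor can reword the template in error-classes.json without touching this test.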
[jira] [Created] (SPARK-42847) Assign a name to the error class _LEGACY_ERROR_TEMP_2013
Max Gekk created SPARK-42847:
--------------------------------

Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2013
Key: SPARK-42847
URL: https://issues.apache.org/jira/browse/SPARK-42847
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2011* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using {*}checkError(){*}. That function checks only the valuable error fields and avoids depending on the error message text, so tech editors can modify the error format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace the error with an internal error; see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current one is unclear, and propose a solution that helps users avoid and fix such errors.

Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]
[jira] [Created] (SPARK-42846) Assign a name to the error class _LEGACY_ERROR_TEMP_2011
Max Gekk created SPARK-42846:
--------------------------------

Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2011
Key: SPARK-42846
URL: https://issues.apache.org/jira/browse/SPARK-42846
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2010* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using {*}checkError(){*}. That function checks only the valuable error fields and avoids depending on the error message text, so tech editors can modify the error format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace the error with an internal error; see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current one is unclear, and propose a solution that helps users avoid and fix such errors.

Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]
[jira] [Updated] (SPARK-42846) Assign a name to the error class _LEGACY_ERROR_TEMP_2011
[ https://issues.apache.org/jira/browse/SPARK-42846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk updated SPARK-42846:
-----------------------------
Description:
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2011* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using {*}checkError(){*}. That function checks only the valuable error fields and avoids depending on the error message text, so tech editors can modify the error format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace the error with an internal error; see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current one is unclear, and propose a solution that helps users avoid and fix such errors.

Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]

was: the same text, for the error class *_LEGACY_ERROR_TEMP_2010*.

> Assign a name to the error class _LEGACY_ERROR_TEMP_2011
> ---------------------------------------------------------
>
>          Key: SPARK-42846
>          URL: https://issues.apache.org/jira/browse/SPARK-42846
>      Project: Spark
>   Issue Type: Sub-task
>   Components: SQL
> Affects Versions: 3.5.0
>     Reporter: Max Gekk
>     Priority: Minor
>       Labels: starter
[jira] [Created] (SPARK-42845) Assign a name to the error class _LEGACY_ERROR_TEMP_2010
Max Gekk created SPARK-42845:
--------------------------------

Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2010
Key: SPARK-42845
URL: https://issues.apache.org/jira/browse/SPARK-42845
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2008* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using {*}checkError(){*}. That function checks only the valuable error fields and avoids depending on the error message text, so tech editors can modify the error format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace the error with an internal error; see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current one is unclear, and propose a solution that helps users avoid and fix such errors.

Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]
[jira] [Updated] (SPARK-42845) Assign a name to the error class _LEGACY_ERROR_TEMP_2010
[ https://issues.apache.org/jira/browse/SPARK-42845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk updated SPARK-42845:
-----------------------------
Description:
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2010* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using {*}checkError(){*}. That function checks only the valuable error fields and avoids depending on the error message text, so tech editors can modify the error format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace the error with an internal error; see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current one is unclear, and propose a solution that helps users avoid and fix such errors.

Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]

was: the same text, for the error class *_LEGACY_ERROR_TEMP_2008*.

> Assign a name to the error class _LEGACY_ERROR_TEMP_2010
> ---------------------------------------------------------
>
>          Key: SPARK-42845
>          URL: https://issues.apache.org/jira/browse/SPARK-42845
>      Project: Spark
>   Issue Type: Sub-task
>   Components: SQL
> Affects Versions: 3.5.0
>     Reporter: Max Gekk
>     Priority: Minor
>       Labels: starter
[jira] [Created] (SPARK-42843) Assign a name to the error class _LEGACY_ERROR_TEMP_2007
Max Gekk created SPARK-42843:
--------------------------------

Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2007
Key: SPARK-42843
URL: https://issues.apache.org/jira/browse/SPARK-42843
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2006* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using {*}checkError(){*}. That function checks only the valuable error fields and avoids depending on the error message text, so tech editors can modify the error format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace the error with an internal error; see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current one is unclear, and propose a solution that helps users avoid and fix such errors.

Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]
[jira] [Updated] (SPARK-42844) Assign a name to the error class _LEGACY_ERROR_TEMP_2008
[ https://issues.apache.org/jira/browse/SPARK-42844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk updated SPARK-42844:
-----------------------------
Description:
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2008* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using {*}checkError(){*}. That function checks only the valuable error fields and avoids depending on the error message text, so tech editors can modify the error format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace the error with an internal error; see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current one is unclear, and propose a solution that helps users avoid and fix such errors.

Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]

was: the same text, for the error class *_LEGACY_ERROR_TEMP_2007*.

> Assign a name to the error class _LEGACY_ERROR_TEMP_2008
> ---------------------------------------------------------
>
>          Key: SPARK-42844
>          URL: https://issues.apache.org/jira/browse/SPARK-42844
>      Project: Spark
>   Issue Type: Sub-task
>   Components: SQL
> Affects Versions: 3.5.0
>     Reporter: Max Gekk
>     Priority: Minor
>       Labels: starter
[jira] [Created] (SPARK-42844) Assign a name to the error class _LEGACY_ERROR_TEMP_2008
Max Gekk created SPARK-42844:
--------------------------------

Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2008
Key: SPARK-42844
URL: https://issues.apache.org/jira/browse/SPARK-42844
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2007* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using {*}checkError(){*}. That function checks only the valuable error fields and avoids depending on the error message text, so tech editors can modify the error format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace the error with an internal error; see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current one is unclear, and propose a solution that helps users avoid and fix such errors.

Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]
[jira] [Updated] (SPARK-42843) Assign a name to the error class _LEGACY_ERROR_TEMP_2007
[ https://issues.apache.org/jira/browse/SPARK-42843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk updated SPARK-42843:
-----------------------------
Description:
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2007* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using {*}checkError(){*}. That function checks only the valuable error fields and avoids depending on the error message text, so tech editors can modify the error format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace the error with an internal error; see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current one is unclear, and propose a solution that helps users avoid and fix such errors.

Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]

was: the same text, for the error class *_LEGACY_ERROR_TEMP_2006*.

> Assign a name to the error class _LEGACY_ERROR_TEMP_2007
> ---------------------------------------------------------
>
>          Key: SPARK-42843
>          URL: https://issues.apache.org/jira/browse/SPARK-42843
>      Project: Spark
>   Issue Type: Sub-task
>   Components: SQL
> Affects Versions: 3.5.0
>     Reporter: Max Gekk
>     Priority: Minor
>       Labels: starter
[jira] [Updated] (SPARK-42842) Assign a name to the error class _LEGACY_ERROR_TEMP_2006
[ https://issues.apache.org/jira/browse/SPARK-42842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Max Gekk updated SPARK-42842:
-----------------------------
Description:
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2006* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using {*}checkError(){*}. That function checks only the valuable error fields and avoids depending on the error message text, so tech editors can modify the error format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace the error with an internal error; see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current one is unclear, and propose a solution that helps users avoid and fix such errors.

Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]

was: the same text, for the error class *_LEGACY_ERROR_TEMP_2005*.

> Assign a name to the error class _LEGACY_ERROR_TEMP_2006
> ---------------------------------------------------------
>
>          Key: SPARK-42842
>          URL: https://issues.apache.org/jira/browse/SPARK-42842
>      Project: Spark
>   Issue Type: Sub-task
>   Components: SQL
> Affects Versions: 3.5.0
>     Reporter: Max Gekk
>     Priority: Minor
>       Labels: starter
[jira] [Created] (SPARK-42842) Assign a name to the error class _LEGACY_ERROR_TEMP_2006
Max Gekk created SPARK-42842:
--------------------------------

Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2006
Key: SPARK-42842
URL: https://issues.apache.org/jira/browse/SPARK-42842
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk

Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2005* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (see the examples in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using {*}checkError(){*}. That function checks only the valuable error fields and avoids depending on the error message text, so tech editors can modify the error format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace the error with an internal error; see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current one is unclear, and propose a solution that helps users avoid and fix such errors.

Please look at the PRs below as examples:
* [https://github.com/apache/spark/pull/38685]
* [https://github.com/apache/spark/pull/38656]
* [https://github.com/apache/spark/pull/38490]
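The SparkException.internalError() suggestion in these tickets can be sketched like this. Only the internalError helper itself comes from Spark (org.apache.spark.SparkException); the surrounding function, its arguments, and the message are hypothetical placeholders for a real call site.

```scala
// Illustrative sketch: a legacy error class for a state that users cannot
// reach via a SQL query is replaced with an internal error, so the failure
// is reported as a Spark bug rather than a user mistake.
// Assumes org.apache.spark.SparkException.internalError (Spark 3.4+);
// resolveAttribute and its arguments are hypothetical.
import org.apache.spark.SparkException

def resolveAttribute(name: String, resolved: Map[String, Int]): Int =
  resolved.getOrElse(name,
    // Before: throw a user-facing _LEGACY_ERROR_TEMP_20xx error here.
    // After: an unresolved attribute at this stage indicates an internal bug.
    throw SparkException.internalError(
      s"Attribute $name should have been resolved during analysis"))
```

Internal errors all share the INTERNAL_ERROR class, so no name needs to be invented in error-classes.json for conditions that user code cannot trigger.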
[jira] [Commented] (SPARK-42636) Audit annotation usage
[ https://issues.apache.org/jira/browse/SPARK-42636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701703#comment-17701703 ]

jiaan.geng commented on SPARK-42636:
------------------------------------
I will take a look!

> Audit annotation usage
> ----------------------
>
>          Key: SPARK-42636
>          URL: https://issues.apache.org/jira/browse/SPARK-42636
>      Project: Spark
>   Issue Type: New Feature
>   Components: Connect
> Affects Versions: 3.4.0
>     Reporter: Herman van Hövell
>     Priority: Major
>
> Annotation usage is not entirely consistent in the client. We should probably
> remove all Stable annotations and add a few DeveloperApi ones.
[jira] (SPARK-42584) Improve output of Column.explain
[ https://issues.apache.org/jira/browse/SPARK-42584 ]

jiaan.geng deleted comment on SPARK-42584:
------------------------------------------
was (Author: beliefer): I will take a look!

> Improve output of Column.explain
> --------------------------------
>
>          Key: SPARK-42584
>          URL: https://issues.apache.org/jira/browse/SPARK-42584
>      Project: Spark
>   Issue Type: New Feature
>   Components: Connect
> Affects Versions: 3.4.0
>     Reporter: Herman van Hövell
>     Priority: Major
>
> We currently display the structure of the proto in both the regular and
> extended versions of explain. We should display a more compact, SQL-like
> string for the regular version.
[jira] [Created] (SPARK-42840) Assign a name to the error class _LEGACY_ERROR_TEMP_2004
Max Gekk created SPARK-42840: Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2004 Key: SPARK-42840 URL: https://issues.apache.org/jira/browse/SPARK-42840 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.5.0 Reporter: Max Gekk Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2004* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (see the examples in error-classes.json). Add a test which triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using {*}checkError(){*}. That function checks only the valuable error fields and avoids depending on the error message text; this way, tech editors can modify the error format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error onto checkError(). If you cannot reproduce the error from user space (using a SQL query), replace the error with an internal error, see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current one is not clear. Propose a solution to users for how to avoid and fix such errors. Please look at the PRs below as examples: * [https://github.com/apache/spark/pull/38685] * [https://github.com/apache/spark/pull/38656] * [https://github.com/apache/spark/pull/38490]
[jira] [Created] (SPARK-42841) Assign a name to the error class _LEGACY_ERROR_TEMP_2005
Max Gekk created SPARK-42841: Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2005 Key: SPARK-42841 URL: https://issues.apache.org/jira/browse/SPARK-42841 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.5.0 Reporter: Max Gekk Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2005* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (see the examples in error-classes.json). Add a test which triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using {*}checkError(){*}. That function checks only the valuable error fields and avoids depending on the error message text; this way, tech editors can modify the error format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error onto checkError(). If you cannot reproduce the error from user space (using a SQL query), replace the error with an internal error, see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current one is not clear. Propose a solution to users for how to avoid and fix such errors. Please look at the PRs below as examples: * [https://github.com/apache/spark/pull/38685] * [https://github.com/apache/spark/pull/38656] * [https://github.com/apache/spark/pull/38490]
[jira] [Updated] (SPARK-42841) Assign a name to the error class _LEGACY_ERROR_TEMP_2005
[ https://issues.apache.org/jira/browse/SPARK-42841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-42841: - Description: Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2005* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (look at the example in error-classes.json). Add a test which triggers the error from user code if such test still doesn't exist. Check exception fields by using {*}checkError(){*}. The last function checks valuable error fields only, and avoids dependencies from error text message. In this way, tech editors can modify error format in error-classes.json, and don't worry of Spark's internal tests. Migrate other tests that might trigger the error onto checkError(). If you cannot reproduce the error from user space (using SQL query), replace the error by an internal error, see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current is not clear. Propose a solution to users how to avoid and fix such kind of errors. Please, look at the PR below as examples: * [https://github.com/apache/spark/pull/38685] * [https://github.com/apache/spark/pull/38656] * [https://github.com/apache/spark/pull/38490] was: Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2004* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (look at the example in error-classes.json). Add a test which triggers the error from user code if such test still doesn't exist. Check exception fields by using {*}checkError(){*}. The last function checks valuable error fields only, and avoids dependencies from error text message. In this way, tech editors can modify error format in error-classes.json, and don't worry of Spark's internal tests. Migrate other tests that might trigger the error onto checkError(). 
If you cannot reproduce the error from user space (using SQL query), replace the error by an internal error, see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current is not clear. Propose a solution to users how to avoid and fix such kind of errors. Please, look at the PR below as examples: * [https://github.com/apache/spark/pull/38685] * [https://github.com/apache/spark/pull/38656] * [https://github.com/apache/spark/pull/38490] > Assign a name to the error class _LEGACY_ERROR_TEMP_2005 > > > Key: SPARK-42841 > URL: https://issues.apache.org/jira/browse/SPARK-42841 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2005* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such test still doesn't > exist. Check exception fields by using {*}checkError(){*}. The last function > checks valuable error fields only, and avoids dependencies from error text > message. In this way, tech editors can modify error format in > error-classes.json, and don't worry of Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using SQL query), replace > the error by an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. 
> Please, look at the PR below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490]
[jira] [Updated] (SPARK-42840) Assign a name to the error class _LEGACY_ERROR_TEMP_2004
[ https://issues.apache.org/jira/browse/SPARK-42840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-42840: - Description: Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2004* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (look at the example in error-classes.json). Add a test which triggers the error from user code if such test still doesn't exist. Check exception fields by using {*}checkError(){*}. The last function checks valuable error fields only, and avoids dependencies from error text message. In this way, tech editors can modify error format in error-classes.json, and don't worry of Spark's internal tests. Migrate other tests that might trigger the error onto checkError(). If you cannot reproduce the error from user space (using SQL query), replace the error by an internal error, see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current is not clear. Propose a solution to users how to avoid and fix such kind of errors. Please, look at the PR below as examples: * [https://github.com/apache/spark/pull/38685] * [https://github.com/apache/spark/pull/38656] * [https://github.com/apache/spark/pull/38490] was: Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (look at the example in error-classes.json). Add a test which triggers the error from user code if such test still doesn't exist. Check exception fields by using {*}checkError(){*}. The last function checks valuable error fields only, and avoids dependencies from error text message. In this way, tech editors can modify error format in error-classes.json, and don't worry of Spark's internal tests. Migrate other tests that might trigger the error onto checkError(). 
If you cannot reproduce the error from user space (using SQL query), replace the error by an internal error, see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current is not clear. Propose a solution to users how to avoid and fix such kind of errors. Please, look at the PR below as examples: * [https://github.com/apache/spark/pull/38685] * [https://github.com/apache/spark/pull/38656] * [https://github.com/apache/spark/pull/38490] > Assign a name to the error class _LEGACY_ERROR_TEMP_2004 > > > Key: SPARK-42840 > URL: https://issues.apache.org/jira/browse/SPARK-42840 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2004* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such test still doesn't > exist. Check exception fields by using {*}checkError(){*}. The last function > checks valuable error fields only, and avoids dependencies from error text > message. In this way, tech editors can modify error format in > error-classes.json, and don't worry of Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using SQL query), replace > the error by an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. 
> Please, look at the PR below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490]
[jira] [Updated] (SPARK-42838) Assign a name to the error class _LEGACY_ERROR_TEMP_2000
[ https://issues.apache.org/jira/browse/SPARK-42838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-42838: - Description: Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (look at the example in error-classes.json). Add a test which triggers the error from user code if such test still doesn't exist. Check exception fields by using {*}checkError(){*}. The last function checks valuable error fields only, and avoids dependencies from error text message. In this way, tech editors can modify error format in error-classes.json, and don't worry of Spark's internal tests. Migrate other tests that might trigger the error onto checkError(). If you cannot reproduce the error from user space (using SQL query), replace the error by an internal error, see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current is not clear. Propose a solution to users how to avoid and fix such kind of errors. Please, look at the PR below as examples: * [https://github.com/apache/spark/pull/38685] * [https://github.com/apache/spark/pull/38656] * [https://github.com/apache/spark/pull/38490] was: Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (look at the example in error-classes.json). Add a test which triggers the error from user code if such test still doesn't exist. Check exception fields by using {*}checkError(){*}. The last function checks valuable error fields only, and avoids dependencies from error text message. In this way, tech editors can modify error format in error-classes.json, and don't worry of Spark's internal tests. Migrate other tests that might trigger the error onto checkError(). 
Improve the error message format in error-classes.json if the current is not clear. Propose a solution to users how to avoid and fix such kind of errors. Please, look at the PR below as examples: * [https://github.com/apache/spark/pull/38685] * [https://github.com/apache/spark/pull/38656] * [https://github.com/apache/spark/pull/38490] > Assign a name to the error class _LEGACY_ERROR_TEMP_2000 > > > Key: SPARK-42838 > URL: https://issues.apache.org/jira/browse/SPARK-42838 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such test still doesn't > exist. Check exception fields by using {*}checkError(){*}. The last function > checks valuable error fields only, and avoids dependencies from error text > message. In this way, tech editors can modify error format in > error-classes.json, and don't worry of Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using SQL query), replace > the error by an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. > Please, look at the PR below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42839) Assign a name to the error class _LEGACY_ERROR_TEMP_2003
[ https://issues.apache.org/jira/browse/SPARK-42839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-42839: - Description: Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2003* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (look at the example in error-classes.json). Add a test which triggers the error from user code if such test still doesn't exist. Check exception fields by using {*}checkError(){*}. The last function checks valuable error fields only, and avoids dependencies from error text message. In this way, tech editors can modify error format in error-classes.json, and don't worry of Spark's internal tests. Migrate other tests that might trigger the error onto checkError(). If you cannot reproduce the error from user space (using SQL query), replace the error by an internal error, see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current is not clear. Propose a solution to users how to avoid and fix such kind of errors. Please, look at the PR below as examples: * [https://github.com/apache/spark/pull/38685] * [https://github.com/apache/spark/pull/38656] * [https://github.com/apache/spark/pull/38490] was: Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2003* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (look at the example in error-classes.json). Add a test which triggers the error from user code if such test still doesn't exist. Check exception fields by using {*}checkError(){*}. The last function checks valuable error fields only, and avoids dependencies from error text message. In this way, tech editors can modify error format in error-classes.json, and don't worry of Spark's internal tests. Migrate other tests that might trigger the error onto checkError(). 
If you cannot reproduce the error from user space (using SQL query), replace the error by an internal error, see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current is not clear. Propose a solution to users how to avoid and fix such kind of errors. Please, look at the PR below as examples: * [https://github.com/apache/spark/pull/38685] * [https://github.com/apache/spark/pull/38656] * [https://github.com/apache/spark/pull/38490] > Assign a name to the error class _LEGACY_ERROR_TEMP_2003 > > > Key: SPARK-42839 > URL: https://issues.apache.org/jira/browse/SPARK-42839 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2003* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such test still doesn't > exist. Check exception fields by using {*}checkError(){*}. The last function > checks valuable error fields only, and avoids dependencies from error text > message. In this way, tech editors can modify error format in > error-classes.json, and don't worry of Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using SQL query), replace > the error by an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. 
> Please, look at the PR below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490]
[jira] [Updated] (SPARK-42839) Assign a name to the error class _LEGACY_ERROR_TEMP_2003
[ https://issues.apache.org/jira/browse/SPARK-42839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-42839: - Description: Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2003* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (look at the example in error-classes.json). Add a test which triggers the error from user code if such test still doesn't exist. Check exception fields by using {*}checkError(){*}. The last function checks valuable error fields only, and avoids dependencies from error text message. In this way, tech editors can modify error format in error-classes.json, and don't worry of Spark's internal tests. Migrate other tests that might trigger the error onto checkError(). If you cannot reproduce the error from user space (using SQL query), replace the error by an internal error, see {*}SparkException.internalError(){*}. Improve the error message format in error-classes.json if the current is not clear. Propose a solution to users how to avoid and fix such kind of errors. Please, look at the PR below as examples: * [https://github.com/apache/spark/pull/38685] * [https://github.com/apache/spark/pull/38656] * [https://github.com/apache/spark/pull/38490] was: Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2003* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (look at the example in error-classes.json). Add a test which triggers the error from user code if such test still doesn't exist. Check exception fields by using {*}checkError(){*}. The last function checks valuable error fields only, and avoids dependencies from error text message. In this way, tech editors can modify error format in error-classes.json, and don't worry of Spark's internal tests. Migrate other tests that might trigger the error onto checkError(). 
Improve the error message format in error-classes.json if the current is not clear. Propose a solution to users how to avoid and fix such kind of errors. Please, look at the PR below as examples: * [https://github.com/apache/spark/pull/38685] * [https://github.com/apache/spark/pull/38656] * [https://github.com/apache/spark/pull/38490] > Assign a name to the error class _LEGACY_ERROR_TEMP_2003 > > > Key: SPARK-42839 > URL: https://issues.apache.org/jira/browse/SPARK-42839 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2003* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such test still doesn't > exist. Check exception fields by using {*}checkError(){*}. The last function > checks valuable error fields only, and avoids dependencies from error text > message. In this way, tech editors can modify error format in > error-classes.json, and don't worry of Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using SQL query), replace > the error by an internal error, see > {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. > Please, look at the PR below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-42760) The partition of result data frame of join is always 1
[ https://issues.apache.org/jira/browse/SPARK-42760 ] Yuming Wang deleted comment on SPARK-42760: - was (Author: apachespark): User '1511351836' has created a pull request for this issue: https://github.com/apache/spark/pull/40380 > The partition of result data frame of join is always 1 > -- > > Key: SPARK-42760 > URL: https://issues.apache.org/jira/browse/SPARK-42760 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL > Affects Versions: 3.3.2 > Environment: standard spark 3.0.3/3.3.2, used in a jupyter notebook, local mode > Reporter: binyang > Priority: Major > > I am using pyspark. The number of partitions of the result data frame of a join is always 1. Here is my code, from https://stackoverflow.com/questions/51876281/is-partitioning-retained-after-a-spark-sql-join
>
>     print(spark.version)
>
>     def example_shuffle_partitions(data_partitions=10, shuffle_partitions=4):
>         spark.conf.set("spark.sql.shuffle.partitions", shuffle_partitions)
>         spark.sql("SET spark.sql.autoBroadcastJoinThreshold=-1")
>         df1 = spark.range(1, 1000).repartition(data_partitions)
>         df2 = spark.range(1, 2000).repartition(data_partitions)
>         df3 = spark.range(1, 3000).repartition(data_partitions)
>         print("Data partitions is: {}. Shuffle partitions is {}".format(data_partitions, shuffle_partitions))
>         print("Data partitions before join: {}".format(df1.rdd.getNumPartitions()))
>         df = (df1.join(df2, df1.id == df2.id)
>               .join(df3, df1.id == df3.id))
>         print("Data partitions after join : {}".format(df.rdd.getNumPartitions()))
>
>     example_shuffle_partitions()
>
> In Spark 3.0.3, it prints out:
>
>     3.0.3
>     Data partitions is: 10. Shuffle partitions is 4
>     Data partitions before join: 10
>     Data partitions after join : 4
>
> However, it prints out the following in the latest 3.3.2:
>
>     3.3.2
>     Data partitions is: 10. Shuffle partitions is 4
>     Data partitions before join: 10
>     Data partitions after join : 1
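[Editorial note, not part of the thread: a plausible but unconfirmed explanation for the 3.3.2 output quoted above is Adaptive Query Execution, which is enabled by default since Spark 3.2 and coalesces small post-shuffle partitions; a result this tiny can be merged into a single partition. A sketch of how one might test that hypothesis when building the session (the config keys are real Spark settings; the diagnosis itself is an assumption):]

```python
from pyspark.sql import SparkSession

# Hypothesis: AQE (on by default since Spark 3.2) coalesces the small
# post-join shuffle partitions down to 1. Disabling coalescing makes the
# 3.3.2 partition counts comparable with 3.0.3. Requires a local Spark
# installation to run.
spark = (
    SparkSession.builder
    .config("spark.sql.adaptive.coalescePartitions.enabled", "false")
    # or disable AQE entirely: .config("spark.sql.adaptive.enabled", "false")
    .getOrCreate()
)
```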
[jira] [Created] (SPARK-42839) Assign a name to the error class _LEGACY_ERROR_TEMP_2003
Max Gekk created SPARK-42839: Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2003 Key: SPARK-42839 URL: https://issues.apache.org/jira/browse/SPARK-42839 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.5.0 Reporter: Max Gekk Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2003* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (see the examples in error-classes.json). Add a test which triggers the error from user code if such a test doesn't exist yet. Check the exception fields by using {*}checkError(){*}. That function checks only the valuable error fields and avoids depending on the error message text; this way, tech editors can modify the error format in error-classes.json without worrying about Spark's internal tests. Migrate other tests that might trigger the error onto checkError(). Improve the error message format in error-classes.json if the current one is not clear. Propose a solution to users for how to avoid and fix such errors. Please look at the PRs below as examples: * [https://github.com/apache/spark/pull/38685] * [https://github.com/apache/spark/pull/38656] * [https://github.com/apache/spark/pull/38490]
[jira] [Updated] (SPARK-42839) Assign a name to the error class _LEGACY_ERROR_TEMP_2003
[ https://issues.apache.org/jira/browse/SPARK-42839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-42839: - Description: Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2003* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (look at the example in error-classes.json). Add a test which triggers the error from user code if such test still doesn't exist. Check exception fields by using {*}checkError(){*}. The last function checks valuable error fields only, and avoids dependencies from error text message. In this way, tech editors can modify error format in error-classes.json, and don't worry of Spark's internal tests. Migrate other tests that might trigger the error onto checkError(). Improve the error message format in error-classes.json if the current is not clear. Propose a solution to users how to avoid and fix such kind of errors. Please, look at the PR below as examples: * [https://github.com/apache/spark/pull/38685] * [https://github.com/apache/spark/pull/38656] * [https://github.com/apache/spark/pull/38490] was: Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (look at the example in error-classes.json). Add a test which triggers the error from user code if such test still doesn't exist. Check exception fields by using {*}checkError(){*}. The last function checks valuable error fields only, and avoids dependencies from error text message. In this way, tech editors can modify error format in error-classes.json, and don't worry of Spark's internal tests. Migrate other tests that might trigger the error onto checkError(). Improve the error message format in error-classes.json if the current is not clear. Propose a solution to users how to avoid and fix such kind of errors. 
Please, look at the PR below as examples: * [https://github.com/apache/spark/pull/38685] * [https://github.com/apache/spark/pull/38656] * [https://github.com/apache/spark/pull/38490] > Assign a name to the error class _LEGACY_ERROR_TEMP_2003 > > > Key: SPARK-42839 > URL: https://issues.apache.org/jira/browse/SPARK-42839 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2003* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such test still doesn't > exist. Check exception fields by using {*}checkError(){*}. The last function > checks valuable error fields only, and avoids dependencies from error text > message. In this way, tech editors can modify error format in > error-classes.json, and don't worry of Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. > Please, look at the PR below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42584) Improve output of Column.explain
[ https://issues.apache.org/jira/browse/SPARK-42584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42584: Assignee: Apache Spark > Improve output of Column.explain > > > Key: SPARK-42584 > URL: https://issues.apache.org/jira/browse/SPARK-42584 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Apache Spark >Priority: Major > > We currently display the structure of the proto in both the regular and > extended version of explain. We should display a more compact SQL-like > string for the regular version.
[jira] [Assigned] (SPARK-42584) Improve output of Column.explain
[ https://issues.apache.org/jira/browse/SPARK-42584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42584: Assignee: (was: Apache Spark) > Improve output of Column.explain > > > Key: SPARK-42584 > URL: https://issues.apache.org/jira/browse/SPARK-42584 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Priority: Major > > We currently display the structure of the proto in both the regular and > extended versions of explain. We should display a more compact SQL-like > string for the regular version.
[jira] [Commented] (SPARK-42584) Improve output of Column.explain
[ https://issues.apache.org/jira/browse/SPARK-42584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701690#comment-17701690 ] Apache Spark commented on SPARK-42584: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/40467 > Improve output of Column.explain > > > Key: SPARK-42584 > URL: https://issues.apache.org/jira/browse/SPARK-42584 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Priority: Major > > We currently display the structure of the proto in both the regular and > extended versions of explain. We should display a more compact SQL-like > string for the regular version.
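The "raw structure vs. compact SQL-like string" distinction requested for Column.explain can be shown with a toy model. Everything below is invented for illustration: `Attr`, `Add`, and the two renderers are not Spark Connect's actual proto classes or methods.

```scala
// Toy expression tree standing in for a Connect proto message.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Add(left: Expr, right: Expr) extends Expr

// Extended mode: dump the raw structure, like explain does today.
def explainExtended(e: Expr): String = e match {
  case Attr(n)   => s"Attr(name=$n)"
  case Add(l, r) => s"Add(left=${explainExtended(l)}, right=${explainExtended(r)})"
}

// Regular mode: a compact SQL-like string.
def explainCompact(e: Expr): String = e match {
  case Attr(n)   => n
  case Add(l, r) => s"(${explainCompact(l)} + ${explainCompact(r)})"
}

val expr = Add(Attr("a"), Attr("b"))
// explainExtended(expr) yields "Add(left=Attr(name=a), right=Attr(name=b))"
// explainCompact(expr)  yields "(a + b)"
```

The compact form is what a user would want by default; the structural dump remains useful for debugging the wire format.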
[jira] [Updated] (SPARK-42838) Assign a name to the error class _LEGACY_ERROR_TEMP_2000
[ https://issues.apache.org/jira/browse/SPARK-42838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-42838: - Description: Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (look at the example in error-classes.json). Add a test which triggers the error from user code if such test still doesn't exist. Check exception fields by using {*}checkError(){*}. The last function checks valuable error fields only, and avoids dependencies from error text message. In this way, tech editors can modify error format in error-classes.json, and don't worry of Spark's internal tests. Migrate other tests that might trigger the error onto checkError(). Improve the error message format in error-classes.json if the current is not clear. Propose a solution to users how to avoid and fix such kind of errors. Please, look at the PR below as examples: * [https://github.com/apache/spark/pull/38685] * [https://github.com/apache/spark/pull/38656] * [https://github.com/apache/spark/pull/38490] was: Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (look at the example in error-classes.json). Add a test which triggers the error from user code. Check exception field by using {*}checkError(){*}. The function checks valuable error fields only, and avoids dependencies from error text message. In this way, tech editors can modify error format in error-classes.json, and don't worry of Spark's internal tests. Improve the error message format in error-classes.json if the current is not clear. Propose a solution to users how to avoid and fix such kind of errors. 
Please, look at the PR below as examples: [https://github.com/apache/spark/pull/38685] [https://github.com/apache/spark/pull/38656] https://github.com/apache/spark/pull/38490 > Assign a name to the error class _LEGACY_ERROR_TEMP_2000 > > > Key: SPARK-42838 > URL: https://issues.apache.org/jira/browse/SPARK-42838 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such test still doesn't > exist. Check exception fields by using {*}checkError(){*}. The last function > checks valuable error fields only, and avoids dependencies from error text > message. In this way, tech editors can modify error format in > error-classes.json, and don't worry of Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. > Please, look at the PR below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42838) Assign a name to the error class _LEGACY_ERROR_TEMP_2000
[ https://issues.apache.org/jira/browse/SPARK-42838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-42838: - Description: Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (look at the example in error-classes.json). Add a test which triggers the error from user code. Check exception field by using {*}checkError(){*}. The function checks valuable error fields only, and avoids dependencies from error text message. In this way, tech editors can modify error format in error-classes.json, and don't worry of Spark's internal tests. Improve the error message format in error-classes.json if the current is not clear. Propose a solution to users how to avoid and fix such kind of errors. Please, look at the PR below as examples: [https://github.com/apache/spark/pull/38685] [https://github.com/apache/spark/pull/38656] https://github.com/apache/spark/pull/38490 was: Choose a proper name for the error class _LEGACY_ERROR_TEMP_2000 defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (look at the example in error-classes.json). Add a test which triggers the error from user code. Check exception field by using {*}checkError(){*}. The function checks valuable error fields only, and avoids dependencies from error text message. In this way, tech editors can modify error format in error-classes.json, and don't worry of Spark's internal tests. Improve the error message format in error-classes.json if the current is not clear. Propose a solution to users how to avoid and fix such kind of errors. 
> Assign a name to the error class _LEGACY_ERROR_TEMP_2000 > > > Key: SPARK-42838 > URL: https://issues.apache.org/jira/browse/SPARK-42838 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code. Check exception field by > using {*}checkError(){*}. The function checks valuable error fields only, and > avoids dependencies from error text message. In this way, tech editors can > modify error format in error-classes.json, and don't worry of Spark's > internal tests. > > Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. > Please, look at the PR below as examples: > [https://github.com/apache/spark/pull/38685] > [https://github.com/apache/spark/pull/38656] > https://github.com/apache/spark/pull/38490 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42838) Assign a name to the error class _LEGACY_ERROR_TEMP_2000
[ https://issues.apache.org/jira/browse/SPARK-42838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-42838: - Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2000 (was: Assing a name to the error class _LEGACY_ERROR_TEMP_2000) > Assign a name to the error class _LEGACY_ERROR_TEMP_2000 > > > Key: SPARK-42838 > URL: https://issues.apache.org/jira/browse/SPARK-42838 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: starter > > Choose a proper name for the error class _LEGACY_ERROR_TEMP_2000 defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code. Check exception field by > using {*}checkError(){*}. The function checks valuable error fields only, and > avoids dependencies from error text message. In this way, tech editors can > modify error format in error-classes.json, and don't worry of Spark's > internal tests. > > Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42838) Assing a name to the error class _LEGACY_ERROR_TEMP_2000
Max Gekk created SPARK-42838: Summary: Assing a name to the error class _LEGACY_ERROR_TEMP_2000 Key: SPARK-42838 URL: https://issues.apache.org/jira/browse/SPARK-42838 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.5.0 Reporter: Max Gekk Choose a proper name for the error class _LEGACY_ERROR_TEMP_2000 defined in {*}core/src/main/resources/error/error-classes.json{*}. The name should be short but complete (look at the example in error-classes.json). Add a test which triggers the error from user code. Check exception field by using {*}checkError(){*}. The function checks valuable error fields only, and avoids dependencies from error text message. In this way, tech editors can modify error format in error-classes.json, and don't worry of Spark's internal tests. Improve the error message format in error-classes.json if the current is not clear. Propose a solution to users how to avoid and fix such kind of errors. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42837) spark-submit - issue when resolving dependencies hosted on a private repository in kubernetes cluster mode
lione Herbet created SPARK-42837: Summary: spark-submit - issue when resolving dependencies hosted on a private repository in kubernetes cluster mode Key: SPARK-42837 URL: https://issues.apache.org/jira/browse/SPARK-42837 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 3.3.2 Reporter: lione Herbet When using [spark operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator], if dependencies are hosted on a private repository with authentication needed (like S3 or OCI), the spark operator submitting the job needs to have all the secrets to access all dependencies. If not, the spark-submit fails. On a multi-tenant kubernetes cluster where the spark operator and spark job execution are in separate namespaces, this involves duplicating all secrets, or it won't work. It seems that spark-submit needs to access dependencies (with credentials) only to resolveGlobPath ([https://github.com/apache/spark/blob/v3.3.2/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L364-L367]). It seems to me (but this needs to be confirmed by someone more skilled than me in Spark internals) that this resolveGlobPath task is also done when the driver is downloading the jars. Would it be possible to have this resolveGlobPath task skipped when running on a Kubernetes cluster in cluster mode? For example, add a condition like this around lines 364-367: {code:java} if (isKubernetesCluster) { ... } {code} For compatibility with the old behavior, if needed, we could also add a condition on a Spark parameter like this: {code:java} if (isKubernetesCluster && sparkConf.getBoolean("spark.kubernetes.resolveGlobPathsInSubmit", true)) { ... }{code} I tested both solutions locally and they seem to resolve the case. Do you think I need to consider other elements? 
I may submit a patch depending on your feedback.
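The compatibility-flag variant proposed above can be expressed as a pure predicate. This is a sketch under stated assumptions: the config key is the reporter's suggested parameter (not an existing Spark setting), and the polarity assumed here is "resolve globs everywhere except Kubernetes cluster mode with the flag disabled".

```scala
// Hypothetical gating predicate for glob resolution at submit time.
// "spark.kubernetes.resolveGlobPathsInSubmit" is a proposed, not real, key.
def shouldResolveGlobPaths(
    isKubernetesClusterMode: Boolean,
    conf: Map[String, String]): Boolean = {
  val resolveInSubmit =
    conf.getOrElse("spark.kubernetes.resolveGlobPathsInSubmit", "true").toBoolean
  // Default "true" preserves the old behavior; only an explicit opt-out
  // in Kubernetes cluster mode skips resolution in spark-submit.
  !isKubernetesClusterMode || resolveInSubmit
}
```

Under this sketch, the driver would still resolve globs when it downloads the jars, so skipping the submit-side pass only removes the need for the operator to hold the credentials.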
[jira] [Updated] (SPARK-42836) Support for recursive queries
[ https://issues.apache.org/jira/browse/SPARK-42836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max updated SPARK-42836: Component/s: SQL > Support for recursive queries > - > > Key: SPARK-42836 > URL: https://issues.apache.org/jira/browse/SPARK-42836 > Project: Spark > Issue Type: Question > Components: Java API, SQL >Affects Versions: 3.4.0 >Reporter: Max >Priority: Blocker > > Hello, a subtask was created a long time ago > https://issues.apache.org/jira/browse/SPARK-24497 > When will this task be completed? We really miss this. > Thx. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42836) Support for recursive queries
Max created SPARK-42836: --- Summary: Support for recursive queries Key: SPARK-42836 URL: https://issues.apache.org/jira/browse/SPARK-42836 Project: Spark Issue Type: Question Components: Java API Affects Versions: 3.4.0 Reporter: Max Hello, a subtask was created a long time ago https://issues.apache.org/jira/browse/SPARK-24497 When will this task be completed? We really miss this. Thx. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42835) Add test cases for Column.explain
[ https://issues.apache.org/jira/browse/SPARK-42835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701623#comment-17701623 ] Apache Spark commented on SPARK-42835: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/40466 > Add test cases for Column.explain > - > > Key: SPARK-42835 > URL: https://issues.apache.org/jira/browse/SPARK-42835 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42835) Add test cases for Column.explain
[ https://issues.apache.org/jira/browse/SPARK-42835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42835: Assignee: (was: Apache Spark) > Add test cases for Column.explain > - > > Key: SPARK-42835 > URL: https://issues.apache.org/jira/browse/SPARK-42835 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42835) Add test cases for Column.explain
[ https://issues.apache.org/jira/browse/SPARK-42835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701621#comment-17701621 ] Apache Spark commented on SPARK-42835: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/40466 > Add test cases for Column.explain > - > > Key: SPARK-42835 > URL: https://issues.apache.org/jira/browse/SPARK-42835 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42835) Add test cases for Column.explain
[ https://issues.apache.org/jira/browse/SPARK-42835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42835: Assignee: Apache Spark > Add test cases for Column.explain > - > > Key: SPARK-42835 > URL: https://issues.apache.org/jira/browse/SPARK-42835 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.5.0 >Reporter: jiaan.geng >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42835) Add test cases for Column.explain
jiaan.geng created SPARK-42835: -- Summary: Add test cases for Column.explain Key: SPARK-42835 URL: https://issues.apache.org/jira/browse/SPARK-42835 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.5.0 Reporter: jiaan.geng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42834) Divided by zero occurs in PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse
[ https://issues.apache.org/jira/browse/SPARK-42834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701610#comment-17701610 ] Li Ying commented on SPARK-42834: - [~csingh] Could you please help confirm this? > Divided by zero occurs in > PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse > > > Key: SPARK-42834 > URL: https://issues.apache.org/jira/browse/SPARK-42834 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.2.0 >Reporter: Li Ying >Priority: Major > > Sometimes when running a SQL job with push-based shuffle, an > exception occurs as below. It seems that there’s no element in the bitmaps > which store the merge chunk meta. > Is it a bug that we should not createChunkBlockInfos when > the bitmaps are empty, or should the bitmaps never be empty here? > > {code:java} > Caused by: java.lang.ArithmeticException: / by zero > at > org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse(PushBasedFetchHelper.scala:117) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:980) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:84) > {code} > related code: > {code:java} > def createChunkBlockInfosFromMetaResponse( > shuffleId: Int, > shuffleMergeId: Int, > reduceId: Int, > blockSize: Long, > bitmaps: Array[RoaringBitmap]): ArrayBuffer[(BlockId, Long, Int)] = { > val approxChunkSize = blockSize / bitmaps.length > val blocksToFetch = new ArrayBuffer[(BlockId, Long, Int)]() > for (i <- bitmaps.indices) { > val blockChunkId = ShuffleBlockChunkId(shuffleId, shuffleMergeId, > reduceId, i) > chunksMetaMap.put(blockChunkId, bitmaps(i)) > logDebug(s"adding block chunk $blockChunkId of size $approxChunkSize") > blocksToFetch += ((blockChunkId, approxChunkSize, SHUFFLE_PUSH_MAP_ID)) > } > blocksToFetch > } {code}
[jira] [Created] (SPARK-42834) Divided by zero occurs in PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse
Li Ying created SPARK-42834: --- Summary: Divided by zero occurs in PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse Key: SPARK-42834 URL: https://issues.apache.org/jira/browse/SPARK-42834 Project: Spark Issue Type: Bug Components: Shuffle Affects Versions: 3.2.0 Reporter: Li Ying Sometimes when running a SQL job with push-based shuffle, an exception occurs as below. It seems that there’s no element in the bitmaps which store the merge chunk meta. See org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse. Is it a bug that we should not createChunkBlockInfos when the bitmaps are empty, or should the bitmaps never be empty here? {code:java} Caused by: java.lang.ArithmeticException: / by zero at org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse(PushBasedFetchHelper.scala:117) at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:980) at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:84) {code} related code: {code:java} def createChunkBlockInfosFromMetaResponse( shuffleId: Int, shuffleMergeId: Int, reduceId: Int, blockSize: Long, bitmaps: Array[RoaringBitmap]): ArrayBuffer[(BlockId, Long, Int)] = { val approxChunkSize = blockSize / bitmaps.length val blocksToFetch = new ArrayBuffer[(BlockId, Long, Int)]() for (i <- bitmaps.indices) { val blockChunkId = ShuffleBlockChunkId(shuffleId, shuffleMergeId, reduceId, i) chunksMetaMap.put(blockChunkId, bitmaps(i)) logDebug(s"adding block chunk $blockChunkId of size $approxChunkSize") blocksToFetch += ((blockChunkId, approxChunkSize, SHUFFLE_PUSH_MAP_ID)) } blocksToFetch } {code}
[jira] [Updated] (SPARK-42834) Divided by zero occurs in PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse
[ https://issues.apache.org/jira/browse/SPARK-42834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Ying updated SPARK-42834: Description: Sometimes when running a SQL job with push-based shuffle, an exception occurs as below. It seems that there’s no element in the bitmaps which store the merge chunk meta. Is it a bug that we should not createChunkBlockInfos when the bitmaps are empty, or should the bitmaps never be empty here? {code:java} Caused by: java.lang.ArithmeticException: / by zero at org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse(PushBasedFetchHelper.scala:117) at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:980) at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:84) {code} related code: {code:java} def createChunkBlockInfosFromMetaResponse( shuffleId: Int, shuffleMergeId: Int, reduceId: Int, blockSize: Long, bitmaps: Array[RoaringBitmap]): ArrayBuffer[(BlockId, Long, Int)] = { val approxChunkSize = blockSize / bitmaps.length val blocksToFetch = new ArrayBuffer[(BlockId, Long, Int)]() for (i <- bitmaps.indices) { val blockChunkId = ShuffleBlockChunkId(shuffleId, shuffleMergeId, reduceId, i) chunksMetaMap.put(blockChunkId, bitmaps(i)) logDebug(s"adding block chunk $blockChunkId of size $approxChunkSize") blocksToFetch += ((blockChunkId, approxChunkSize, SHUFFLE_PUSH_MAP_ID)) } blocksToFetch } {code} was: Sometimes when running a SQL job with push-based shuffle, an exception occurs as below. It seems that there’s no element in the bitmaps which store the merge chunk meta. 
See org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse. Is it a bug that we should not createChunkBlockInfos when the bitmaps are empty, or should the bitmaps never be empty here? {code:java} Caused by: java.lang.ArithmeticException: / by zero at org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse(PushBasedFetchHelper.scala:117) at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:980) at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:84) {code} related code: {code:java} def createChunkBlockInfosFromMetaResponse( shuffleId: Int, shuffleMergeId: Int, reduceId: Int, blockSize: Long, bitmaps: Array[RoaringBitmap]): ArrayBuffer[(BlockId, Long, Int)] = { val approxChunkSize = blockSize / bitmaps.length val blocksToFetch = new ArrayBuffer[(BlockId, Long, Int)]() for (i <- bitmaps.indices) { val blockChunkId = ShuffleBlockChunkId(shuffleId, shuffleMergeId, reduceId, i) chunksMetaMap.put(blockChunkId, bitmaps(i)) logDebug(s"adding block chunk $blockChunkId of size $approxChunkSize") blocksToFetch += ((blockChunkId, approxChunkSize, SHUFFLE_PUSH_MAP_ID)) } blocksToFetch } {code} > Divided by zero occurs in > PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse > > > Key: SPARK-42834 > URL: https://issues.apache.org/jira/browse/SPARK-42834 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.2.0 >Reporter: Li Ying >Priority: Major > > Sometimes when running a SQL job with push-based shuffle, an > exception occurs as below. It seems that there’s no element in the bitmaps > which store the merge chunk meta. 
> Is it a bug that we should not createChunkBlockInfos when > the bitmaps are empty, or should the bitmaps never be empty here? > > {code:java} > Caused by: java.lang.ArithmeticException: / by zero > at > org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse(PushBasedFetchHelper.scala:117) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:980) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:84) > {code} > related code: > {code:java} > def createChunkBlockInfosFromMetaResponse( > shuffleId: Int, > shuffleMergeId: Int, > reduceId: Int, > blockSize: Long, > bitmaps: Array[RoaringBitmap]): ArrayBuffer[(BlockId, Long, Int)] = { > val approxChunkSize = blockSize / bitmaps.length > val blocksToFetch = new ArrayBuffer[(BlockId, Long, Int)]() > for (i <- bitmaps.indices) { > val blockChunkId = ShuffleBlockChunkId(shuffleId, shuffleMergeId, > reduceId, i) > chunksMetaMap.put(blockChunkId, bitmaps(i)) > logDebug(s"adding block chunk $blockChunkId of size $approxChunkSize") > blocksToFetch += ((bl
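The failure mode quoted above is the unguarded division `blockSize / bitmaps.length`. A minimal sketch of a defensive variant (an assumption about one possible fix, not Spark's actual patch) treats an empty bitmaps array as "nothing to fetch":

```scala
// Sketch: guard the chunk-size computation against an empty bitmaps array.
// Modeled on the division in createChunkBlockInfosFromMetaResponse.
def approxChunkSize(blockSize: Long, numBitmaps: Int): Option[Long] =
  if (numBitmaps == 0) None            // empty merge meta: avoid / by zero
  else Some(blockSize / numBitmaps)
```

Whether an empty response should instead be rejected upstream (i.e., the bitmaps should never be empty here) is exactly the question the reporter raises.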
[jira] [Assigned] (SPARK-42833) Refactor `applyExtensions` in `SparkSession`
[ https://issues.apache.org/jira/browse/SPARK-42833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42833: Assignee: Apache Spark > Refactor `applyExtensions` in `SparkSession` > > > Key: SPARK-42833 > URL: https://issues.apache.org/jira/browse/SPARK-42833 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Kazuyuki Tanimura >Assignee: Apache Spark >Priority: Minor > > Refactor `applyExtensions` in `SparkSession` to reduce duplicated code.
[jira] [Assigned] (SPARK-42833) Refactor `applyExtensions` in `SparkSession`
[ https://issues.apache.org/jira/browse/SPARK-42833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42833: Assignee: (was: Apache Spark) > Refactor `applyExtensions` in `SparkSession` > > > Key: SPARK-42833 > URL: https://issues.apache.org/jira/browse/SPARK-42833 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Kazuyuki Tanimura >Priority: Minor > > Refactor `applyExtensions` in `SparkSession` to reduce duplicated code.