[jira] [Commented] (SPARK-42849) Session variables

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702069#comment-17702069
 ] 

Apache Spark commented on SPARK-42849:
--

User 'srielau' has created a pull request for this issue:
https://github.com/apache/spark/pull/40474

> Session variables
> -
>
> Key: SPARK-42849
> URL: https://issues.apache.org/jira/browse/SPARK-42849
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Serge Rielau
>Priority: Major
>
> Provide a type-safe, engine-controlled session variable:
> CREATE [ OR REPLACE ] TEMPORARY VARIABLE [ IF NOT EXISTS ] var_name [ type ] 
> [ DEFAULT expression ]
> SET { variable = expression | ( variable [, ...] ) = ( subquery | expression 
> [, ...] ) }
> DROP VARIABLE [ IF EXISTS ] variable_name



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42849) Session variables

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42849:


Assignee: Apache Spark

> Session variables
> -
>
> Key: SPARK-42849
> URL: https://issues.apache.org/jira/browse/SPARK-42849
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Serge Rielau
>Assignee: Apache Spark
>Priority: Major
>
> Provide a type-safe, engine-controlled session variable:
> CREATE [ OR REPLACE ] TEMPORARY VARIABLE [ IF NOT EXISTS ] var_name [ type ] 
> [ DEFAULT expression ]
> SET { variable = expression | ( variable [, ...] ) = ( subquery | expression 
> [, ...] ) }
> DROP VARIABLE [ IF EXISTS ] variable_name



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42849) Session variables

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702068#comment-17702068
 ] 

Apache Spark commented on SPARK-42849:
--

User 'srielau' has created a pull request for this issue:
https://github.com/apache/spark/pull/40474

> Session variables
> -
>
> Key: SPARK-42849
> URL: https://issues.apache.org/jira/browse/SPARK-42849
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Serge Rielau
>Priority: Major
>
> Provide a type-safe, engine-controlled session variable:
> CREATE [ OR REPLACE ] TEMPORARY VARIABLE [ IF NOT EXISTS ] var_name [ type ] 
> [ DEFAULT expression ]
> SET { variable = expression | ( variable [, ...] ) = ( subquery | expression 
> [, ...] ) }
> DROP VARIABLE [ IF EXISTS ] variable_name



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42849) Session variables

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42849:


Assignee: (was: Apache Spark)

> Session variables
> -
>
> Key: SPARK-42849
> URL: https://issues.apache.org/jira/browse/SPARK-42849
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Serge Rielau
>Priority: Major
>
> Provide a type-safe, engine-controlled session variable:
> CREATE [ OR REPLACE ] TEMPORARY VARIABLE [ IF NOT EXISTS ] var_name [ type ] 
> [ DEFAULT expression ]
> SET { variable = expression | ( variable [, ...] ) = ( subquery | expression 
> [, ...] ) }
> DROP VARIABLE [ IF EXISTS ] variable_name



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42834) Divided by zero occurs in PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse

2023-03-17 Thread Li Ying (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Ying resolved SPARK-42834.
-
Resolution: Won't Do

> Divided by zero occurs in 
> PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse
> 
>
> Key: SPARK-42834
> URL: https://issues.apache.org/jira/browse/SPARK-42834
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: Li Ying
>Priority: Major
>
> Sometimes when running a SQL job with push-based shuffle, an exception 
> occurs as below. It seems that there is no element in the bitmaps that 
> store the merged-chunk metadata.
> Is this a bug, i.e. should we not call createChunkBlockInfos when the 
> bitmaps are empty, or should the bitmaps never be empty here?
>  
> {code:java}
> Caused by: java.lang.ArithmeticException: / by zero
> at 
> org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse(PushBasedFetchHelper.scala:117)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:980)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:84)
>  {code}
> related code:
> {code:java}
> def createChunkBlockInfosFromMetaResponse(
> shuffleId: Int,
> shuffleMergeId: Int,
> reduceId: Int,
> blockSize: Long,
> bitmaps: Array[RoaringBitmap]): ArrayBuffer[(BlockId, Long, Int)] = {
>   val approxChunkSize = blockSize / bitmaps.length
>   val blocksToFetch = new ArrayBuffer[(BlockId, Long, Int)]()
>   for (i <- bitmaps.indices) {
> val blockChunkId = ShuffleBlockChunkId(shuffleId, shuffleMergeId, 
> reduceId, i)
> chunksMetaMap.put(blockChunkId, bitmaps(i))
> logDebug(s"adding block chunk $blockChunkId of size $approxChunkSize")
> blocksToFetch += ((blockChunkId, approxChunkSize, SHUFFLE_PUSH_MAP_ID))
>   }
>   blocksToFetch
> } {code}
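
For reference, a standalone sketch of the defensive guard in question 
(illustrative only; per Chandni Singh's comment later in this digest, the 
underlying bug was fixed upstream by SPARK-37675):

{code:java}
// Hedged sketch, not Spark's actual fix: reject an empty meta response
// before dividing, so the failure is descriptive instead of "/ by zero".
def approxChunkSize(blockSize: Long, numChunks: Int): Long = {
  require(numChunks > 0,
    s"Empty merged-chunk meta response for block of size $blockSize")
  blockSize / numChunks  // safe: numChunks > 0 is guaranteed above
}
{code}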



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-42834) Divided by zero occurs in PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse

2023-03-17 Thread Li Ying (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Ying closed SPARK-42834.
---

> Divided by zero occurs in 
> PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse
> 
>
> Key: SPARK-42834
> URL: https://issues.apache.org/jira/browse/SPARK-42834
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: Li Ying
>Priority: Major
>
> Sometimes when running a SQL job with push-based shuffle, an exception 
> occurs as below. It seems that there is no element in the bitmaps that 
> store the merged-chunk metadata.
> Is this a bug, i.e. should we not call createChunkBlockInfos when the 
> bitmaps are empty, or should the bitmaps never be empty here?
>  
> {code:java}
> Caused by: java.lang.ArithmeticException: / by zero
> at 
> org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse(PushBasedFetchHelper.scala:117)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:980)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:84)
>  {code}
> related code:
> {code:java}
> def createChunkBlockInfosFromMetaResponse(
> shuffleId: Int,
> shuffleMergeId: Int,
> reduceId: Int,
> blockSize: Long,
> bitmaps: Array[RoaringBitmap]): ArrayBuffer[(BlockId, Long, Int)] = {
>   val approxChunkSize = blockSize / bitmaps.length
>   val blocksToFetch = new ArrayBuffer[(BlockId, Long, Int)]()
>   for (i <- bitmaps.indices) {
> val blockChunkId = ShuffleBlockChunkId(shuffleId, shuffleMergeId, 
> reduceId, i)
> chunksMetaMap.put(blockChunkId, bitmaps(i))
> logDebug(s"adding block chunk $blockChunkId of size $approxChunkSize")
> blocksToFetch += ((blockChunkId, approxChunkSize, SHUFFLE_PUSH_MAP_ID))
>   }
>   blocksToFetch
> } {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42834) Divided by zero occurs in PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse

2023-03-17 Thread Li Ying (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702050#comment-17702050
 ] 

Li Ying commented on SPARK-42834:
-

[~csingh] Thanks for the help. I'll take this fix :)

> Divided by zero occurs in 
> PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse
> 
>
> Key: SPARK-42834
> URL: https://issues.apache.org/jira/browse/SPARK-42834
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: Li Ying
>Priority: Major
>
> Sometimes when running a SQL job with push-based shuffle, an exception 
> occurs as below. It seems that there is no element in the bitmaps that 
> store the merged-chunk metadata.
> Is this a bug, i.e. should we not call createChunkBlockInfos when the 
> bitmaps are empty, or should the bitmaps never be empty here?
>  
> {code:java}
> Caused by: java.lang.ArithmeticException: / by zero
> at 
> org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse(PushBasedFetchHelper.scala:117)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:980)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:84)
>  {code}
> related code:
> {code:java}
> def createChunkBlockInfosFromMetaResponse(
> shuffleId: Int,
> shuffleMergeId: Int,
> reduceId: Int,
> blockSize: Long,
> bitmaps: Array[RoaringBitmap]): ArrayBuffer[(BlockId, Long, Int)] = {
>   val approxChunkSize = blockSize / bitmaps.length
>   val blocksToFetch = new ArrayBuffer[(BlockId, Long, Int)]()
>   for (i <- bitmaps.indices) {
> val blockChunkId = ShuffleBlockChunkId(shuffleId, shuffleMergeId, 
> reduceId, i)
> chunksMetaMap.put(blockChunkId, bitmaps(i))
> logDebug(s"adding block chunk $blockChunkId of size $approxChunkSize")
> blocksToFetch += ((blockChunkId, approxChunkSize, SHUFFLE_PUSH_MAP_ID))
>   }
>   blocksToFetch
> } {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42803) Use getParameterCount function instead of getParameterTypes.length

2023-03-17 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-42803.
--
Fix Version/s: 3.5.0
   (was: 3.3.2)
 Assignee: Narek Karapetian
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/40422

> Use getParameterCount function instead of getParameterTypes.length
> --
>
> Key: SPARK-42803
> URL: https://issues.apache.org/jira/browse/SPARK-42803
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, Spark Core, SQL
>Affects Versions: 3.3.3
>Reporter: Narek Karapetian
>Assignee: Narek Karapetian
>Priority: Minor
> Fix For: 3.5.0
>
>
> Since JDK 1.8 the reflection API provides an additional method, 
> {{getParameterCount}}; it is better to use it instead of 
> {{getParameterTypes.length}} because {{getParameterTypes}} makes a copy of 
> the parameter-types array on every invocation.
> Using {{getParameterCount}} avoids creating these redundant arrays.
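
A micro-example of the difference (both methods are standard 
{{java.lang.reflect}} API, JDK 8+):

{code:java}
import java.lang.reflect.Method

// Preferred: reads the arity directly, no allocation.
def arity(m: Method): Int = m.getParameterCount

// Old pattern: getParameterTypes defensively clones the Class[] array on
// every call, so taking .length allocates a throwaway copy each time.
def arityViaTypes(m: Method): Int = m.getParameterTypes.length
{code}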



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42851) EquivalentExpressions methods need to be consistently guarded by supportedExpression

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42851:


Assignee: Apache Spark

> EquivalentExpressions methods need to be consistently guarded by 
> supportedExpression
> 
>
> Key: SPARK-42851
> URL: https://issues.apache.org/jira/browse/SPARK-42851
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Kris Mok
>Assignee: Apache Spark
>Priority: Major
>
> SPARK-41468 tried to fix a bug but introduced a new regression. Its change to 
> {{EquivalentExpressions}} added a {{supportedExpression()}} guard to the 
> {{addExprTree()}} and {{getExprState()}} methods, but didn't add the same 
> guard to the other "add" entry point, {{addExpr()}}.
> As such, callers that add single expressions to CSE via {{addExpr()}} may 
> succeed, but upon retrieval via {{getExprState()}} they'd inconsistently get 
> a {{None}} because the guard fails.
> We need to make sure the "add" and "get" methods are consistent. It could be 
> done by one of:
> 1. Adding the same {{supportedExpression()}} guard to {{addExpr()}}, or
> 2. Removing the guard from {{getExprState()}}, relying solely on the guard on 
> the "add" path to make sure only intended state is added
> (or other alternative refactorings that fuse the guard into various methods 
> to make it more efficient).
> There are pros and cons to both directions: because {{addExpr()}} used to 
> allow more (potentially incorrect) expressions to get CSE'd, making it more 
> restrictive may cause performance regressions for the cases that happened to 
> work.
> Example:
> {code:sql}
> select max(transform(array(id), x -> x)), max(transform(array(id), x -> x)) 
> from range(2)
> {code}
> Running this query on Spark 3.2 branch returns the correct value:
> {code}
> scala> spark.sql("select max(transform(array(id), x -> x)), 
> max(transform(array(id), x -> x)) from range(2)").collect
> res0: Array[org.apache.spark.sql.Row] = 
> Array([WrappedArray(1),WrappedArray(1)])
> {code}
> Here, {{transform(array(id), x -> x)}} is an {{AggregateExpression}} that was 
> (potentially unsafely) recognized by {{addExpr()}} as a common subexpression, 
> and {{getExprState()}} doesn't do extra guarding, so during physical 
> planning, in {{PhysicalAggregation}} this expression gets CSE'd in both the 
> aggregation expression list and the result expressions list.
> {code}
> AdaptiveSparkPlan isFinalPlan=false
> +- SortAggregate(key=[], functions=[max(transform(array(id#0L), 
> lambdafunction(lambda x#1L, lambda x#1L, false)))])
>+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=11]
>   +- SortAggregate(key=[], functions=[partial_max(transform(array(id#0L), 
> lambdafunction(lambda x#1L, lambda x#1L, false)))])
>  +- Range (0, 2, step=1, splits=16)
> {code}
> Running the same query on current master triggers an error when binding the 
> result expression to the aggregate expression in the Aggregate operators (for 
> a WSCG-enabled operator like {{HashAggregateExec}}, the same error would show 
> up during codegen):
> {code}
> ERROR TaskSetManager: Task 0 in stage 2.0 failed 1 times; aborting job
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 
> (TID 16) (ip-10-110-16-93.us-west-2.compute.internal executor driver): 
> java.lang.IllegalStateException: Couldn't find max(transform(array(id#0L), 
> lambdafunction(lambda x#2L, lambda x#2L, false)))#4 in 
> [max(transform(array(id#0L), lambdafunction(lambda x#1L, lambda x#1L, 
> false)))#3]
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:517)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1249)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1248)
>   at 
> org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:532)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode

[jira] [Assigned] (SPARK-42851) EquivalentExpressions methods need to be consistently guarded by supportedExpression

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42851:


Assignee: (was: Apache Spark)

> EquivalentExpressions methods need to be consistently guarded by 
> supportedExpression
> 
>
> Key: SPARK-42851
> URL: https://issues.apache.org/jira/browse/SPARK-42851
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Kris Mok
>Priority: Major
>
> SPARK-41468 tried to fix a bug but introduced a new regression. Its change to 
> {{EquivalentExpressions}} added a {{supportedExpression()}} guard to the 
> {{addExprTree()}} and {{getExprState()}} methods, but didn't add the same 
> guard to the other "add" entry point, {{addExpr()}}.
> As such, callers that add single expressions to CSE via {{addExpr()}} may 
> succeed, but upon retrieval via {{getExprState()}} they'd inconsistently get 
> a {{None}} because the guard fails.
> We need to make sure the "add" and "get" methods are consistent. It could be 
> done by one of:
> 1. Adding the same {{supportedExpression()}} guard to {{addExpr()}}, or
> 2. Removing the guard from {{getExprState()}}, relying solely on the guard on 
> the "add" path to make sure only intended state is added
> (or other alternative refactorings that fuse the guard into various methods 
> to make it more efficient).
> There are pros and cons to both directions: because {{addExpr()}} used to 
> allow more (potentially incorrect) expressions to get CSE'd, making it more 
> restrictive may cause performance regressions for the cases that happened to 
> work.
> Example:
> {code:sql}
> select max(transform(array(id), x -> x)), max(transform(array(id), x -> x)) 
> from range(2)
> {code}
> Running this query on Spark 3.2 branch returns the correct value:
> {code}
> scala> spark.sql("select max(transform(array(id), x -> x)), 
> max(transform(array(id), x -> x)) from range(2)").collect
> res0: Array[org.apache.spark.sql.Row] = 
> Array([WrappedArray(1),WrappedArray(1)])
> {code}
> Here, {{transform(array(id), x -> x)}} is an {{AggregateExpression}} that was 
> (potentially unsafely) recognized by {{addExpr()}} as a common subexpression, 
> and {{getExprState()}} doesn't do extra guarding, so during physical 
> planning, in {{PhysicalAggregation}} this expression gets CSE'd in both the 
> aggregation expression list and the result expressions list.
> {code}
> AdaptiveSparkPlan isFinalPlan=false
> +- SortAggregate(key=[], functions=[max(transform(array(id#0L), 
> lambdafunction(lambda x#1L, lambda x#1L, false)))])
>+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=11]
>   +- SortAggregate(key=[], functions=[partial_max(transform(array(id#0L), 
> lambdafunction(lambda x#1L, lambda x#1L, false)))])
>  +- Range (0, 2, step=1, splits=16)
> {code}
> Running the same query on current master triggers an error when binding the 
> result expression to the aggregate expression in the Aggregate operators (for 
> a WSCG-enabled operator like {{HashAggregateExec}}, the same error would show 
> up during codegen):
> {code}
> ERROR TaskSetManager: Task 0 in stage 2.0 failed 1 times; aborting job
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 
> (TID 16) (ip-10-110-16-93.us-west-2.compute.internal executor driver): 
> java.lang.IllegalStateException: Couldn't find max(transform(array(id#0L), 
> lambdafunction(lambda x#2L, lambda x#2L, false)))#4 in 
> [max(transform(array(id#0L), lambdafunction(lambda x#1L, lambda x#1L, 
> false)))#3]
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:517)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1249)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1248)
>   at 
> org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:532)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:517)
>   at 
>

[jira] [Commented] (SPARK-42851) EquivalentExpressions methods need to be consistently guarded by supportedExpression

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702033#comment-17702033
 ] 

Apache Spark commented on SPARK-42851:
--

User 'rednaxelafx' has created a pull request for this issue:
https://github.com/apache/spark/pull/40473

> EquivalentExpressions methods need to be consistently guarded by 
> supportedExpression
> 
>
> Key: SPARK-42851
> URL: https://issues.apache.org/jira/browse/SPARK-42851
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Kris Mok
>Priority: Major
>
> SPARK-41468 tried to fix a bug but introduced a new regression. Its change to 
> {{EquivalentExpressions}} added a {{supportedExpression()}} guard to the 
> {{addExprTree()}} and {{getExprState()}} methods, but didn't add the same 
> guard to the other "add" entry point, {{addExpr()}}.
> As such, callers that add single expressions to CSE via {{addExpr()}} may 
> succeed, but upon retrieval via {{getExprState()}} they'd inconsistently get 
> a {{None}} because the guard fails.
> We need to make sure the "add" and "get" methods are consistent. It could be 
> done by one of:
> 1. Adding the same {{supportedExpression()}} guard to {{addExpr()}}, or
> 2. Removing the guard from {{getExprState()}}, relying solely on the guard on 
> the "add" path to make sure only intended state is added
> (or other alternative refactorings that fuse the guard into various methods 
> to make it more efficient).
> There are pros and cons to both directions: because {{addExpr()}} used to 
> allow more (potentially incorrect) expressions to get CSE'd, making it more 
> restrictive may cause performance regressions for the cases that happened to 
> work.
> Example:
> {code:sql}
> select max(transform(array(id), x -> x)), max(transform(array(id), x -> x)) 
> from range(2)
> {code}
> Running this query on Spark 3.2 branch returns the correct value:
> {code}
> scala> spark.sql("select max(transform(array(id), x -> x)), 
> max(transform(array(id), x -> x)) from range(2)").collect
> res0: Array[org.apache.spark.sql.Row] = 
> Array([WrappedArray(1),WrappedArray(1)])
> {code}
> Here, {{transform(array(id), x -> x)}} is an {{AggregateExpression}} that was 
> (potentially unsafely) recognized by {{addExpr()}} as a common subexpression, 
> and {{getExprState()}} doesn't do extra guarding, so during physical 
> planning, in {{PhysicalAggregation}} this expression gets CSE'd in both the 
> aggregation expression list and the result expressions list.
> {code}
> AdaptiveSparkPlan isFinalPlan=false
> +- SortAggregate(key=[], functions=[max(transform(array(id#0L), 
> lambdafunction(lambda x#1L, lambda x#1L, false)))])
>+- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=11]
>   +- SortAggregate(key=[], functions=[partial_max(transform(array(id#0L), 
> lambdafunction(lambda x#1L, lambda x#1L, false)))])
>  +- Range (0, 2, step=1, splits=16)
> {code}
> Running the same query on current master triggers an error when binding the 
> result expression to the aggregate expression in the Aggregate operators (for 
> a WSCG-enabled operator like {{HashAggregateExec}}, the same error would show 
> up during codegen):
> {code}
> ERROR TaskSetManager: Task 0 in stage 2.0 failed 1 times; aborting job
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 
> (TID 16) (ip-10-110-16-93.us-west-2.compute.internal executor driver): 
> java.lang.IllegalStateException: Couldn't find max(transform(array(id#0L), 
> lambdafunction(lambda x#2L, lambda x#2L, false)))#4 in 
> [max(transform(array(id#0L), lambdafunction(lambda x#1L, lambda x#1L, 
> false)))#3]
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
>   at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:517)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1249)
>   at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1248)
>   at 
> org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:532)

[jira] [Created] (SPARK-42851) EquivalentExpressions methods need to be consistently guarded by supportedExpression

2023-03-17 Thread Kris Mok (Jira)
Kris Mok created SPARK-42851:


 Summary: EquivalentExpressions methods need to be consistently 
guarded by supportedExpression
 Key: SPARK-42851
 URL: https://issues.apache.org/jira/browse/SPARK-42851
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.2, 3.4.0
Reporter: Kris Mok


SPARK-41468 tried to fix a bug but introduced a new regression. Its change to 
{{EquivalentExpressions}} added a {{supportedExpression()}} guard to the 
{{addExprTree()}} and {{getExprState()}} methods, but didn't add the same guard 
to the other "add" entry point, {{addExpr()}}.

As such, callers that add single expressions to CSE via {{addExpr()}} may 
succeed, but upon retrieval via {{getExprState()}} they'd inconsistently get a 
{{None}} because the guard fails.

We need to make sure the "add" and "get" methods are consistent. It could be 
done by one of:
1. Adding the same {{supportedExpression()}} guard to {{addExpr()}}, or
2. Removing the guard from {{getExprState()}}, relying solely on the guard on 
the "add" path to make sure only intended state is added
(or other alternative refactorings that fuse the guard into various methods to 
make it more efficient).

There are pros and cons to both directions: because {{addExpr()}} used to allow 
more (potentially incorrect) expressions to get CSE'd, making it more 
restrictive may cause performance regressions for the cases that happened to 
work.
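
To make option 1 concrete, here is a simplified, hypothetical model of the 
consistency contract (the class shape and guard rule below are illustrative, 
not Spark's actual internals):

{code:java}
import scala.collection.mutable

class EquivalentExpressionsSketch {
  private val useCounts = mutable.Map.empty[String, Int]

  // Stand-in for Spark's supportedExpression() check (illustrative rule).
  private def supportedExpression(expr: String): Boolean =
    !expr.contains("lambda")

  // Option 1: guard the single-expression "add" path like addExprTree().
  def addExpr(expr: String): Boolean = {
    if (!supportedExpression(expr)) return false
    val n = useCounts.getOrElse(expr, 0) + 1
    useCounts(expr) = n
    n > 1  // true iff the expression was seen before (a common subexpression)
  }

  // getExprState() keeps its guard; it now agrees with addExpr() by construction.
  def getExprState(expr: String): Option[Int] =
    if (supportedExpression(expr)) useCounts.get(expr) else None
}
{code}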

Example:
{code:sql}
select max(transform(array(id), x -> x)), max(transform(array(id), x -> x)) 
from range(2)
{code}

Running this query on Spark 3.2 branch returns the correct value:
{code}
scala> spark.sql("select max(transform(array(id), x -> x)), 
max(transform(array(id), x -> x)) from range(2)").collect
res0: Array[org.apache.spark.sql.Row] = Array([WrappedArray(1),WrappedArray(1)])
{code}
Here, {{transform(array(id), x -> x)}} is an {{AggregateExpression}} that was 
(potentially unsafely) recognized by {{addExpr()}} as a common subexpression, 
and {{getExprState()}} doesn't do extra guarding, so during physical planning, 
in {{PhysicalAggregation}} this expression gets CSE'd in both the aggregation 
expression list and the result expressions list.
{code}
AdaptiveSparkPlan isFinalPlan=false
+- SortAggregate(key=[], functions=[max(transform(array(id#0L), 
lambdafunction(lambda x#1L, lambda x#1L, false)))])
   +- Exchange SinglePartition, ENSURE_REQUIREMENTS, [plan_id=11]
  +- SortAggregate(key=[], functions=[partial_max(transform(array(id#0L), 
lambdafunction(lambda x#1L, lambda x#1L, false)))])
 +- Range (0, 2, step=1, splits=16)
{code}

Running the same query on current master triggers an error when binding the 
result expression to the aggregate expression in the Aggregate operators (for a 
WSCG-enabled operator like {{HashAggregateExec}}, the same error would show up 
during codegen):
{code}
ERROR TaskSetManager: Task 0 in stage 2.0 failed 1 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 
16) (ip-10-110-16-93.us-west-2.compute.internal executor driver): 
java.lang.IllegalStateException: Couldn't find max(transform(array(id#0L), 
lambdafunction(lambda x#2L, lambda x#2L, false)))#4 in 
[max(transform(array(id#0L), lambdafunction(lambda x#1L, lambda x#1L, 
false)))#3]
at 
org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
at 
org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:104)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:517)
at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1249)
at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1248)
at 
org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:532)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:517)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:456)
at 
org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:73)
at 
org.apache.spark.sql.catalyst.expressions.BindReferences$.$anonfun$bindReferences$1(BoundAttribute.scala:94)
at scala.coll

[jira] [Assigned] (SPARK-42247) Standardize `returnType` property of UserDefinedFunction

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42247:


Assignee: (was: Apache Spark)

> Standardize `returnType` property of UserDefinedFunction
> 
>
> Key: SPARK-42247
> URL: https://issues.apache.org/jira/browse/SPARK-42247
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> There are checks 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42247) Standardize `returnType` property of UserDefinedFunction

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702022#comment-17702022
 ] 

Apache Spark commented on SPARK-42247:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40472

> Standardize `returnType` property of UserDefinedFunction
> 
>
> Key: SPARK-42247
> URL: https://issues.apache.org/jira/browse/SPARK-42247
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> There are checks 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42247) Standardize `returnType` property of UserDefinedFunction

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42247:


Assignee: Apache Spark

> Standardize `returnType` property of UserDefinedFunction
> 
>
> Key: SPARK-42247
> URL: https://issues.apache.org/jira/browse/SPARK-42247
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> There are checks 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42850) Remove duplicated rule CombineFilters in Optimizer

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42850:


Assignee: Apache Spark  (was: Gengliang Wang)

> Remove duplicated rule CombineFilters in Optimizer
> --
>
> Key: SPARK-42850
> URL: https://issues.apache.org/jira/browse/SPARK-42850
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42850) Remove duplicated rule CombineFilters in Optimizer

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42850:


Assignee: Gengliang Wang  (was: Apache Spark)

> Remove duplicated rule CombineFilters in Optimizer
> --
>
> Key: SPARK-42850
> URL: https://issues.apache.org/jira/browse/SPARK-42850
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42850) Remove duplicated rule CombineFilters in Optimizer

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702021#comment-17702021
 ] 

Apache Spark commented on SPARK-42850:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40471

> Remove duplicated rule CombineFilters in Optimizer
> --
>
> Key: SPARK-42850
> URL: https://issues.apache.org/jira/browse/SPARK-42850
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42850) Remove duplicated rule CombineFilters in Optimizer

2023-03-17 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-42850:
--

 Summary: Remove duplicated rule CombineFilters in Optimizer
 Key: SPARK-42850
 URL: https://issues.apache.org/jira/browse/SPARK-42850
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 3.4.1
Reporter: Gengliang Wang
Assignee: Gengliang Wang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41843) Implement SparkSession.udf

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702006#comment-17702006
 ] 

Apache Spark commented on SPARK-41843:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40470

> Implement SparkSession.udf
> --
>
> Key: SPARK-41843
> URL: https://issues.apache.org/jira/browse/SPARK-41843
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 2331, in pyspark.sql.connect.functions.call_udf
> Failed example:
>     _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType())
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.functions.call_udf[...]>", line 1, in <module>
>         _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType())
>     AttributeError: 'SparkSession' object has no attribute 'udf'{code}
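
For reference, a hedged Scala analogue of the call that the doctest above 
expects to work (this ticket implements the equivalent {{spark.udf}} surface 
in the Spark Connect Python client; the names and values are illustrative):

{code:java}
// In spark-shell (a SparkSession named `spark` is in scope).
spark.udf.register("intX2", (i: Int) => i * 2)
spark.sql("SELECT intX2(21)").show()  // expected single value: 42
{code}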



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41818) Support DataFrameWriter.saveAsTable

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702003#comment-17702003
 ] 

Apache Spark commented on SPARK-41818:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40470

> Support DataFrameWriter.saveAsTable
> ---
>
> Key: SPARK-41818
> URL: https://issues.apache.org/jira/browse/SPARK-41818
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", 
> line 369, in pyspark.sql.connect.readwriter.DataFrameWriter.insertInto
> Failed example:
>     df.write.saveAsTable("tblA")
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.readwriter.DataFrameWriter.insertInto[2]>", line 1, in <module>
>         df.write.saveAsTable("tblA")
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", 
> line 350, in saveAsTable
>         
> self._spark.client.execute_command(self._write.command(self._spark.client))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 459, in execute_command
>         self._execute(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 547, in _execute
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 623, in _handle_error
>         raise SparkConnectException(status.message, info.reason) from None
>     pyspark.sql.connect.client.SparkConnectException: 
> (java.lang.ClassNotFoundException) .DefaultSource{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41843) Implement SparkSession.udf

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702004#comment-17702004
 ] 

Apache Spark commented on SPARK-41843:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40470

> Implement SparkSession.udf
> --
>
> Key: SPARK-41843
> URL: https://issues.apache.org/jira/browse/SPARK-41843
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 2331, in pyspark.sql.connect.functions.call_udf
> Failed example:
>     _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType())
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.functions.call_udf[...]>", line 1, in <module>
>         _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType())
>     AttributeError: 'SparkSession' object has no attribute 'udf'{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41843) Implement SparkSession.udf

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702005#comment-17702005
 ] 

Apache Spark commented on SPARK-41843:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40470

> Implement SparkSession.udf
> --
>
> Key: SPARK-41843
> URL: https://issues.apache.org/jira/browse/SPARK-41843
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/functions.py", 
> line 2331, in pyspark.sql.connect.functions.call_udf
> Failed example:
>     _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType())
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.functions.call_udf[...]>", line 1, in <module>
>         _ = spark.udf.register("intX2", lambda i: i * 2, IntegerType())
>     AttributeError: 'SparkSession' object has no attribute 'udf'{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41818) Support DataFrameWriter.saveAsTable

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17702002#comment-17702002
 ] 

Apache Spark commented on SPARK-41818:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40470

> Support DataFrameWriter.saveAsTable
> ---
>
> Key: SPARK-41818
> URL: https://issues.apache.org/jira/browse/SPARK-41818
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", 
> line 369, in pyspark.sql.connect.readwriter.DataFrameWriter.insertInto
> Failed example:
>     df.write.saveAsTable("tblA")
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "<doctest pyspark.sql.connect.readwriter.DataFrameWriter.insertInto[2]>", line 1, in <module>
>         df.write.saveAsTable("tblA")
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/readwriter.py", 
> line 350, in saveAsTable
>         
> self._spark.client.execute_command(self._write.command(self._spark.client))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 459, in execute_command
>         self._execute(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 547, in _execute
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 623, in _handle_error
>         raise SparkConnectException(status.message, info.reason) from None
>     pyspark.sql.connect.client.SparkConnectException: 
> (java.lang.ClassNotFoundException) .DefaultSource{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42849) Session variables

2023-03-17 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-42849:


 Summary: Session variables
 Key: SPARK-42849
 URL: https://issues.apache.org/jira/browse/SPARK-42849
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Affects Versions: 3.5.0
Reporter: Serge Rielau


Provide a type-safe, engine-controlled session variable:

CREATE [ OR REPLACE ] TEMPORARY VARIABLE [ IF NOT EXISTS ] var_name [ type ] 
[ DEFAULT expression ]

SET { variable = expression | ( variable [, ...] ) = ( subquery | expression 
[, ...] ) }

DROP VARIABLE [ IF EXISTS ] variable_name
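
A hedged usage sketch of the proposed syntax, issued through {{spark.sql}} 
(the variable name, type, and values are illustrative; the grammar above is a 
proposal, not final):

{code:java}
// In spark-shell; exercises the proposed CREATE/SET/DROP VARIABLE grammar.
spark.sql("CREATE TEMPORARY VARIABLE var1 INT DEFAULT 10")
spark.sql("SET var1 = var1 + 32")    // single-variable assignment form
spark.sql("SELECT var1").show()      // expected: 42
spark.sql("DROP VARIABLE IF EXISTS var1")
{code}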



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42546) SPARK-42045 is incomplete in supporting ANSI_MODE for round() and bround()

2023-03-17 Thread Daniel Davies (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701987#comment-17701987
 ] 

Daniel Davies commented on SPARK-42546:
---

Can I take this?

> SPARK-42045 is incomplete in supporting ANSI_MODE for round() and bround()
> --
>
> Key: SPARK-42546
> URL: https://issues.apache.org/jira/browse/SPARK-42546
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.2, 3.4.0
>Reporter: Serge Rielau
>Priority: Major
>
> Under ANSI mode, SPARK-42045 added error conditions instead of silent 
> overflows for edge cases in round() and bround().
> However, it appears this fix works only for the INT data type; on smaller 
> integral types (the example below uses TINYINT) the function still returns 
> wrong results:
> {code:java}
> spark-sql> select round(2147483647, -1);
> [ARITHMETIC_OVERFLOW] Overflow. If necessary set "spark.sql.ansi.enabled" to 
> "false" to bypass this error.{code}
> {code:java}
> spark-sql> select round(127y, -1);
> -126 {code}
>    
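
For contrast, the expected ANSI-mode behavior once the fix covers the smaller 
integral types would presumably mirror the INT case above (illustrative 
expectation, not actual output):

{code:java}
// In spark-shell: 127y is a TINYINT literal, so round(127y, -1) (= 130)
// overflows TINYINT's range [-128, 127] and, under ANSI mode, should raise
// ARITHMETIC_OVERFLOW rather than silently wrap to -126.
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("SELECT round(127y, -1)").show()  // expected: ARITHMETIC_OVERFLOW error
{code}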



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42833) Refactor `applyExtensions` in `SparkSession`

2023-03-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42833.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40465
[https://github.com/apache/spark/pull/40465]

> Refactor `applyExtensions` in `SparkSession`
> 
>
> Key: SPARK-42833
> URL: https://issues.apache.org/jira/browse/SPARK-42833
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Kazuyuki Tanimura
>Assignee: Kazuyuki Tanimura
>Priority: Minor
> Fix For: 3.5.0
>
>
> Refactor `applyExtensions` in `SparkSession` in order to reduce duplicated 
> code



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42833) Refactor `applyExtensions` in `SparkSession`

2023-03-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-42833:
-

Assignee: Kazuyuki Tanimura

> Refactor `applyExtensions` in `SparkSession`
> 
>
> Key: SPARK-42833
> URL: https://issues.apache.org/jira/browse/SPARK-42833
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Kazuyuki Tanimura
>Assignee: Kazuyuki Tanimura
>Priority: Minor
>
> Refactor `applyExtensions` in `SparkSession` in order to reduce duplicated 
> code



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42779) Allow V2 writes to indicate advisory partition size

2023-03-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-42779:
-

Assignee: Anton Okolnychyi

> Allow V2 writes to indicate advisory partition size
> ---
>
> Key: SPARK-42779
> URL: https://issues.apache.org/jira/browse/SPARK-42779
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Anton Okolnychyi
>Assignee: Anton Okolnychyi
>Priority: Major
>
> Data sources may request a particular distribution and ordering of data for 
> V2 writes. If AQE is enabled, the default session advisory partition size 
> (64MB) will be used as guidance. Unfortunately, this default value can still 
> lead to small files because the written data can be compressed nicely using 
> columnar file formats. Spark should allow data sources to indicate the 
> advisory shuffle partition size, just like it lets data sources request a 
> particular number of partitions.
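
A hedged sketch of how a V2 data source might use this; the 
{{advisoryPartitionSizeInBytes}} method is the capability this ticket adds, 
and the class name and size below are illustrative:

{code:java}
import org.apache.spark.sql.connector.distributions.{Distribution, Distributions}
import org.apache.spark.sql.connector.expressions.SortOrder
import org.apache.spark.sql.connector.write.{RequiresDistributionAndOrdering, Write}

class CompactingWrite extends Write with RequiresDistributionAndOrdering {
  override def requiredDistribution(): Distribution = Distributions.unspecified()
  override def requiredOrdering(): Array[SortOrder] = Array.empty
  // Nudge AQE to coalesce shuffle partitions toward ~256 MB instead of the
  // 64 MB session default, compensating for columnar compression.
  override def advisoryPartitionSizeInBytes(): Long = 256L * 1024 * 1024
}
{code}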



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42779) Allow V2 writes to indicate advisory partition size

2023-03-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-42779.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40421
[https://github.com/apache/spark/pull/40421]

> Allow V2 writes to indicate advisory partition size
> ---
>
> Key: SPARK-42779
> URL: https://issues.apache.org/jira/browse/SPARK-42779
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Anton Okolnychyi
>Assignee: Anton Okolnychyi
>Priority: Major
> Fix For: 3.5.0
>
>
> Data sources may request a particular distribution and ordering of data for 
> V2 writes. If AQE is enabled, the default session advisory partition size 
> (64MB) will be used as guidance. Unfortunately, this default value can still 
> lead to small files because the written data can be compressed nicely using 
> columnar file formats. Spark should allow data sources to indicate the 
> advisory shuffle partition size, just like it lets data sources request a 
> particular number of partitions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42437) Pyspark catalog.cacheTable allow to specify storage level Connect add support Storagelevel

2023-03-17 Thread Khalid Mammadov (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42437?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khalid Mammadov updated SPARK-42437:

 Target Version/s:   (was: 3.5.0)
Affects Version/s: 3.4.0
   (was: 3.5.0)
  Summary: Pyspark catalog.cacheTable allow to specify storage 
level Connect add support Storagelevel  (was: Pyspark catalog.cacheTable allow 
to specify storage level)

> Pyspark catalog.cacheTable allow to specify storage level Connect add support 
> Storagelevel
> --
>
> Key: SPARK-42437
> URL: https://issues.apache.org/jira/browse/SPARK-42437
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Khalid Mammadov
>Priority: Major
>
> Currently the PySpark version of the catalog.cacheTable function does not 
> support specifying a storage level. This ticket adds that support.
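
For context, the Scala-side API already accepts a storage level; this ticket 
mirrors it in PySpark (Scala sketch below; the table name is illustrative):

{code:java}
import org.apache.spark.storage.StorageLevel

// In spark-shell: cache a table at an explicit storage level.
spark.catalog.cacheTable("sales", StorageLevel.MEMORY_AND_DISK)
{code}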



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42848) Implement DataFrame.registerTempTable

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701934#comment-17701934
 ] 

Apache Spark commented on SPARK-42848:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40469

> Implement DataFrame.registerTempTable
> -
>
> Key: SPARK-42848
> URL: https://issues.apache.org/jira/browse/SPARK-42848
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42848) Implement DataFrame.registerTempTable

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701933#comment-17701933
 ] 

Apache Spark commented on SPARK-42848:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40469

> Implement DataFrame.registerTempTable
> -
>
> Key: SPARK-42848
> URL: https://issues.apache.org/jira/browse/SPARK-42848
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42848) Implement DataFrame.registerTempTable

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42848:


Assignee: Apache Spark

> Implement DataFrame.registerTempTable
> -
>
> Key: SPARK-42848
> URL: https://issues.apache.org/jira/browse/SPARK-42848
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42848) Implement DataFrame.registerTempTable

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42848:


Assignee: (was: Apache Spark)

> Implement DataFrame.registerTempTable
> -
>
> Key: SPARK-42848
> URL: https://issues.apache.org/jira/browse/SPARK-42848
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42848) Implement DataFrame.registerTempTable

2023-03-17 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-42848:
-

 Summary: Implement DataFrame.registerTempTable
 Key: SPARK-42848
 URL: https://issues.apache.org/jira/browse/SPARK-42848
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.4.0
Reporter: Takuya Ueshin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-41922) Implement DataFrame `semanticHash`

2023-03-17 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-41922.
---
Resolution: Duplicate

> Implement DataFrame `semanticHash`
> --
>
> Key: SPARK-41922
> URL: https://issues.apache.org/jira/browse/SPARK-41922
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42834) Divided by zero occurs in PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse

2023-03-17 Thread Chandni Singh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701908#comment-17701908
 ] 

Chandni Singh commented on SPARK-42834:
---

We don't expect `numChunks` to be zero or `bitmaps` to be empty. There was a 
bug in 3.2.0 which was fixed by 
https://issues.apache.org/jira/browse/SPARK-37675
Can you please check whether you have this fix?
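
For illustration only (this is not the SPARK-37675 fix itself), a defensive 
variant of the size computation would fail fast with a clear message instead 
of throwing `/ by zero`:
{code:scala}
import org.roaringbitmap.RoaringBitmap

// Sketch: reject an empty bitmap array before dividing by its length.
def approxChunkSize(blockSize: Long, bitmaps: Array[RoaringBitmap]): Long = {
  require(bitmaps.nonEmpty, s"empty merged-chunk bitmaps for a block of $blockSize bytes")
  blockSize / bitmaps.length
}
{code}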

> Divided by zero occurs in 
> PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse
> 
>
> Key: SPARK-42834
> URL: https://issues.apache.org/jira/browse/SPARK-42834
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: Li Ying
>Priority: Major
>
> Sometimes, when running a SQL job with push-based shuffle, an exception 
> occurs as below. It seems that there is no element in the bitmaps which 
> store the merged-chunk metadata.
> Is it a bug that we should not call createChunkBlockInfos when the bitmaps 
> are empty, or should the bitmaps never be empty here?
>  
> {code:java}
> Caused by: java.lang.ArithmeticException: / by zero
> at 
> org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse(PushBasedFetchHelper.scala:117)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:980)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:84)
>  {code}
> related code:
> {code:java}
> def createChunkBlockInfosFromMetaResponse(
> shuffleId: Int,
> shuffleMergeId: Int,
> reduceId: Int,
> blockSize: Long,
> bitmaps: Array[RoaringBitmap]): ArrayBuffer[(BlockId, Long, Int)] = {
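>   // bitmaps.length can be 0 if the meta response carries no bitmaps,
>   // which makes the division below throw ArithmeticException: / by zero.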
>   val approxChunkSize = blockSize / bitmaps.length
>   val blocksToFetch = new ArrayBuffer[(BlockId, Long, Int)]()
>   for (i <- bitmaps.indices) {
> val blockChunkId = ShuffleBlockChunkId(shuffleId, shuffleMergeId, 
> reduceId, i)
> chunksMetaMap.put(blockChunkId, bitmaps(i))
> logDebug(s"adding block chunk $blockChunkId of size $approxChunkSize")
> blocksToFetch += ((blockChunkId, approxChunkSize, SHUFFLE_PUSH_MAP_ID))
>   }
>   blocksToFetch
> } {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42833) Refactor `applyExtensions` in `SparkSession`

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701903#comment-17701903
 ] 

Apache Spark commented on SPARK-42833:
--

User 'kazuyukitanimura' has created a pull request for this issue:
https://github.com/apache/spark/pull/40465

> Refactor `applyExtensions` in `SparkSession`
> 
>
> Key: SPARK-42833
> URL: https://issues.apache.org/jira/browse/SPARK-42833
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Kazuyuki Tanimura
>Priority: Minor
>
> Refactor `applyExtensions` in `SparkSession` in order to reduce the 
> duplicated code.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42760) The partition of result data frame of join is always 1

2023-03-17 Thread binyang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701742#comment-17701742
 ] 

binyang commented on SPARK-42760:
-

Disabling AQE solved my problem. Thank you!
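
For anyone hitting the same behavior: with AQE on, Spark may coalesce 
post-shuffle partitions below spark.sql.shuffle.partitions. A minimal sketch 
of the relevant settings (assumes an active SparkSession `spark`):
{code:scala}
// Option 1: disable AQE entirely, so the join honors spark.sql.shuffle.partitions.
spark.conf.set("spark.sql.adaptive.enabled", "false")
// Option 2: keep AQE but stop it from coalescing small shuffle partitions.
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "false")
{code}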

> The partition of result data frame of join is always 1
> --
>
> Key: SPARK-42760
> URL: https://issues.apache.org/jira/browse/SPARK-42760
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 3.3.2
> Environment: standard spark 3.0.3/3.3.2, using in jupyter notebook, 
> local mode
>Reporter: binyang
>Priority: Major
>
> I am using PySpark. The result DataFrame of a join always has one partition.
> Here is my code from 
> https://stackoverflow.com/questions/51876281/is-partitioning-retained-after-a-spark-sql-join
>  
> print(spark.version)
> def example_shuffle_partitions(data_partitions=10, shuffle_partitions=4):
>     spark.conf.set("spark.sql.shuffle.partitions", shuffle_partitions)
>     spark.sql("SET spark.sql.autoBroadcastJoinThreshold=-1")
>     df1 = spark.range(1, 1000).repartition(data_partitions)
>     df2 = spark.range(1, 2000).repartition(data_partitions)
>     df3 = spark.range(1, 3000).repartition(data_partitions)
>     print("Data partitions is: {}. Shuffle partitions is 
> {}".format(data_partitions, shuffle_partitions))
>     print("Data partitions before join: 
> {}".format(df1.rdd.getNumPartitions()))
>     df = (df1.join(df2, df1.id == df2.id)
>           .join(df3, df1.id == df3.id))
>     print("Data partitions after join : {}".format(df.rdd.getNumPartitions()))
> example_shuffle_partitions()
>  
> In Spark 3.0.3, it prints out:
> 3.0.3
> Data partitions is: 10. Shuffle partitions is 4
> Data partitions before join: 10
> Data partitions after join : 4
> However, it prints out the following in the latest 3.3.2
> 3.3.2
> Data partitions is: 10. Shuffle partitions is 4
> Data partitions before join: 10
> Data partitions after join : 1



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42847) Assign a name to the error class _LEGACY_ERROR_TEMP_2013

2023-03-17 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-42847:
-
Description: 
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2013* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't 
exist yet. Check the exception fields by using {*}checkError(){*}. That 
function checks only the valuable error fields and avoids depending on the 
error text message, so tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate 
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), 
replace the error with an internal error, see 
{*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose to users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]

  was:
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2011* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't 
exist yet. Check the exception fields by using {*}checkError(){*}. That 
function checks only the valuable error fields and avoids depending on the 
error text message, so tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate 
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), 
replace the error with an internal error, see 
{*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose to users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]


> Assign a name to the error class _LEGACY_ERROR_TEMP_2013
> 
>
> Key: SPARK-42847
> URL: https://issues.apache.org/jira/browse/SPARK-42847
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2013* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the example in error-classes.json).
> Add a test that triggers the error from user code if such a test doesn't 
> exist yet. Check the exception fields by using {*}checkError(){*}. That 
> function checks only the valuable error fields and avoids depending on the 
> error text message, so tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate 
> other tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), 
> replace the error with an internal error, see 
> {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> not clear. Propose to users how to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]
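> A minimal sketch of the checkError() pattern these tickets ask for (the 
> suite, query, error class, and parameters are illustrative); it asserts on 
> the error class and message parameters rather than on the message text:
> {code:scala}
> import org.apache.spark.sql.{AnalysisException, QueryTest}
> import org.apache.spark.sql.test.SharedSparkSession
> class RenamedErrorSuite extends QueryTest with SharedSparkSession {
>   test("error is raised from user code with stable fields") {
>     checkError(
>       exception = intercept[AnalysisException] {
>         sql("SELECT bad FROM range(1)").collect()
>       },
>       errorClass = "UNRESOLVED_COLUMN.WITH_SUGGESTION",
>       parameters = Map("objectName" -> "`bad`", "proposal" -> "`id`"))
>   }
> }
> {code}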



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42847) Assign a name to the error class _LEGACY_ERROR_TEMP_2013

2023-03-17 Thread Max Gekk (Jira)
Max Gekk created SPARK-42847:


 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2013
 Key: SPARK-42847
 URL: https://issues.apache.org/jira/browse/SPARK-42847
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk


Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2011* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't 
exist yet. Check the exception fields by using {*}checkError(){*}. That 
function checks only the valuable error fields and avoids depending on the 
error text message, so tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate 
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), 
replace the error with an internal error, see 
{*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose to users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42846) Assign a name to the error class _LEGACY_ERROR_TEMP_2011

2023-03-17 Thread Max Gekk (Jira)
Max Gekk created SPARK-42846:


 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2011
 Key: SPARK-42846
 URL: https://issues.apache.org/jira/browse/SPARK-42846
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk


Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2010* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't 
exist yet. Check the exception fields by using {*}checkError(){*}. That 
function checks only the valuable error fields and avoids depending on the 
error text message, so tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate 
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), 
replace the error with an internal error, see 
{*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose to users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42846) Assign a name to the error class _LEGACY_ERROR_TEMP_2011

2023-03-17 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-42846:
-
Description: 
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2011* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't 
exist yet. Check the exception fields by using {*}checkError(){*}. That 
function checks only the valuable error fields and avoids depending on the 
error text message, so tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate 
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), 
replace the error with an internal error, see 
{*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose to users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]

  was:
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2010* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't 
exist yet. Check the exception fields by using {*}checkError(){*}. That 
function checks only the valuable error fields and avoids depending on the 
error text message, so tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate 
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), 
replace the error with an internal error, see 
{*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose to users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]


> Assign a name to the error class _LEGACY_ERROR_TEMP_2011
> 
>
> Key: SPARK-42846
> URL: https://issues.apache.org/jira/browse/SPARK-42846
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2011* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the example in error-classes.json).
> Add a test that triggers the error from user code if such a test doesn't 
> exist yet. Check the exception fields by using {*}checkError(){*}. That 
> function checks only the valuable error fields and avoids depending on the 
> error text message, so tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate 
> other tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), 
> replace the error with an internal error, see 
> {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> not clear. Propose to users how to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42845) Assign a name to the error class _LEGACY_ERROR_TEMP_2010

2023-03-17 Thread Max Gekk (Jira)
Max Gekk created SPARK-42845:


 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2010
 Key: SPARK-42845
 URL: https://issues.apache.org/jira/browse/SPARK-42845
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk


Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2008* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't 
exist yet. Check the exception fields by using {*}checkError(){*}. That 
function checks only the valuable error fields and avoids depending on the 
error text message, so tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate 
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), 
replace the error with an internal error, see 
{*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose to users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42845) Assign a name to the error class _LEGACY_ERROR_TEMP_2010

2023-03-17 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-42845:
-
Description: 
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2010* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't 
exist yet. Check the exception fields by using {*}checkError(){*}. That 
function checks only the valuable error fields and avoids depending on the 
error text message, so tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate 
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), 
replace the error with an internal error, see 
{*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose to users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]

  was:
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2008* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't 
exist yet. Check the exception fields by using {*}checkError(){*}. That 
function checks only the valuable error fields and avoids depending on the 
error text message, so tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate 
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), 
replace the error with an internal error, see 
{*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose to users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]


> Assign a name to the error class _LEGACY_ERROR_TEMP_2010
> 
>
> Key: SPARK-42845
> URL: https://issues.apache.org/jira/browse/SPARK-42845
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2010* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the example in error-classes.json).
> Add a test that triggers the error from user code if such a test doesn't 
> exist yet. Check the exception fields by using {*}checkError(){*}. That 
> function checks only the valuable error fields and avoids depending on the 
> error text message, so tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate 
> other tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), 
> replace the error with an internal error, see 
> {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> not clear. Propose to users how to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42843) Assign a name to the error class _LEGACY_ERROR_TEMP_2007

2023-03-17 Thread Max Gekk (Jira)
Max Gekk created SPARK-42843:


 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2007
 Key: SPARK-42843
 URL: https://issues.apache.org/jira/browse/SPARK-42843
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk


Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2006* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't 
exist yet. Check the exception fields by using {*}checkError(){*}. That 
function checks only the valuable error fields and avoids depending on the 
error text message, so tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate 
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), 
replace the error with an internal error, see 
{*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose to users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42844) Assign a name to the error class _LEGACY_ERROR_TEMP_2008

2023-03-17 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-42844:
-
Description: 
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2008* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't 
exist yet. Check the exception fields by using {*}checkError(){*}. That 
function checks only the valuable error fields and avoids depending on the 
error text message, so tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate 
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), 
replace the error with an internal error, see 
{*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose to users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]

  was:
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2007* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't 
exist yet. Check the exception fields by using {*}checkError(){*}. That 
function checks only the valuable error fields and avoids depending on the 
error text message, so tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate 
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), 
replace the error with an internal error, see 
{*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose to users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]


> Assign a name to the error class _LEGACY_ERROR_TEMP_2008
> 
>
> Key: SPARK-42844
> URL: https://issues.apache.org/jira/browse/SPARK-42844
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2008* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the example in error-classes.json).
> Add a test that triggers the error from user code if such a test doesn't 
> exist yet. Check the exception fields by using {*}checkError(){*}. That 
> function checks only the valuable error fields and avoids depending on the 
> error text message, so tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate 
> other tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), 
> replace the error with an internal error, see 
> {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> not clear. Propose to users how to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42844) Assign a name to the error class _LEGACY_ERROR_TEMP_2008

2023-03-17 Thread Max Gekk (Jira)
Max Gekk created SPARK-42844:


 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2008
 Key: SPARK-42844
 URL: https://issues.apache.org/jira/browse/SPARK-42844
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk


Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2007* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't 
exist yet. Check the exception fields by using {*}checkError(){*}. That 
function checks only the valuable error fields and avoids depending on the 
error text message, so tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate 
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), 
replace the error with an internal error, see 
{*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose to users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42843) Assign a name to the error class _LEGACY_ERROR_TEMP_2007

2023-03-17 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-42843:
-
Description: 
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2007* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't 
exist yet. Check the exception fields by using {*}checkError(){*}. That 
function checks only the valuable error fields and avoids depending on the 
error text message, so tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate 
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), 
replace the error with an internal error, see 
{*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose to users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]

  was:
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2006* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't 
exist yet. Check the exception fields by using {*}checkError(){*}. That 
function checks only the valuable error fields and avoids depending on the 
error text message, so tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate 
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), 
replace the error with an internal error, see 
{*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose to users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]


> Assign a name to the error class _LEGACY_ERROR_TEMP_2007
> 
>
> Key: SPARK-42843
> URL: https://issues.apache.org/jira/browse/SPARK-42843
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2007* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the example in error-classes.json).
> Add a test that triggers the error from user code if such a test doesn't 
> exist yet. Check the exception fields by using {*}checkError(){*}. That 
> function checks only the valuable error fields and avoids depending on the 
> error text message, so tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate 
> other tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), 
> replace the error with an internal error, see 
> {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> not clear. Propose to users how to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42842) Assign a name to the error class _LEGACY_ERROR_TEMP_2006

2023-03-17 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-42842:
-
Description: 
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2006* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't 
exist yet. Check the exception fields by using {*}checkError(){*}. That 
function checks only the valuable error fields and avoids depending on the 
error text message, so tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate 
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), 
replace the error with an internal error, see 
{*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose to users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]

  was:
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2005* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't 
exist yet. Check the exception fields by using {*}checkError(){*}. That 
function checks only the valuable error fields and avoids depending on the 
error text message, so tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate 
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), 
replace the error with an internal error, see 
{*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose to users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]


> Assign a name to the error class _LEGACY_ERROR_TEMP_2006
> 
>
> Key: SPARK-42842
> URL: https://issues.apache.org/jira/browse/SPARK-42842
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2006* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the example in error-classes.json).
> Add a test that triggers the error from user code if such a test doesn't 
> exist yet. Check the exception fields by using {*}checkError(){*}. That 
> function checks only the valuable error fields and avoids depending on the 
> error text message, so tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate 
> other tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), 
> replace the error with an internal error, see 
> {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> not clear. Propose to users how to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42842) Assign a name to the error class _LEGACY_ERROR_TEMP_2006

2023-03-17 Thread Max Gekk (Jira)
Max Gekk created SPARK-42842:


 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2006
 Key: SPARK-42842
 URL: https://issues.apache.org/jira/browse/SPARK-42842
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk


Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2005* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't 
exist yet. Check the exception fields by using {*}checkError(){*}. That 
function checks only the valuable error fields and avoids depending on the 
error text message, so tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate 
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), 
replace the error with an internal error, see 
{*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose to users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42636) Audit annotation usage

2023-03-17 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701703#comment-17701703
 ] 

jiaan.geng commented on SPARK-42636:


I will take a look!

> Audit annotation usage
> --
>
> Key: SPARK-42636
> URL: https://issues.apache.org/jira/browse/SPARK-42636
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> Annotation usage is not entirely consistent in the client. We should probably 
> remove all Stable annotations and add a few DeveloperApi ones.
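> A sketch of the intended end state (class names are illustrative):
> {code:scala}
> import org.apache.spark.annotation.DeveloperApi
> // Most client classes carry no stability annotation at all.
> class InternalHelper
> // A few extension points are explicitly marked for advanced users.
> @DeveloperApi
> class CustomCommandExtension
> {code}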



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] (SPARK-42584) Improve output of Column.explain

2023-03-17 Thread jiaan.geng (Jira)


[ https://issues.apache.org/jira/browse/SPARK-42584 ]


jiaan.geng deleted comment on SPARK-42584:


was (Author: beliefer):
I will take a look!

> Improve output of Column.explain
> 
>
> Key: SPARK-42584
> URL: https://issues.apache.org/jira/browse/SPARK-42584
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> We currently display the structure of the proto in both the regular and 
> extended versions of explain. We should display a more compact, SQL-like 
> string for the regular version.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42840) Assign a name to the error class _LEGACY_ERROR_TEMP_2004

2023-03-17 Thread Max Gekk (Jira)
Max Gekk created SPARK-42840:


 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2004
 Key: SPARK-42840
 URL: https://issues.apache.org/jira/browse/SPARK-42840
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk


Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't 
exist yet. Check the exception fields by using {*}checkError(){*}. That 
function checks only the valuable error fields and avoids depending on the 
error text message, so tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate 
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), 
replace the error with an internal error, see 
{*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose to users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42841) Assign a name to the error class _LEGACY_ERROR_TEMP_2005

2023-03-17 Thread Max Gekk (Jira)
Max Gekk created SPARK-42841:


 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2005
 Key: SPARK-42841
 URL: https://issues.apache.org/jira/browse/SPARK-42841
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk


Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2004* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't 
exist yet. Check the exception fields by using {*}checkError(){*}. That 
function checks only the valuable error fields and avoids depending on the 
error text message, so tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate 
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), 
replace the error with an internal error, see 
{*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose to users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42841) Assign a name to the error class _LEGACY_ERROR_TEMP_2005

2023-03-17 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-42841:
-
Description: 
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2005* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't 
exist yet. Check the exception fields by using {*}checkError(){*}. That 
function checks only the valuable error fields and avoids depending on the 
error text message, so tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate 
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), 
replace the error with an internal error, see 
{*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose to users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]

  was:
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2004* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't 
exist yet. Check the exception fields by using {*}checkError(){*}. That 
function checks only the valuable error fields and avoids depending on the 
error text message, so tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate 
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), 
replace the error with an internal error, see 
{*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose to users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]


> Assign a name to the error class _LEGACY_ERROR_TEMP_2005
> 
>
> Key: SPARK-42841
> URL: https://issues.apache.org/jira/browse/SPARK-42841
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2005* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the example in error-classes.json).
> Add a test that triggers the error from user code if such a test doesn't 
> exist yet. Check the exception fields by using {*}checkError(){*}. That 
> function checks only the valuable error fields and avoids depending on the 
> error text message, so tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate 
> other tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), 
> replace the error with an internal error, see 
> {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> not clear. Propose to users how to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42840) Assign a name to the error class _LEGACY_ERROR_TEMP_2004

2023-03-17 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-42840:
-
Description: 
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2004* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't 
exist yet. Check the exception fields by using {*}checkError(){*}. That 
function checks only the valuable error fields and avoids depending on the 
error text message, so tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate 
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), 
replace the error with an internal error, see 
{*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose to users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]

  was:
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test that triggers the error from user code if such a test doesn't 
exist yet. Check the exception fields by using {*}checkError(){*}. That 
function checks only the valuable error fields and avoids depending on the 
error text message, so tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate 
other tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), 
replace the error with an internal error, see 
{*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose to users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]


> Assign a name to the error class _LEGACY_ERROR_TEMP_2004
> 
>
> Key: SPARK-42840
> URL: https://issues.apache.org/jira/browse/SPARK-42840
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2004* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the example in error-classes.json).
> Add a test which triggers the error from user code if such a test doesn't yet 
> exist. Check the exception fields by using {*}checkError(){*}. This function 
> checks only the valuable error fields and avoids depending on the error text 
> message. In this way, tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate other 
> tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace 
> the error with an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> not clear. Propose a solution that tells users how to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42838) Assign a name to the error class _LEGACY_ERROR_TEMP_2000

2023-03-17 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-42838:
-
Description: 
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test which triggers the error from user code if such a test doesn't yet 
exist. Check the exception fields by using {*}checkError(){*}. This function 
checks only the valuable error fields and avoids depending on the error text 
message. In this way, tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate other 
tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace 
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose a solution that tells users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]

  was:
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test which triggers the error from user code if such a test doesn't yet 
exist. Check the exception fields by using {*}checkError(){*}. This function 
checks only the valuable error fields and avoids depending on the error text 
message. In this way, tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate other 
tests that might trigger the error onto checkError().

Improve the error message format in error-classes.json if the current one is 
not clear. Propose a solution that tells users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]


> Assign a name to the error class _LEGACY_ERROR_TEMP_2000
> 
>
> Key: SPARK-42838
> URL: https://issues.apache.org/jira/browse/SPARK-42838
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the example in error-classes.json).
> Add a test which triggers the error from user code if such a test doesn't yet 
> exist. Check the exception fields by using {*}checkError(){*}. This function 
> checks only the valuable error fields and avoids depending on the error text 
> message. In this way, tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate other 
> tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace 
> the error with an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> not clear. Propose a solution that tells users how to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42839) Assign a name to the error class _LEGACY_ERROR_TEMP_2003

2023-03-17 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-42839:
-
Description: 
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2003* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test which triggers the error from user code if such a test doesn't yet 
exist. Check the exception fields by using {*}checkError(){*}. This function 
checks only the valuable error fields and avoids depending on the error text 
message. In this way, tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate other 
tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace 
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose a solution that tells users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]

  was:
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2003* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test which triggers the error from user code if such a test doesn't yet 
exist. Check the exception fields by using {*}checkError(){*}. This function 
checks only the valuable error fields and avoids depending on the error text 
message. In this way, tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate other 
tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace 
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose a solution that tells users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]


> Assign a name to the error class _LEGACY_ERROR_TEMP_2003
> 
>
> Key: SPARK-42839
> URL: https://issues.apache.org/jira/browse/SPARK-42839
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2003* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the example in error-classes.json).
> Add a test which triggers the error from user code if such a test doesn't yet 
> exist. Check the exception fields by using {*}checkError(){*}. This function 
> checks only the valuable error fields and avoids depending on the error text 
> message. In this way, tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate other 
> tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace 
> the error with an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> not clear. Propose a solution that tells users how to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42839) Assign a name to the error class _LEGACY_ERROR_TEMP_2003

2023-03-17 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-42839:
-
Description: 
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2003* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test which triggers the error from user code if such a test doesn't yet 
exist. Check the exception fields by using {*}checkError(){*}. This function 
checks only the valuable error fields and avoids depending on the error text 
message. In this way, tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate other 
tests that might trigger the error onto checkError().

If you cannot reproduce the error from user space (using a SQL query), replace 
the error with an internal error; see {*}SparkException.internalError(){*}.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose a solution that tells users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]

  was:
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2003* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test which triggers the error from user code if such a test doesn't yet 
exist. Check the exception fields by using {*}checkError(){*}. This function 
checks only the valuable error fields and avoids depending on the error text 
message. In this way, tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate other 
tests that might trigger the error onto checkError().

Improve the error message format in error-classes.json if the current one is 
not clear. Propose a solution that tells users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]


> Assign a name to the error class _LEGACY_ERROR_TEMP_2003
> 
>
> Key: SPARK-42839
> URL: https://issues.apache.org/jira/browse/SPARK-42839
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2003* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the example in error-classes.json).
> Add a test which triggers the error from user code if such a test doesn't yet 
> exist. Check the exception fields by using {*}checkError(){*}. This function 
> checks only the valuable error fields and avoids depending on the error text 
> message. In this way, tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate other 
> tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace 
> the error with an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is 
> not clear. Propose a solution that tells users how to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] (SPARK-42760) The partition of result data frame of join is always 1

2023-03-17 Thread Yuming Wang (Jira)


[ https://issues.apache.org/jira/browse/SPARK-42760 ]


Yuming Wang deleted comment on SPARK-42760:
-

was (Author: apachespark):
User '1511351836' has created a pull request for this issue:
https://github.com/apache/spark/pull/40380

> The partition of result data frame of join is always 1
> --
>
> Key: SPARK-42760
> URL: https://issues.apache.org/jira/browse/SPARK-42760
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 3.3.2
> Environment: standard Spark 3.0.3/3.3.2, used in a Jupyter notebook in 
> local mode
>Reporter: binyang
>Priority: Major
>
> I am using PySpark. The number of partitions of the result DataFrame of a 
> join is always 1.
> Here is my code from 
> https://stackoverflow.com/questions/51876281/is-partitioning-retained-after-a-spark-sql-join
>  
> print(spark.version)
> def example_shuffle_partitions(data_partitions=10, shuffle_partitions=4):
>     spark.conf.set("spark.sql.shuffle.partitions", shuffle_partitions)
>     spark.sql("SET spark.sql.autoBroadcastJoinThreshold=-1")
>     df1 = spark.range(1, 1000).repartition(data_partitions)
>     df2 = spark.range(1, 2000).repartition(data_partitions)
>     df3 = spark.range(1, 3000).repartition(data_partitions)
>     print("Data partitions is: {}. Shuffle partitions is 
> {}".format(data_partitions, shuffle_partitions))
>     print("Data partitions before join: 
> {}".format(df1.rdd.getNumPartitions()))
>     df = (df1.join(df2, df1.id == df2.id)
>           .join(df3, df1.id == df3.id))
>     print("Data partitions after join : {}".format(df.rdd.getNumPartitions()))
> example_shuffle_partitions()
>  
> In Spark 3.0.3, it prints out:
> 3.0.3
> Data partitions is: 10. Shuffle partitions is 4
> Data partitions before join: 10
> Data partitions after join : 4
> However, it prints out the following in the latest 3.3.2
> 3.3.2
> Data partitions is: 10. Shuffle partitions is 4
> Data partitions before join: 10
> Data partitions after join : 1
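This looks like the effect of Adaptive Query Execution, which is enabled by 
default since Spark 3.2.0 and coalesces small shuffle partitions after the 
exchange; with only a few thousand rows, the four shuffle partitions collapse 
into one. A minimal sketch of how to check this hypothesis, using standard 
Spark SQL configuration keys (shown here in Scala):
{code:java}
// If AQE coalescing is the cause, disabling it should restore the 3.0.3
// behavior where the join output keeps spark.sql.shuffle.partitions partitions.
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "false")
// Or disable adaptive execution entirely:
spark.conf.set("spark.sql.adaptive.enabled", "false")
{code}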



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42839) Assign a name to the error class _LEGACY_ERROR_TEMP_2003

2023-03-17 Thread Max Gekk (Jira)
Max Gekk created SPARK-42839:


 Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2003
 Key: SPARK-42839
 URL: https://issues.apache.org/jira/browse/SPARK-42839
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk


Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test which triggers the error from user code if such a test doesn't yet 
exist. Check the exception fields by using {*}checkError(){*}. This function 
checks only the valuable error fields and avoids depending on the error text 
message. In this way, tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate other 
tests that might trigger the error onto checkError().

Improve the error message format in error-classes.json if the current one is 
not clear. Propose a solution that tells users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42839) Assign a name to the error class _LEGACY_ERROR_TEMP_2003

2023-03-17 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-42839:
-
Description: 
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2003* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test which triggers the error from user code if such a test doesn't yet 
exist. Check the exception fields by using {*}checkError(){*}. This function 
checks only the valuable error fields and avoids depending on the error text 
message. In this way, tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate other 
tests that might trigger the error onto checkError().

Improve the error message format in error-classes.json if the current one is 
not clear. Propose a solution that tells users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]

  was:
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test which triggers the error from user code if such a test doesn't yet 
exist. Check the exception fields by using {*}checkError(){*}. This function 
checks only the valuable error fields and avoids depending on the error text 
message. In this way, tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate other 
tests that might trigger the error onto checkError().

Improve the error message format in error-classes.json if the current one is 
not clear. Propose a solution that tells users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]


> Assign a name to the error class _LEGACY_ERROR_TEMP_2003
> 
>
> Key: SPARK-42839
> URL: https://issues.apache.org/jira/browse/SPARK-42839
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2003* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the example in error-classes.json).
> Add a test which triggers the error from user code if such a test doesn't yet 
> exist. Check the exception fields by using {*}checkError(){*}. This function 
> checks only the valuable error fields and avoids depending on the error text 
> message. In this way, tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate other 
> tests that might trigger the error onto checkError().
> Improve the error message format in error-classes.json if the current one is 
> not clear. Propose a solution that tells users how to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42584) Improve output of Column.explain

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42584:


Assignee: Apache Spark

> Improve output of Column.explain
> 
>
> Key: SPARK-42584
> URL: https://issues.apache.org/jira/browse/SPARK-42584
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Apache Spark
>Priority: Major
>
> We currently display the structure of the proto in both the regular and 
> extended versions of explain. We should display a more compact SQL-like 
> string for the regular version.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42584) Improve output of Column.explain

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42584:


Assignee: (was: Apache Spark)

> Improve output of Column.explain
> 
>
> Key: SPARK-42584
> URL: https://issues.apache.org/jira/browse/SPARK-42584
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> We currently display the structure of the proto in both the regular and 
> extended versions of explain. We should display a more compact SQL-like 
> string for the regular version.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42584) Improve output of Column.explain

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701690#comment-17701690
 ] 

Apache Spark commented on SPARK-42584:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/40467

> Improve output of Column.explain
> 
>
> Key: SPARK-42584
> URL: https://issues.apache.org/jira/browse/SPARK-42584
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> We currently display the structure of the proto in both the regular and 
> extended versions of explain. We should display a more compact SQL-like 
> string for the regular version.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42838) Assign a name to the error class _LEGACY_ERROR_TEMP_2000

2023-03-17 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-42838:
-
Description: 
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test which triggers the error from user code if such a test doesn't yet 
exist. Check the exception fields by using {*}checkError(){*}. This function 
checks only the valuable error fields and avoids depending on the error text 
message. In this way, tech editors can modify the error format in 
error-classes.json without worrying about Spark's internal tests. Migrate other 
tests that might trigger the error onto checkError().

Improve the error message format in error-classes.json if the current one is 
not clear. Propose a solution that tells users how to avoid and fix such errors.

Please look at the PRs below as examples:
 * [https://github.com/apache/spark/pull/38685]
 * [https://github.com/apache/spark/pull/38656]
 * [https://github.com/apache/spark/pull/38490]

  was:
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test which triggers the error from user code. Check the exception fields 
by using {*}checkError(){*}. The function checks only the valuable error fields 
and avoids depending on the error text message. In this way, tech editors can 
modify the error format in error-classes.json without worrying about Spark's 
internal tests.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose a solution that tells users how to avoid and fix such errors.

Please look at the PRs below as examples:

[https://github.com/apache/spark/pull/38685]

[https://github.com/apache/spark/pull/38656]

https://github.com/apache/spark/pull/38490


> Assign a name to the error class _LEGACY_ERROR_TEMP_2000
> 
>
> Key: SPARK-42838
> URL: https://issues.apache.org/jira/browse/SPARK-42838
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the example in error-classes.json).
> Add a test which triggers the error from user code if such a test doesn't yet 
> exist. Check the exception fields by using {*}checkError(){*}. This function 
> checks only the valuable error fields and avoids depending on the error text 
> message. In this way, tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate other 
> tests that might trigger the error onto checkError().
> Improve the error message format in error-classes.json if the current one is 
> not clear. Propose a solution that tells users how to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42838) Assign a name to the error class _LEGACY_ERROR_TEMP_2000

2023-03-17 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-42838:
-
Description: 
Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test which triggers the error from user code. Check the exception fields 
by using {*}checkError(){*}. The function checks only the valuable error fields 
and avoids depending on the error text message. In this way, tech editors can 
modify the error format in error-classes.json without worrying about Spark's 
internal tests.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose a solution that tells users how to avoid and fix such errors.

Please look at the PRs below as examples:

[https://github.com/apache/spark/pull/38685]

[https://github.com/apache/spark/pull/38656]

https://github.com/apache/spark/pull/38490

  was:
Choose a proper name for the error class _LEGACY_ERROR_TEMP_2000 defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test which triggers the error from user code. Check the exception fields 
by using {*}checkError(){*}. The function checks only the valuable error fields 
and avoids depending on the error text message. In this way, tech editors can 
modify the error format in error-classes.json without worrying about Spark's 
internal tests.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose a solution that tells users how to avoid and fix such errors.


> Assign a name to the error class _LEGACY_ERROR_TEMP_2000
> 
>
> Key: SPARK-42838
> URL: https://issues.apache.org/jira/browse/SPARK-42838
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2000* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the example in error-classes.json).
> Add a test which triggers the error from user code. Check the exception fields 
> by using {*}checkError(){*}. The function checks only the valuable error fields 
> and avoids depending on the error text message. In this way, tech editors can 
> modify the error format in error-classes.json without worrying about Spark's 
> internal tests.
> Improve the error message format in error-classes.json if the current one is 
> not clear. Propose a solution that tells users how to avoid and fix such errors.
> Please look at the PRs below as examples:
> [https://github.com/apache/spark/pull/38685]
> [https://github.com/apache/spark/pull/38656]
> https://github.com/apache/spark/pull/38490



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42838) Assign a name to the error class _LEGACY_ERROR_TEMP_2000

2023-03-17 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk updated SPARK-42838:
-
Summary: Assign a name to the error class _LEGACY_ERROR_TEMP_2000  (was: 
Assing a name to the error class _LEGACY_ERROR_TEMP_2000)

> Assign a name to the error class _LEGACY_ERROR_TEMP_2000
> 
>
> Key: SPARK-42838
> URL: https://issues.apache.org/jira/browse/SPARK-42838
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: starter
>
> Choose a proper name for the error class _LEGACY_ERROR_TEMP_2000 defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the example in error-classes.json).
> Add a test which triggers the error from user code. Check the exception fields 
> by using {*}checkError(){*}. The function checks only the valuable error fields 
> and avoids depending on the error text message. In this way, tech editors can 
> modify the error format in error-classes.json without worrying about Spark's 
> internal tests.
> Improve the error message format in error-classes.json if the current one is 
> not clear. Propose a solution that tells users how to avoid and fix such errors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42838) Assing a name to the error class _LEGACY_ERROR_TEMP_2000

2023-03-17 Thread Max Gekk (Jira)
Max Gekk created SPARK-42838:


 Summary: Assing a name to the error class _LEGACY_ERROR_TEMP_2000
 Key: SPARK-42838
 URL: https://issues.apache.org/jira/browse/SPARK-42838
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.5.0
Reporter: Max Gekk


Choose a proper name for the error class _LEGACY_ERROR_TEMP_2000 defined in 
{*}core/src/main/resources/error/error-classes.json{*}. The name should be 
short but complete (look at the example in error-classes.json).

Add a test which triggers the error from user code. Check the exception fields 
by using {*}checkError(){*}. The function checks only the valuable error fields 
and avoids depending on the error text message. In this way, tech editors can 
modify the error format in error-classes.json without worrying about Spark's 
internal tests.

Improve the error message format in error-classes.json if the current one is 
not clear. Propose a solution that tells users how to avoid and fix such errors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42837) spark-submit - issue when resolving dependencies hosted on a private repository in kubernetes cluster mode

2023-03-17 Thread lione Herbet (Jira)
lione Herbet created SPARK-42837:


 Summary: spark-submit - issue when resolving dependencies hosted 
on a private repository in kubernetes cluster mode
 Key: SPARK-42837
 URL: https://issues.apache.org/jira/browse/SPARK-42837
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 3.3.2
Reporter: lione Herbet


When using the [spark 
operator|https://github.com/GoogleCloudPlatform/spark-on-k8s-operator], if 
dependencies are hosted on a private repository that requires authentication 
(like S3 or OCI), the Spark operator submitting the job needs to have all the 
secrets required to access all dependencies. If not, the spark-submit fails.

On a multi-tenant Kubernetes cluster where the Spark operator and the Spark 
jobs run in separate namespaces, this means duplicating all secrets, or it 
won't work.

It seems that spark-submit needs to access the dependencies (with credentials) 
only to resolveGlobPath 
([https://github.com/apache/spark/blob/v3.3.2/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L364-L367]).
It seems to me (but this needs to be confirmed by someone more skilled than me 
in Spark's internals) that this resolveGlobPath work is also done when the 
driver downloads the jars.

Would it be possible to have this resolveGlobPath step skipped when running on 
a Kubernetes cluster in cluster mode?

For example, add a condition like this around lines 364-367:
{code:java}
if (isKubernetesCluster) {
...
} {code}
We could even, for compatibility with the old behavior if needed, also add a 
condition on a Spark parameter like this:
{code:java}
if (isKubernetesCluster &&
    sparkConf.getBoolean("spark.kubernetes.resolveGlobPathsInSubmit", true)) {
  ...
}{code}
I tested both solutions locally and they seem to resolve the case.

Do you think I need to consider other elements?

I may submit a patch depending on your feedback.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42836) Support for recursive queries

2023-03-17 Thread Max (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max updated SPARK-42836:

Component/s: SQL

> Support for recursive queries
> -
>
> Key: SPARK-42836
> URL: https://issues.apache.org/jira/browse/SPARK-42836
> Project: Spark
>  Issue Type: Question
>  Components: Java API, SQL
>Affects Versions: 3.4.0
>Reporter: Max
>Priority: Blocker
>
> Hello, a subtask was created a long time ago: 
> https://issues.apache.org/jira/browse/SPARK-24497
> When will this task be completed? We really miss this feature.
> Thanks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42836) Support for recursive queries

2023-03-17 Thread Max (Jira)
Max created SPARK-42836:
---

 Summary: Support for recursive queries
 Key: SPARK-42836
 URL: https://issues.apache.org/jira/browse/SPARK-42836
 Project: Spark
  Issue Type: Question
  Components: Java API
Affects Versions: 3.4.0
Reporter: Max


Hello, a subtask was created a long time ago: 
https://issues.apache.org/jira/browse/SPARK-24497

When will this task be completed? We really miss this feature.

Thanks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42835) Add test cases for Column.explain

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701623#comment-17701623
 ] 

Apache Spark commented on SPARK-42835:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/40466

> Add test cases for Column.explain
> -
>
> Key: SPARK-42835
> URL: https://issues.apache.org/jira/browse/SPARK-42835
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42835) Add test cases for Column.explain

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42835:


Assignee: (was: Apache Spark)

> Add test cases for Column.explain
> -
>
> Key: SPARK-42835
> URL: https://issues.apache.org/jira/browse/SPARK-42835
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42835) Add test cases for Column.explain

2023-03-17 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701621#comment-17701621
 ] 

Apache Spark commented on SPARK-42835:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/40466

> Add test cases for Column.explain
> -
>
> Key: SPARK-42835
> URL: https://issues.apache.org/jira/browse/SPARK-42835
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42835) Add test cases for Column.explain

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42835:


Assignee: Apache Spark

> Add test cases for Column.explain
> -
>
> Key: SPARK-42835
> URL: https://issues.apache.org/jira/browse/SPARK-42835
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42835) Add test cases for Column.explain

2023-03-17 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-42835:
--

 Summary: Add test cases for Column.explain
 Key: SPARK-42835
 URL: https://issues.apache.org/jira/browse/SPARK-42835
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 3.5.0
Reporter: jiaan.geng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42834) Divided by zero occurs in PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse

2023-03-17 Thread Li Ying (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17701610#comment-17701610
 ] 

Li Ying commented on SPARK-42834:
-

[~csingh] Could you please help confirm this?

> Divided by zero occurs in 
> PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse
> 
>
> Key: SPARK-42834
> URL: https://issues.apache.org/jira/browse/SPARK-42834
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: Li Ying
>Priority: Major
>
> Sometimes when running a SQL job with push-based shuffle, an exception 
> occurs as below. It seems that there is no element in the bitmaps which 
> store the merged chunk metadata.
> Is it a bug that we should not call createChunkBlockInfos when the bitmaps 
> are empty, or should the bitmaps never be empty here?
>  
> {code:java}
> Caused by: java.lang.ArithmeticException: / by zero
> at 
> org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse(PushBasedFetchHelper.scala:117)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:980)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:84)
>  {code}
> related code:
> {code:java}
> def createChunkBlockInfosFromMetaResponse(
> shuffleId: Int,
> shuffleMergeId: Int,
> reduceId: Int,
> blockSize: Long,
> bitmaps: Array[RoaringBitmap]): ArrayBuffer[(BlockId, Long, Int)] = {
>   val approxChunkSize = blockSize / bitmaps.length
>   val blocksToFetch = new ArrayBuffer[(BlockId, Long, Int)]()
>   for (i <- bitmaps.indices) {
> val blockChunkId = ShuffleBlockChunkId(shuffleId, shuffleMergeId, 
> reduceId, i)
> chunksMetaMap.put(blockChunkId, bitmaps(i))
> logDebug(s"adding block chunk $blockChunkId of size $approxChunkSize")
> blocksToFetch += ((blockChunkId, approxChunkSize, SHUFFLE_PUSH_MAP_ID))
>   }
>   blocksToFetch
> } {code}
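One possible direction (a sketch only, not a confirmed fix, and assuming that 
an empty meta response should simply mean there are no merged chunks to fetch) 
is to guard the division:
{code:java}
// Sketch: bail out before blockSize / bitmaps.length can divide by zero.
// Whether an empty bitmap array is legal at this point still needs to be
// confirmed by someone familiar with the push-based shuffle protocol.
if (bitmaps.isEmpty) {
  logWarning(s"Received empty chunk bitmaps for shuffle $shuffleId, " +
    s"reduce $reduceId; nothing to fetch from the merged block.")
  return new ArrayBuffer[(BlockId, Long, Int)]()
}
val approxChunkSize = blockSize / bitmaps.length
{code}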



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42834) Divided by zero occurs in PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse

2023-03-17 Thread Li Ying (Jira)
Li Ying created SPARK-42834:
---

 Summary: Divided by zero occurs in 
PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse
 Key: SPARK-42834
 URL: https://issues.apache.org/jira/browse/SPARK-42834
 Project: Spark
  Issue Type: Bug
  Components: Shuffle
Affects Versions: 3.2.0
Reporter: Li Ying


Sometimes when running a SQL job with push-based shuffle, an exception occurs 
as below. It seems that there is no element in the bitmaps which store the 
merged chunk metadata. See 
org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse.

Is it a bug that we should not call createChunkBlockInfos when the bitmaps are 
empty, or should the bitmaps never be empty here?
 
{code:java}
Caused by: java.lang.ArithmeticException: / by zero
at 
org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse(PushBasedFetchHelper.scala:117)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:980)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:84)
 {code}
related code:
{code:java}
def createChunkBlockInfosFromMetaResponse(
shuffleId: Int,
shuffleMergeId: Int,
reduceId: Int,
blockSize: Long,
bitmaps: Array[RoaringBitmap]): ArrayBuffer[(BlockId, Long, Int)] = {
  val approxChunkSize = blockSize / bitmaps.length
  val blocksToFetch = new ArrayBuffer[(BlockId, Long, Int)]()
  for (i <- bitmaps.indices) {
val blockChunkId = ShuffleBlockChunkId(shuffleId, shuffleMergeId, reduceId, 
i)
chunksMetaMap.put(blockChunkId, bitmaps(i))
logDebug(s"adding block chunk $blockChunkId of size $approxChunkSize")
blocksToFetch += ((blockChunkId, approxChunkSize, SHUFFLE_PUSH_MAP_ID))
  }
  blocksToFetch
} {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42834) Divided by zero occurs in PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse

2023-03-17 Thread Li Ying (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Ying updated SPARK-42834:

Description: 
Sometimes when running a SQL job with push-based shuffle, an exception occurs 
as below. It seems that there is no element in the bitmaps which store the 
merged chunk metadata.

Is it a bug that we should not call createChunkBlockInfos when the bitmaps are 
empty, or should the bitmaps never be empty here?
 
{code:java}
Caused by: java.lang.ArithmeticException: / by zero
at 
org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse(PushBasedFetchHelper.scala:117)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:980)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:84)
 {code}
related code:
{code:java}
def createChunkBlockInfosFromMetaResponse(
shuffleId: Int,
shuffleMergeId: Int,
reduceId: Int,
blockSize: Long,
bitmaps: Array[RoaringBitmap]): ArrayBuffer[(BlockId, Long, Int)] = {
  val approxChunkSize = blockSize / bitmaps.length
  val blocksToFetch = new ArrayBuffer[(BlockId, Long, Int)]()
  for (i <- bitmaps.indices) {
val blockChunkId = ShuffleBlockChunkId(shuffleId, shuffleMergeId, reduceId, 
i)
chunksMetaMap.put(blockChunkId, bitmaps(i))
logDebug(s"adding block chunk $blockChunkId of size $approxChunkSize")
blocksToFetch += ((blockChunkId, approxChunkSize, SHUFFLE_PUSH_MAP_ID))
  }
  blocksToFetch
} {code}

  was:
Sometimes when running a SQL job with push-based shuffle, an exception occurs 
as below. It seems that there is no element in the bitmaps which store the 
merged chunk metadata. See 
org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse.

Is it a bug that we should not call createChunkBlockInfos when the bitmaps are 
empty, or should the bitmaps never be empty here?
 
{code:java}
Caused by: java.lang.ArithmeticException: / by zero
at 
org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse(PushBasedFetchHelper.scala:117)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:980)
at 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:84)
 {code}
related code:
{code:java}
def createChunkBlockInfosFromMetaResponse(
shuffleId: Int,
shuffleMergeId: Int,
reduceId: Int,
blockSize: Long,
bitmaps: Array[RoaringBitmap]): ArrayBuffer[(BlockId, Long, Int)] = {
  val approxChunkSize = blockSize / bitmaps.length
  val blocksToFetch = new ArrayBuffer[(BlockId, Long, Int)]()
  for (i <- bitmaps.indices) {
val blockChunkId = ShuffleBlockChunkId(shuffleId, shuffleMergeId, reduceId, 
i)
chunksMetaMap.put(blockChunkId, bitmaps(i))
logDebug(s"adding block chunk $blockChunkId of size $approxChunkSize")
blocksToFetch += ((blockChunkId, approxChunkSize, SHUFFLE_PUSH_MAP_ID))
  }
  blocksToFetch
} {code}


> Divided by zero occurs in 
> PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse
> 
>
> Key: SPARK-42834
> URL: https://issues.apache.org/jira/browse/SPARK-42834
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.2.0
>Reporter: Li Ying
>Priority: Major
>
> Sometimes when running a SQL job with push-based shuffle, an exception 
> occurs as below. It seems that there is no element in the bitmaps which 
> store the merged chunk metadata.
> Is it a bug that we should not call createChunkBlockInfos when the bitmaps 
> are empty, or should the bitmaps never be empty here?
>  
> {code:java}
> Caused by: java.lang.ArithmeticException: / by zero
> at 
> org.apache.spark.storage.PushBasedFetchHelper.createChunkBlockInfosFromMetaResponse(PushBasedFetchHelper.scala:117)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:980)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:84)
>  {code}
> related code:
> {code:java}
> def createChunkBlockInfosFromMetaResponse(
> shuffleId: Int,
> shuffleMergeId: Int,
> reduceId: Int,
> blockSize: Long,
> bitmaps: Array[RoaringBitmap]): ArrayBuffer[(BlockId, Long, Int)] = {
>   val approxChunkSize = blockSize / bitmaps.length
>   val blocksToFetch = new ArrayBuffer[(BlockId, Long, Int)]()
>   for (i <- bitmaps.indices) {
> val blockChunkId = ShuffleBlockChunkId(shuffleId, shuffleMergeId, 
> reduceId, i)
> chunksMetaMap.put(blockChunkId, bitmaps(i))
> logDebug(s"adding block chunk $blockChunkId of size $approxChunkSize")
> blocksToFetch += ((blockChunkId, approxChunkSize, SHUFFLE_PUSH_MAP_ID))
>   }
>   blocksToFetch
> } {code}

[jira] [Assigned] (SPARK-42833) Refactor `applyExtensions` in `SparkSession`

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42833:


Assignee: Apache Spark

> Refactor `applyExtensions` in `SparkSession`
> 
>
> Key: SPARK-42833
> URL: https://issues.apache.org/jira/browse/SPARK-42833
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Kazuyuki Tanimura
>Assignee: Apache Spark
>Priority: Minor
>
> Refactor `applyExtensions` in `SparkSession` in order to reduce the 
> duplicated code



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42833) Refactor `applyExtensions` in `SparkSession`

2023-03-17 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42833:


Assignee: (was: Apache Spark)

> Refactor `applyExtensions` in `SparkSession`
> 
>
> Key: SPARK-42833
> URL: https://issues.apache.org/jira/browse/SPARK-42833
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Kazuyuki Tanimura
>Priority: Minor
>
> Refactor `applyExtensions` in `SparkSession` in order to reduce the 
> duplicated code



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org