[jira] [Commented] (SPARK-47344) Enhance error message for invalid identifiers that need backticks

2024-03-12 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825757#comment-17825757
 ] 

Ignite TC Bot commented on SPARK-47344:
---

User 'srielau' has created a pull request for this issue:
https://github.com/apache/spark/pull/45470

> Enhance error message for invalid identifiers that need backticks
> -
>
> Key: SPARK-47344
> URL: https://issues.apache.org/jira/browse/SPARK-47344
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Serge Rielau
>Priority: Major
>
> We detect patterns like "my-tab" and raise a meaningful INVALID_IDENTIFIER 
> error when the identifier is not surrounded by backticks.
> In this ticket we want to extend that detection beyond dashes.
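> A minimal sketch of the behavior in question (the table name and comments 
> are illustrative assumptions, not taken from the ticket):
> {code:python}
> spark.sql("CREATE TABLE `my-tab` (id INT) USING parquet")
> spark.sql("SELECT * FROM my-tab")    # raises INVALID_IDENTIFIER
> spark.sql("SELECT * FROM `my-tab`")  # succeeds with backticks
> {code}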



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24203) Make executor's bindAddress configurable

2023-09-11 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763821#comment-17763821
 ] 

Ignite TC Bot commented on SPARK-24203:
---

User 'gedeh' has created a pull request for this issue:
https://github.com/apache/spark/pull/42870

> Make executor's bindAddress configurable
> 
>
> Key: SPARK-24203
> URL: https://issues.apache.org/jira/browse/SPARK-24203
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.1.1
>Reporter: Lukas Majercak
>Assignee: Nishchal Venkataramana
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44732) Port the initial implementation of Spark XML data source

2023-09-07 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17762891#comment-17762891
 ] 

Ignite TC Bot commented on SPARK-44732:
---

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/42844

> Port the initial implementation of Spark XML data source
> 
>
> Key: SPARK-44732
> URL: https://issues.apache.org/jira/browse/SPARK-44732
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45075) Alter table with invalid default value will not report error

2023-09-05 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17762110#comment-17762110
 ] 

Ignite TC Bot commented on SPARK-45075:
---

User 'Hisoka-X' has created a pull request for this issue:
https://github.com/apache/spark/pull/42810

> Alter table with invalid default value will not report error
> 
>
> Key: SPARK-45075
> URL: https://issues.apache.org/jira/browse/SPARK-45075
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1, 3.5.0
>Reporter: Jia Fan
>Priority: Major
>
> {code:sql}
> create table t(i boolean, s bigint);
> alter table t alter column s set default badvalue;
> {code}
> This does not report an error on DataSource V2, which does not align with 
> V1.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44640) Improve error messages for Python UDTF returning non iterable

2023-08-31 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761038#comment-17761038
 ] 

Ignite TC Bot commented on SPARK-44640:
---

User 'allisonwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/42726

> Improve error messages for Python UDTF returning non iterable
> -
>
> Key: SPARK-44640
> URL: https://issues.apache.org/jira/browse/SPARK-44640
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
> Fix For: 4.0.0
>
>
> When the return type of a UDTF is not an iterable, the error message can be 
> confusing to users. For example, for this UDTF:
> {code:python}
> @udtf(returnType="x: int")
> class TestUDTF:
>     def eval(self, a):
>         return a
> {code}
> Currently it fails with this error for regular UDTFs:
>     return tuple(map(verify_and_convert_result, res))
> TypeError: 'int' object is not iterable
> And this error for arrow-optimized UDTFs:
>     raise ValueError("DataFrame constructor not properly called!")
> ValueError: DataFrame constructor not properly called!
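> For contrast, a sketch of a conforming UDTF: eval must return or yield an 
> iterable of rows (tuples here), so a bare value has to be wrapped:
> {code:python}
> @udtf(returnType="x: int")
> class TestUDTF:
>     def eval(self, a):
>         yield (a,)  # one row with a single int column
> {code}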



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44846) PushFoldableIntoBranches in complex grouping expressions may cause bindReference error

2023-08-31 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761037#comment-17761037
 ] 

Ignite TC Bot commented on SPARK-44846:
---

User 'zml1206' has created a pull request for this issue:
https://github.com/apache/spark/pull/42633

> PushFoldableIntoBranches in complex grouping expressions may cause 
> bindReference error
> --
>
> Key: SPARK-44846
> URL: https://issues.apache.org/jira/browse/SPARK-44846
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1
>Reporter: zhuml
>Priority: Major
>
> SQL:
> {code:java}
> select c*2 as d from
> (select if(b > 1, 1, b) as c from
> (select if(a < 0, 0, a) as b from t group by b) t1
> group by c) t2 {code}
> ERROR:
> {code:java}
> Couldn't find _groupingexpression#15 in [if ((_groupingexpression#15 > 1)) 1 
> else _groupingexpression#15#16]
> java.lang.IllegalStateException: Couldn't find _groupingexpression#15 in [if 
> ((_groupingexpression#15 > 1)) 1 else _groupingexpression#15#16]
>     at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
>     at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:461)
>     at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:461)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:466)
>     at 
> org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren(TreeNode.scala:1241)
>     at 
> org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren$(TreeNode.scala:1240)
>     at 
> org.apache.spark.sql.catalyst.expressions.BinaryExpression.mapChildren(Expression.scala:653)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:466)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:466)
>     at 
> org.apache.spark.sql.catalyst.trees.TernaryLike.mapChildren(TreeNode.scala:1272)
>     at 
> org.apache.spark.sql.catalyst.trees.TernaryLike.mapChildren$(TreeNode.scala:1271)
>     at 
> org.apache.spark.sql.catalyst.expressions.If.mapChildren(conditionalExpressions.scala:41)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:466)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:466)
>     at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1215)
>     at 
> org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1214)
>     at 
> org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:533)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:466)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:437)
>     at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:405)
>     at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:73)
>     at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$.$anonfun$bindReferences$1(BoundAttribute.scala:94)
>     at scala.collection.immutable.List.map(List.scala:293)
>     at 
> org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReferences(BoundAttribute.scala:94)
>     at 
> org.apache.spark.sql.execution.aggregate.HashAggregateExec.generateResultFunction(HashAggregateExec.scala:360)
>     at 
> org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduceWithKeys(HashAggregateExec.scala:538)
>     at 
> org.apache.spark.sql.execution.aggregate.AggregateCodegenSupport.doProduce(AggregateCodegenSupport.scala:69)
>     at 
> org.apache.spark.sql.execution.aggregate.AggregateCodegenSupport.doProduce$(AggregateCodegenSupport.scala:65)
>     at 
> org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduce(HashAggregateExec.scala:49)
>     at 
> org.apache.spark.sql.execution.CodegenSupport.$anonfun$produce$1(WholeStageCodegenExec.scala:97)
>     at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246)
>     at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>     at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:243)
>     at 
> org.apache.spark.sql.execution.CodegenSupport.produce(WholeStageCodegenExec.scala:92)
>     at 
> 

[jira] [Commented] (SPARK-44987) Assign name to the error class _LEGACY_ERROR_TEMP_1100

2023-08-31 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761034#comment-17761034
 ] 

Ignite TC Bot commented on SPARK-44987:
---

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/42737

> Assign name to the error class _LEGACY_ERROR_TEMP_1100
> --
>
> Key: SPARK-44987
> URL: https://issues.apache.org/jira/browse/SPARK-44987
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Minor
>
> Assign a name and improve the error message format.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44910) Encoders.bean does not support superclasses with generic type arguments

2023-08-31 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761036#comment-17761036
 ] 

Ignite TC Bot commented on SPARK-44910:
---

User 'gbloisi-openaire' has created a pull request for this issue:
https://github.com/apache/spark/pull/42634

> Encoders.bean does not support superclasses with generic type arguments
> ---
>
> Key: SPARK-44910
> URL: https://issues.apache.org/jira/browse/SPARK-44910
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1, 3.5.0, 4.0.0
>Reporter: Giambattista Bloisi
>Priority: Major
>
> As per SPARK-44634, another unsupported feature of the bean encoder is a 
> bean whose superclass has generic type arguments. For example:
> {code:java}
> class JavaBeanWithGenericsA<T> {
>     public T getPropertyA() {
>         return null;
>     }
>
>     public void setPropertyA(T a) {
>     }
> }
>
> class JavaBeanWithGenericBase extends JavaBeanWithGenericsA<String> {
> }
>
> Encoders.bean(JavaBeanWithGenericBase.class); // Exception
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44994) Refine docstring of `DataFrame.filter`

2023-08-31 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761035#comment-17761035
 ] 

Ignite TC Bot commented on SPARK-44994:
---

User 'allisonwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/42708

> Refine docstring of `DataFrame.filter`
> --
>
> Key: SPARK-44994
> URL: https://issues.apache.org/jira/browse/SPARK-44994
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Priority: Major
>
> Refine the docstring and add more examples for DataFrame.filter
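> The kind of usage the refined docstring would illustrate (a minimal sketch; 
> the data and column names are made up):
> {code:python}
> df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "v"])
> df.filter(df.id > 1).show()   # Column predicate
> df.filter("id > 1").show()    # SQL-string predicate
> {code}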



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45014) Clean up fileserver when cleaning up files, jars and archives in SparkContext

2023-08-30 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760657#comment-17760657
 ] 

Ignite TC Bot commented on SPARK-45014:
---

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/42731

> Clean up fileserver when cleaning up files, jars and archives in SparkContext
> -
>
> Key: SPARK-45014
> URL: https://issues.apache.org/jira/browse/SPARK-45014
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> In SPARK-44348, we clean up SparkContext's added files, but we don't clean 
> up the ones in the fileserver.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-45021) Remove `antlr4-maven-plugin` configuration from `sql/catalyst/pom.xml`

2023-08-30 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760404#comment-17760404
 ] 

Ignite TC Bot commented on SPARK-45021:
---

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/42739

> Remove `antlr4-maven-plugin` configuration from `sql/catalyst/pom.xml`
> --
>
> Key: SPARK-45021
> URL: https://issues.apache.org/jira/browse/SPARK-45021
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Minor
>
> SPARK-44475 has already moved the relevant configuration to 
> `sql/api/pom.xml`; the configuration in the catalyst module is now unused.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42304) Assign name to _LEGACY_ERROR_TEMP_2189

2023-08-29 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759898#comment-17759898
 ] 

Ignite TC Bot commented on SPARK-42304:
---

User 'valentinp17' has created a pull request for this issue:
https://github.com/apache/spark/pull/42706

> Assign name to _LEGACY_ERROR_TEMP_2189
> --
>
> Key: SPARK-42304
> URL: https://issues.apache.org/jira/browse/SPARK-42304
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43288) DataSourceV2: CREATE TABLE LIKE

2023-08-21 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756860#comment-17756860
 ] 

Ignite TC Bot commented on SPARK-43288:
---

User 'Hisoka-X' has created a pull request for this issue:
https://github.com/apache/spark/pull/42586

> DataSourceV2: CREATE TABLE LIKE
> ---
>
> Key: SPARK-43288
> URL: https://issues.apache.org/jira/browse/SPARK-43288
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: John Zhuge
>Priority: Major
>
> Support CREATE TABLE LIKE in DSv2.
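> The statement this ticket targets, sketched against a v2 catalog (the 
> catalog and table names are hypothetical):
> {code:python}
> spark.sql("CREATE TABLE my_catalog.db.t2 LIKE my_catalog.db.t1")
> {code}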



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44881) Executor stuck on retrying to fetch shuffle data when `java.lang.OutOfMemoryError: unable to create native thread` exception occurred.

2023-08-19 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756445#comment-17756445
 ] 

Ignite TC Bot commented on SPARK-44881:
---

User 'hgs19921112' has created a pull request for this issue:
https://github.com/apache/spark/pull/42572

> Executor stuck on retrying to fetch shuffle data when 
> `java.lang.OutOfMemoryError: unable to create native thread` exception 
> occurred.
> 
>
> Key: SPARK-44881
> URL: https://issues.apache.org/jira/browse/SPARK-44881
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: hgs
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44433) Implement termination of Python process for foreachBatch & streaming listener

2023-08-18 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756075#comment-17756075
 ] 

Ignite TC Bot commented on SPARK-44433:
---

User 'rangadi' has created a pull request for this issue:
https://github.com/apache/spark/pull/42555

> Implement termination of Python process for foreachBatch & streaming listener
> -
>
> Key: SPARK-44433
> URL: https://issues.apache.org/jira/browse/SPARK-44433
> Project: Spark
>  Issue Type: Task
>  Components: Connect, Structured Streaming
>Affects Versions: 3.4.1
>Reporter: Raghu Angadi
>Assignee: Wei Liu
>Priority: Major
> Fix For: 3.5.0
>
>
> In the first implementation of Python support for foreachBatch, the Python 
> process termination is not handled correctly. 
>  
> See the long TODO in 
> [https://github.com/apache/spark/blob/master/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/StreamingForeachBatchHelper.scala]
>  
> for an outline of the feature to terminate the runners by registering 
> StreamingQueryListeners. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42947) Spark Thriftserver LDAP should not use DN pattern if user contains domain

2023-08-16 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755180#comment-17755180
 ] 

Ignite TC Bot commented on SPARK-42947:
---

User 'liujiayi771' has created a pull request for this issue:
https://github.com/apache/spark/pull/40577

> Spark Thriftserver LDAP should not use DN pattern if user contains domain
> -
>
> Key: SPARK-42947
> URL: https://issues.apache.org/jira/browse/SPARK-42947
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Jiayi Liu
>Priority: Major
>
> When the LDAP provider is domain-aware, such as Active Directory, the 
> principal should not be constructed from the DN pattern; instead, the 
> username containing the domain should be passed directly to the LDAP 
> provider as the principal. We can refer to the implementation of Hive's 
> LdapUtils.
> When the username contains a domain, or a domain is supplied via the 
> hive.server2.authentication.ldap.Domain configuration, constructing the 
> principal according to the DN pattern (for example, 
> uid=user@domain,dc=test,dc=com) yields the following error:
> {code:java}
> 23/03/28 11:01:48 ERROR TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: Error validating the login
>   at 
> org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:108)
>  ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
>   at 
> org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:537)
>  ~[libthrift-0.12.0.jar:0.12.0]
>   at 
> org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283) 
> ~[libthrift-0.12.0.jar:0.12.0]
>   at 
> org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:43)
>  ~[libthrift-0.12.0.jar:0.12.0]
>   at 
> org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:223)
>  ~[libthrift-0.12.0.jar:0.12.0]
>   at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:293)
>  ~[libthrift-0.12.0.jar:0.12.0]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  ~[?:1.8.0_352]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  ~[?:1.8.0_352]
>   at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_352]
> Caused by: javax.security.sasl.AuthenticationException: Error validating LDAP 
> user
>   at 
> org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:76)
>  ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
>   at 
> org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105)
>  ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
>   at 
> org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101)
>  ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
>   ... 8 more
> Caused by: javax.naming.AuthenticationException: [LDAP: error code 49 - 
> 80090308: LdapErr: DSID-0C0903D9, comment: AcceptSecurityContext error, data 
> 52e, v2580]
>   at com.sun.jndi.ldap.LdapCtx.mapErrorCode(LdapCtx.java:3261) 
> ~[?:1.8.0_352]
>   at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:3207) 
> ~[?:1.8.0_352]
>   at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:2993) 
> ~[?:1.8.0_352]
>   at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2907) ~[?:1.8.0_352]
>   at com.sun.jndi.ldap.LdapCtx.<init>(LdapCtx.java:347) ~[?:1.8.0_352]
>   at 
> com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxFromUrl(LdapCtxFactory.java:229) 
> ~[?:1.8.0_352]
>   at 
> com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:189) 
> ~[?:1.8.0_352]
>   at 
> com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:247) 
> ~[?:1.8.0_352]
>   at 
> com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:154) 
> ~[?:1.8.0_352]
>   at 
> com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:84) 
> ~[?:1.8.0_352]
>   at 
> javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:695) 
> ~[?:1.8.0_352]
>   at 
> javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:313) 
> ~[?:1.8.0_352]
>   at javax.naming.InitialContext.init(InitialContext.java:244) 
> ~[?:1.8.0_352]
>   at javax.naming.InitialContext.<init>(InitialContext.java:216) 
> ~[?:1.8.0_352]
>   at 
> javax.naming.directory.InitialDirContext.<init>(InitialDirContext.java:101) 
> ~[?:1.8.0_352]
>   at 
> org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:73)
>  

[jira] [Commented] (SPARK-44799) Fix outer scopes for Ammonite generated classes

2023-08-16 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755057#comment-17755057
 ] 

Ignite TC Bot commented on SPARK-44799:
---

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/42489

> Fix outer scopes for Ammonite generated classes
> ---
>
> Key: SPARK-44799
> URL: https://issues.apache.org/jira/browse/SPARK-44799
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Blocker
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44613) Add Encoders.scala to Spark Connect Scala Client

2023-08-01 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17749640#comment-17749640
 ] 

Ignite TC Bot commented on SPARK-44613:
---

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/42264

> Add Encoders.scala to Spark Connect Scala Client
> 
>
> Key: SPARK-44613
> URL: https://issues.apache.org/jira/browse/SPARK-44613
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44547) BlockManagerDecommissioner throws exceptions when migrating RDD cached blocks to fallback storage

2023-07-27 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748327#comment-17748327
 ] 

Ignite TC Bot commented on SPARK-44547:
---

User 'ukby1234' has created a pull request for this issue:
https://github.com/apache/spark/pull/42155

> BlockManagerDecommissioner throws exceptions when migrating RDD cached blocks 
> to fallback storage
> -
>
> Key: SPARK-44547
> URL: https://issues.apache.org/jira/browse/SPARK-44547
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.1
>Reporter: Frank Yin
>Priority: Major
> Attachments: spark-error.log
>
>
> Looks like the RDD cache doesn't support fallback storage and we should stop 
> the migration if the only viable peer is the fallback storage. 
> [^spark-error.log]
> 23/07/25 05:12:58 WARN BlockManager: Failed to replicate 
> rdd_18_25 to BlockManagerId(fallback, remote, 7337, None), failure #0
> java.io.IOException: Failed to connect to remote:7337
>   at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:288)
>   at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:218)
>   at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:230)
>   at 
> org.apache.spark.network.netty.NettyBlockTransferService.uploadBlock(NettyBlockTransferService.scala:168)
>   at 
> org.apache.spark.network.BlockTransferService.uploadBlockSync(BlockTransferService.scala:121)
>   at 
> org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$replicate(BlockManager.scala:1784)
>   at 
> org.apache.spark.storage.BlockManager.$anonfun$replicateBlock$2(BlockManager.scala:1721)
>   at 
> org.apache.spark.storage.BlockManager.$anonfun$replicateBlock$2$adapted(BlockManager.scala:1707)
>   at scala.Option.forall(Option.scala:390)
>   at 
> org.apache.spark.storage.BlockManager.replicateBlock(BlockManager.scala:1707)
>   at 
> org.apache.spark.storage.BlockManagerDecommissioner.migrateBlock(BlockManagerDecommissioner.scala:356)
>   at 
> org.apache.spark.storage.BlockManagerDecommissioner.$anonfun$decommissionRddCacheBlocks$3(BlockManagerDecommissioner.scala:340)
>   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>   at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>   at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>   at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>   at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>   at 
> org.apache.spark.storage.BlockManagerDecommissioner.decommissionRddCacheBlocks(BlockManagerDecommissioner.scala:339)
>   at 
> org.apache.spark.storage.BlockManagerDecommissioner$$anon$1.run(BlockManagerDecommissioner.scala:214)
>   at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>   at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
>   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown 
> Source)
>   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown 
> Source)
>   at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: java.net.UnknownHostException: remote
>   at java.base/java.net.InetAddress$CachedAddresses.get(Unknown Source)
>   at java.base/java.net.InetAddress.getAllByName0(Unknown Source)
>   at java.base/java.net.InetAddress.getAllByName(Unknown Source)
>   at java.base/java.net.InetAddress.getAllByName(Unknown Source)
>   at java.base/java.net.InetAddress.getByName(Unknown Source)
>   at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:156)
>   at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:153)
>   at java.base/java.security.AccessController.doPrivileged(Native Method)
>   at 
> io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:153)
>   at 
> io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:41)
>   at 
> io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:61)
>   at 
> io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:53)
>   at 
> io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:55)
>   at 
> io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:31)
>   at 
> 

[jira] [Commented] (SPARK-44264) DeepSpeed Distributor

2023-07-26 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747612#comment-17747612
 ] 

Ignite TC Bot commented on SPARK-44264:
---

User 'mathewjacob1002' has created a pull request for this issue:
https://github.com/apache/spark/pull/42118

> DeepSpeed Distributor
> -
>
> Key: SPARK-44264
> URL: https://issues.apache.org/jira/browse/SPARK-44264
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Affects Versions: 3.4.1
>Reporter: Lu Wang
>Priority: Critical
> Fix For: 3.5.0
>
> Attachments: Trying to Run Deepspeed Funcs.html
>
>
> Make it easier for PySpark users to run distributed training and inference 
> with DeepSpeed on Spark clusters. This was a project scoped by the 
> Databricks ML Training Team.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43402) FileSourceScanExec supports push down data filter with scalar subquery

2023-07-25 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747067#comment-17747067
 ] 

Ignite TC Bot commented on SPARK-43402:
---

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/41088

> FileSourceScanExec supports push down data filter with scalar subquery
> --
>
> Key: SPARK-43402
> URL: https://issues.apache.org/jira/browse/SPARK-43402
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: XiDuo You
>Priority: Major
>
> A scalar subquery can be pushed down as a data filter at runtime, since we 
> always execute the subquery first.
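> A sketch of the shape of query that benefits (table and column names are 
> made up):
> {code:python}
> spark.sql("""
>     SELECT * FROM events
>     WHERE ts > (SELECT max(ts) FROM checkpoints)
> """)
> {code}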



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44216) Make assertSchemaEqual API public

2023-07-14 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17743223#comment-17743223
 ] 

Ignite TC Bot commented on SPARK-44216:
---

User 'asl3' has created a pull request for this issue:
https://github.com/apache/spark/pull/41927

> Make assertSchemaEqual API public
> -
>
> Key: SPARK-44216
> URL: https://issues.apache.org/jira/browse/SPARK-44216
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Amanda Liu
>Priority: Major
>
> SPIP: 
> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
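> A sketch of how the public API would be used (assuming the pyspark.testing 
> import path proposed in the SPIP):
> {code:python}
> from pyspark.testing import assertSchemaEqual
> from pyspark.sql.types import StructType, StructField, IntegerType
>
> s1 = StructType([StructField("id", IntegerType())])
> s2 = StructType([StructField("id", IntegerType())])
> assertSchemaEqual(s1, s2)  # raises an AssertionError on mismatch
> {code}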



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44431) Wrong semantics for null IN (empty list)

2023-07-14 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17743222#comment-17743222
 ] 

Ignite TC Bot commented on SPARK-44431:
---

User 'jchen5' has created a pull request for this issue:
https://github.com/apache/spark/pull/42007

> Wrong semantics for null IN (empty list)
> 
>
> Key: SPARK-44431
> URL: https://issues.apache.org/jira/browse/SPARK-44431
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Jack Chen
>Priority: Major
>
> {{null IN (empty list)}} incorrectly evaluates to null, when it should 
> evaluate to false. (It should be false because a IN (b1, b2) is defined as 
> a = b1 OR a = b2, and an empty IN list is treated as an empty OR, which is 
> false. This is specified by ANSI SQL.)
> Many places in Spark execution (In, InSet, InSubquery) and optimization 
> (OptimizeIn, NullPropagation) implement this wrong behavior. Also note that 
> Spark's behavior for null IN (empty list) is inconsistent in some 
> places - literal IN lists generally return null (incorrect), while IN/NOT IN 
> subqueries mostly return false/true, respectively (correct) in this case.
> This is a longstanding correctness issue which has existed since null support 
> for IN expressions was first added to Spark.
> Doc with more details: 
> https://docs.google.com/document/d/15ttcB3OjGx5_WFKHB2COjQUbFHj5LrfNQv_d26o-wmI/edit
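> A worked instance of the defining expansion (an empty list typically arises 
> from an IN subquery over an empty relation; a sketch):
> {code:python}
> # a IN (b1, b2)  ==  (a = b1) OR (a = b2); with zero disjuncts the OR is
> # false, so the result should be false even when a is NULL:
> spark.sql("SELECT NULL IN (SELECT 1 WHERE 1 = 0)").show()  # expect false
> {code}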



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44295) Upgrade scala-parser-combinators to 2.3

2023-07-04 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739838#comment-17739838
 ] 

Ignite TC Bot commented on SPARK-44295:
---

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/41848

> Upgrade scala-parser-combinators to 2.3
> ---
>
> Key: SPARK-44295
> URL: https://issues.apache.org/jira/browse/SPARK-44295
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Minor
>
> [https://github.com/scala/scala-parser-combinators/releases/tag/v2.3.0]
>  
> new version:
>  * Drop support for Scala 2.11.x
>  * Fix {{Parsers.Parser.|||}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44250) Implement classification evaluator

2023-07-03 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739571#comment-17739571
 ] 

Ignite TC Bot commented on SPARK-44250:
---

User 'WeichenXu123' has created a pull request for this issue:
https://github.com/apache/spark/pull/41793

> Implement classification evaluator
> --
>
> Key: SPARK-44250
> URL: https://issues.apache.org/jira/browse/SPARK-44250
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ML
>Affects Versions: 3.5.0
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Major
>
> Implement classification evaluator
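> For reference, the classic evaluator API whose behavior the Connect 
> implementation would mirror (a minimal sketch; a predictions DataFrame with 
> "prediction" and "label" columns is assumed):
> {code:python}
> from pyspark.ml.evaluation import MulticlassClassificationEvaluator
>
> evaluator = MulticlassClassificationEvaluator(metricName="accuracy")
> accuracy = evaluator.evaluate(predictions)
> {code}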



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44268) Add tests to ensure error-classes.json and docs are in sync

2023-07-03 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739570#comment-17739570
 ] 

Ignite TC Bot commented on SPARK-44268:
---

User 'Hisoka-X' has created a pull request for this issue:
https://github.com/apache/spark/pull/41813

> Add tests to ensure error-classes.json and docs are in sync
> ---
>
> Key: SPARK-44268
> URL: https://issues.apache.org/jira/browse/SPARK-44268
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.1
>Reporter: Jia Fan
>Assignee: Jia Fan
>Priority: Major
> Fix For: 3.5.0
>
>
> We should add tests to ensure error-classes.json and the docs are in sync, 
> so that both are always up to date before a PR is committed.
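> One possible shape for such a consistency check (a sketch; the file paths 
> are assumptions, not necessarily those used in the PR):
> {code:python}
> import json, pathlib
>
> errors = json.loads(pathlib.Path("error-classes.json").read_text())
> docs = pathlib.Path("docs/sql-error-conditions.md").read_text()
> undocumented = [name for name in errors if name not in docs]
> assert not undocumented, f"error classes missing from docs: {undocumented}"
> {code}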



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43851) Support LCA in grouping expressions

2023-07-01 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739280#comment-17739280
 ] 

Ignite TC Bot commented on SPARK-43851:
---

User 'Hisoka-X' has created a pull request for this issue:
https://github.com/apache/spark/pull/41804

> Support LCA in grouping expressions
> ---
>
> Key: SPARK-43851
> URL: https://issues.apache.org/jira/browse/SPARK-43851
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Assignee: Jia Fan
>Priority: Major
> Fix For: 3.5.0
>
>
> Teradata supports it:
> {code:sql}
> create table t1(a int) using  parquet;
> select a + 1 as a1, a1 + 1 as a2 from t1 group by a1, a2;
> {code}
> {noformat}
> [UNSUPPORTED_FEATURE.LATERAL_COLUMN_ALIAS_IN_GROUP_BY] The feature is not 
> supported: Referencing a lateral column alias via GROUP BY alias/ALL is not 
> supported yet.
> {noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44199) CacheManager refreshes the fileIndex unnecessarily

2023-06-29 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738673#comment-17738673
 ] 

Ignite TC Bot commented on SPARK-44199:
---

User 'vihangk1' has created a pull request for this issue:
https://github.com/apache/spark/pull/41749

> CacheManager refreshes the fileIndex unnecessarily
> --
>
> Key: SPARK-44199
> URL: https://issues.apache.org/jira/browse/SPARK-44199
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.1
>Reporter: Vihang Karajgaonkar
>Priority: Major
>
> The CacheManager on this line 
> [https://github.com/apache/spark/blob/680ca2e56f2c8fc759743ad6755f6e3b1a19c629/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala#L372]
>  uses prefix-based matching to decide which file index needs to be 
> refreshed. However, that can be incorrect if users have paths that are 
> not subdirectories but share a prefix. For example, in the function below:
>  
> {code:java}
>   private def refreshFileIndexIfNecessary(
>       fileIndex: FileIndex,
>       fs: FileSystem,
>       qualifiedPath: Path): Boolean = {
>     val prefixToInvalidate = qualifiedPath.toString
>     val needToRefresh = fileIndex.rootPaths
>       .map(_.makeQualified(fs.getUri, fs.getWorkingDirectory).toString)
>       .exists(_.startsWith(prefixToInvalidate))
>     if (needToRefresh) fileIndex.refresh()
>     needToRefresh
>   } {code}
> If prefixToInvalidate is s3://bucket/mypath/table_dir and the file 
> index has s3://bucket/mypath/table_dir_2/part=1 as one of its root paths, 
> then needToRefresh will be true and the file index gets refreshed 
> unnecessarily. This is not just wasted CPU cycles; it can also cause query 
> failures if there are access restrictions on the path being 
> refreshed.
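> The gist of a boundary-aware check (a sketch, not necessarily the actual 
> patch):
> {code:python}
> def is_same_or_descendant(root: str, prefix: str) -> bool:
>     # Treat the prefix as a directory boundary, not a raw string prefix.
>     return root == prefix or root.startswith(prefix.rstrip("/") + "/")
>
> assert not is_same_or_descendant(
>     "s3://bucket/mypath/table_dir_2/part=1", "s3://bucket/mypath/table_dir")
> {code}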



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44165) Exception when reading parquet file with TIME fields

2023-06-29 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738669#comment-17738669
 ] 

Ignite TC Bot commented on SPARK-44165:
---

User 'ramon-garcia' has created a pull request for this issue:
https://github.com/apache/spark/pull/41717

> Exception when reading parquet file with TIME fields
> 
>
> Key: SPARK-44165
> URL: https://issues.apache.org/jira/browse/SPARK-44165
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.4.0, 3.4.1
> Environment: Spark 3.4.0 downloaded from spark.apache.org
> Also reproduced with latest build.
>Reporter: Ramón García Fernández
>Priority: Major
> Attachments: timeonly.parquet
>
>
> When one reads a parquet file containing TIME fields (either with INT32 or 
> INT64 storage), an exception is thrown. From the Spark shell:
>  
> {{> val df = spark.read.parquet("timeonly.parquet")}}
> 23/06/24 13:24:54 ERROR Executor: Exception in task 0.0 in 
> stage 0.0 (TID 0)
> org.apache.spark.sql.AnalysisException: Illegal Parquet type: 
> INT32 (TIME(MILLIS,true)).
>     at 
> org.apache.spark.sql.errors.QueryCompilationErrors$.illegalParquetTypeError(QueryCompilationErrors.scala:1762)
>     at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.illegalType$1(ParquetSchemaConverter.scala:206)
>     at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.$anonfun$convertPrimitiveField$2(ParquetSchemaConverter.scala:252)
>     at scala.Option.getOrElse(Option.scala:189)
>     at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertPrimitiveField(ParquetSchemaConverter.scala:224)
>     at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertField(ParquetSchemaConverter.scala:187)
>     at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.$anonfun$convertInternal$3(ParquetSchemaConverter.scala:147)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44131) Add call_function and deprecate call_udf for Scala API

2023-06-29 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738672#comment-17738672
 ] 

Ignite TC Bot commented on SPARK-44131:
---

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/41687

> Add call_function and deprecate call_udf for Scala API
> --
>
> Key: SPARK-44131
> URL: https://issues.apache.org/jira/browse/SPARK-44131
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>
> The Scala API for SQL has a method call_udf that is used to call 
> user-defined functions.
> In fact, call_udf can also call builtin functions.
> This behavior is confusing for users.
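> A sketch of the proposed replacement in use (the DataFrame and column here 
> are hypothetical):
> {code:python}
> from pyspark.sql import functions as F
>
> df.select(F.call_function("lower", df.name))  # builtin, no UDF implied
> {code}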



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44200) Support TABLE argument parser rule for TableValuedFunction

2023-06-29 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738671#comment-17738671
 ] 

Ignite TC Bot commented on SPARK-44200:
---

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/41750

> Support TABLE argument parser rule for TableValuedFunction
> --
>
> Key: SPARK-44200
> URL: https://issues.apache.org/jira/browse/SPARK-44200
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Takuya Ueshin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44195) Add JobTag APIs to SparkR SparkContext

2023-06-29 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738668#comment-17738668
 ] 

Ignite TC Bot commented on SPARK-44195:
---

User 'juliuszsompolski' has created a pull request for this issue:
https://github.com/apache/spark/pull/41742

> Add JobTag APIs to SparkR SparkContext
> --
>
> Key: SPARK-44195
> URL: https://issues.apache.org/jira/browse/SPARK-44195
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 3.5.0
>Reporter: Juliusz Sompolski
>Priority: Major
>
> Add APIs added in https://issues.apache.org/jira/browse/SPARK-43952 to SparkR:
>  * {{SparkContext.addJobTag(tag: String): Unit}}
>  * {{SparkContext.removeJobTag(tag: String): Unit}}
>  * {{SparkContext.getJobTags(): Set[String]}}
>  * {{SparkContext.clearJobTags(): Unit}}
>  * {{SparkContext.cancelJobsWithTag(tag: String): Unit}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35564) Support subexpression elimination for non-common branches of conditional expressions

2023-06-29 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738670#comment-17738670
 ] 

Ignite TC Bot commented on SPARK-35564:
---

User 'peter-toth' has created a pull request for this issue:
https://github.com/apache/spark/pull/41677

> Support subexpression elimination for non-common branches of conditional 
> expressions
> 
>
> Key: SPARK-35564
> URL: https://issues.apache.org/jira/browse/SPARK-35564
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.1
>Reporter: Adam Binford
>Priority: Major
>
> https://issues.apache.org/jira/browse/SPARK-7 added support for pulling 
> subexpressions out of branches of conditional expressions for expressions 
> present in all branches. We should be able to take this a step further and 
> pull out subexpressions for any branch, as long as that expression will 
> definitely be evaluated at least once.
> Consider a common data validation example:
> {code:python}
> from pyspark.sql.functions import *
> df = spark.createDataFrame([['word'], ['1234']])
> col = regexp_replace('_1', r'\d', '')
> df = df.withColumn('numbers_removed', when(length(col) > 0, col)){code}
> We only want to keep the value if it's non-empty with numbers removed, 
> otherwise we want it to be null. 
> Because we have no otherwise value, `col` is not a candidate for 
> subexpression elimination (you can see two regular expression replacements in 
> the codegen). But whenever the length is greater than 0, we will have to 
> execute the regular expression replacement twice. Since we know we will 
> always calculate `col` at least once, it makes sense to consider that as a 
> subexpression since we might need it again in the branch value. So we can 
> update the logic from:
> Create a subexpression if an expression will always be evaluated at least 
> twice
> To:
> Create a subexpression if an expression will always be evaluated at least 
> once AND will either always or conditionally be evaluated at least twice.
> The trade-off is potentially another subexpression function call (for split 
> subexpressions) if the second evaluation doesn't happen, but this seems 
> worth it for the cases where it is evaluated a second time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44137) Change handling of iterable objects for on field in joins

2023-06-26 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17737266#comment-17737266
 ] 

Ignite TC Bot commented on SPARK-44137:
---

User 'jhaberstroh-sharethis' has created a pull request for this issue:
https://github.com/apache/spark/pull/41686

> Change handling of iterable objects for on field in joins
> -
>
> Key: SPARK-44137
> URL: https://issues.apache.org/jira/browse/SPARK-44137
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: John Haberstroh
>Priority: Minor
>
> The {{on}} field complained when I passed it a tuple. That's because the 
> code checks for {{list}} exactly and otherwise wraps the argument into a 
> list like {{[on]}}, leading to immediate failure. This was surprising: 
> typically tuple and list are interchangeable, and tuple is often the more 
> readily accepted type. I have proposed a change that moves toward the 
> principle of least surprise for this situation.
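> The surprising case, sketched (the DataFrames and key names are made up):
> {code:python}
> df1.join(df2, on=["k1", "k2"])  # accepted
> df1.join(df2, on=("k1", "k2"))  # tuple: wrapped as [("k1", "k2")], fails
> {code}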



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44082) Generate operator does not update reference set properly

2023-06-26 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17737265#comment-17737265
 ] 

Ignite TC Bot commented on SPARK-44082:
---

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/41633

> Generate operator does not update reference set properly
> 
>
> Key: SPARK-44082
> URL: https://issues.apache.org/jira/browse/SPARK-44082
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>
> Before:
> ```
> == Optimized Logical Plan ==
> Project [col1#2, col2#19]
> +- Generate replicaterows(sum#17L, col1#2, col2#3), [2], false, [col1#2, 
> col2#3]
>+- Filter (isnotnull(sum#17L) AND (sum#17L > 0))
>   +- Aggregate [col1#2, col2#19], [col1#2, col2#19, sum(vcol#14L) AS 
> sum#17L]
>  +- Union false, false
> :- Aggregate [col1#2], [1 AS vcol#14L, col1#2, first(col2#3, 
> false) AS col2#19]
> :  +- LogicalRDD [col1#2, col2#3], false
> +- Project [-1 AS vcol#15L, col1#8, col2#9]
>+- LogicalRDD [col1#8, col2#9], false
> ```
> which fails with: Couldn't find col2#3 in [col1#2,col2#19,sum#17L]
> After:
> ```
> == Optimized Logical Plan ==
> Project [col1#2, col2#19]
> +- Generate replicaterows(sum#17L, col1#2, col2#19), [2], false, [col1#2, 
> col2#19]
>+- Filter (isnotnull(sum#17L) AND (sum#17L > 0))
>   +- Aggregate [col1#2, col2#19], [col1#2, col2#19, sum(vcol#14L) AS 
> sum#17L]
>  +- Union false, false
> :- Aggregate [col1#2], [1 AS vcol#14L, col1#2, first(col2#3, 
> false) AS col2#19]
> :  +- LogicalRDD [col1#2, col2#3], false
> +- Project [-1 AS vcol#15L, col1#8, col2#9]
>+- LogicalRDD [col1#8, col2#9], false
> ```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43924) Add misc functions to Scala and Python

2023-06-21 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17735789#comment-17735789
 ] 

Ignite TC Bot commented on SPARK-43924:
---

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/41689

> Add misc functions to Scala and Python
> --
>
> Key: SPARK-43924
> URL: https://issues.apache.org/jira/browse/SPARK-43924
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, SQL
>Affects Versions: 3.5.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> Add following functions:
> * uuid
> * aes_encrypt
> * aes_decrypt
> * sha
> * input_file_block_length
> * input_file_block_start
> * reflect
> * java_method
> * version
> * typeof
> * stack
> * random
> to:
> * Scala API
> * Python API
> * Spark Connect Scala Client
> * Spark Connect Python Client
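> For instance, a couple of these once exposed in Python (a sketch, assuming 
> they land in pyspark.sql.functions as listed above; df is any DataFrame):
> {code:python}
> from pyspark.sql import functions as F
>
> df.select(F.version(), F.typeof(F.lit(1)))
> {code}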



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44052) Add util to get proper Column or DataFrame class for Spark Connect.

2023-06-15 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732941#comment-17732941
 ] 

Ignite TC Bot commented on SPARK-44052:
---

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/41570

> Add util to get proper Column or DataFrame class for Spark Connect.
> ---
>
> Key: SPARK-44052
> URL: https://issues.apache.org/jira/browse/SPARK-44052
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> A lot of code is duplicated just to pick the proper PySpark Column or 
> DataFrame class, so it would be great to have a util function that 
> deduplicates it.
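> One possible shape for such a helper (a sketch; the helper name is made up, 
> while pyspark.sql.utils.is_remote is an existing utility):
> {code:python}
> from pyspark.sql.utils import is_remote
>
> def _column_class():
>     if is_remote():  # running against a Spark Connect client
>         from pyspark.sql.connect.column import Column
>     else:
>         from pyspark.sql.column import Column
>     return Column
> {code}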



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44039) Improve for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite

2023-06-13 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732112#comment-17732112
 ] 

Ignite TC Bot commented on SPARK-44039:
---

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/41572

> Improve for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite
> 
>
> Key: SPARK-44039
> URL: https://issues.apache.org/jira/browse/SPARK-44039
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, Tests
>Affects Versions: 3.5.0
>Reporter: BingKun Pan
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38477) Use error classes in org.apache.spark.storage

2023-06-13 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17732111#comment-17732111
 ] 

Ignite TC Bot commented on SPARK-38477:
---

User 'bozhang2820' has created a pull request for this issue:
https://github.com/apache/spark/pull/41575

> Use error classes in org.apache.spark.storage
> -
>
> Key: SPARK-38477
> URL: https://issues.apache.org/jira/browse/SPARK-38477
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Bo Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43534) Add log4j-1.2-api and log4j-slf4j2-impl to classpath if active hadoop-provided

2023-05-19 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17724373#comment-17724373
 ] 

Ignite TC Bot commented on SPARK-43534:
---

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/41195

> Add log4j-1.2-api and log4j-slf4j2-impl to classpath if active hadoop-provided
> --
>
> Key: SPARK-43534
> URL: https://issues.apache.org/jira/browse/SPARK-43534
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
> Attachments: hadoop log jars.png, log4j-1.2-api-2.20.0.jar, 
> log4j-slf4j2-impl-2.20.0.jar
>
>
> Build Spark:
> {code:sh}
> ./dev/make-distribution.sh --name default --tgz -Phive -Phive-thriftserver 
> -Pyarn -Phadoop-provided
> tar -zxf spark-3.5.0-SNAPSHOT-bin-default.tgz {code}
> Remove the following jars from spark-3.5.0-SNAPSHOT-bin-default:
> {noformat}
> jars/log4j-1.2-api-2.20.0.jar
> jars/log4j-slf4j2-impl-2.20.0.jar
> {noformat}
> Add a new log4j2.properties to spark-3.5.0-SNAPSHOT-bin-default/conf:
> {code:none}
> rootLogger.level = info
> rootLogger.appenderRef.file.ref = File
> rootLogger.appenderRef.stderr.ref = console
> appender.console.type = Console
> appender.console.name = console
> appender.console.target = SYSTEM_ERR
> appender.console.layout.type = PatternLayout
> appender.console.layout.pattern = %d{yy/MM/dd HH:mm:ss,SSS} %p [%t] %c{2}:%L 
> : %m%n
> appender.file.type = RollingFile
> appender.file.name = File
> appender.file.fileName = /tmp/spark/logs/spark.log
> appender.file.filePattern = /tmp/spark/logs/spark.%d{MMdd-HH}.log
> appender.file.append = true
> appender.file.layout.type = PatternLayout
> appender.file.layout.pattern = %d{yy/MM/dd HH:mm:ss,SSS} %p [%t] %c{2}:%L : 
> %m%n
> appender.file.policies.type = Policies
> appender.file.policies.time.type = TimeBasedTriggeringPolicy
> appender.file.policies.time.interval = 1
> appender.file.policies.time.modulate = true
> appender.file.policies.size.type = SizeBasedTriggeringPolicy
> appender.file.policies.size.size = 256M
> appender.file.strategy.type = DefaultRolloverStrategy
> appender.file.strategy.max = 100
> {code}
> Start Spark thriftserver:
> {code:sh}
> sbin/start-thriftserver.sh
> {code}
> Check the log:
> {code:sh}
> cat /tmp/spark/logs/spark.log
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43509) Support creating multiple sessions for Spark Connect in PySpark

2023-05-17 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17723755#comment-17723755
 ] 

Ignite TC Bot commented on SPARK-43509:
---

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/41206

> Support creating multiple sessions for Spark Connect in PySpark
> ---
>
> Key: SPARK-43509
> URL: https://issues.apache.org/jira/browse/SPARK-43509
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Martin Grund
>Assignee: Martin Grund
>Priority: Major
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43536) Statsd sink reporter reports incorrect counter metrics.

2023-05-17 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17723415#comment-17723415
 ] 

Ignite TC Bot commented on SPARK-43536:
---

User 'venkateshbalaji99' has created a pull request for this issue:
https://github.com/apache/spark/pull/41199

> Statsd sink reporter reports incorrect counter metrics.
> ---
>
> Key: SPARK-43536
> URL: https://issues.apache.org/jira/browse/SPARK-43536
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.3
>Reporter: Abhishek Modi
>Priority: Major
>
> There is a mismatch between the definitions of counter metrics in Dropwizard 
> (which is used by Spark) and StatsD. While Dropwizard interprets counters as 
> cumulative metrics, StatsD interprets them as delta metrics. This causes 
> double aggregation in StatsD, producing inconsistent metrics.
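> A sketch of one way to reconcile the two models (an assumption about the 
> approach, not necessarily what the PR does): remember the last reported 
> cumulative count per metric and emit only the increment, which matches 
> StatsD's delta semantics:
> {code:java}
> import com.codahale.metrics.Counter
> import scala.collection.mutable
>
> class DeltaCounterReporter {
>   private val lastSeen = mutable.Map.empty[String, Long]
>
>   // Dropwizard counters are cumulative; StatsD expects per-interval deltas.
>   def deltaFor(name: String, counter: Counter): Long = {
>     val current = counter.getCount
>     val delta = current - lastSeen.getOrElse(name, 0L)
>     lastSeen(name) = current
>     delta
>   }
> }
> {code}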



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40887) Allow Spark on K8s to integrate w/ Log Service

2023-05-12 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722105#comment-17722105
 ] 

Ignite TC Bot commented on SPARK-40887:
---

User 'turboFei' has created a pull request for this issue:
https://github.com/apache/spark/pull/41139

> Allow Spark on K8s to integrate w/ Log Service
> --
>
> Key: SPARK-40887
> URL: https://issues.apache.org/jira/browse/SPARK-40887
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.4.0
>Reporter: Cheng Pan
>Assignee: Apache Spark
>Priority: Major
>
> https://docs.google.com/document/d/1MfB39LD4B4Rp7MDRxZbMKMbdNSe6V6mBmMQ-gkCnM-0/edit?usp=sharing



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43461) Skip compiling javadoc.jar, sources.jar and test-jar when making distribution

2023-05-11 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17721844#comment-17721844
 ] 

Ignite TC Bot commented on SPARK-43461:
---

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/41141

> Skip compiling javadoc.jar, sources.jar and test-jar when making distribution
> -
>
> Key: SPARK-43461
> URL: https://issues.apache.org/jira/browse/SPARK-43461
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Priority: Major
>
> -Dmaven.javadoc.skip=true to skip building javadoc
> -Dskip=true to skip scaladoc. Please see: 
> https://davidb.github.io/scala-maven-plugin/doc-jar-mojo.html#skip
> -Dmaven.source.skip to skip building sources.jar
> -Dmaven.test.skip to skip building the test-jar



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43427) Unsigned integer types are deserialized as signed numeric equivalents

2023-05-11 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17721698#comment-17721698
 ] 

Ignite TC Bot commented on SPARK-43427:
---

User 'justaparth' has created a pull request for this issue:
https://github.com/apache/spark/pull/41108

> Unsigned integer types are deserialized as signed numeric equivalents
> -
>
> Key: SPARK-43427
> URL: https://issues.apache.org/jira/browse/SPARK-43427
> Project: Spark
>  Issue Type: Bug
>  Components: Protobuf
>Affects Versions: 3.4.0
>Reporter: Parth Upadhyay
>Priority: Major
>
> I'm not sure if "bug" is the correct tag for this jira, but I've tagged it 
> like that for now since the behavior seems odd; happy to update to 
> "improvement" or something else based on the conversation!
> h2. Issue
> Protobuf supports unsigned integer types, including `uint32` and `uint64`. 
> When deserializing protobuf values with fields of these types, uint32 is 
> converted to `IntegerType` and uint64 is converted to `LongType` in the 
> resulting spark struct. `IntegerType` and `LongType` are 
> [signed|https://spark.apache.org/docs/latest/sql-ref-datatypes.html] integer 
> types, so this can lead to confusing results.
> Namely, if a uint32 value in a stored proto is above 2^31, or a uint64 value 
> is above 2^63, its binary representation will contain a 1 in the highest 
> bit, which, when interpreted as a signed integer, comes out as negative 
> (i.e. it overflows).
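> A quick illustration of the reinterpretation in plain Scala (independent of 
> the protobuf connector):
> {code:java}
> val u32: Long = 4294967295L                     // max uint32, as an unsigned value
> val asSignedInt: Int = u32.toInt                // -1: the same bits read as a signed Int
> val widened: Long = asSignedInt & 0xFFFFFFFFL   // 4294967295: recovered by widening to Long
> {code}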
> I propose that we deserialize unsigned integer types into a type that can 
> contain them correctly, e.g.
> uint32 => `LongType`
> uint64 => `Decimal(20, 0)`
> h2. Backwards Compatibility / Default Behavior
> Should we maintain backwards compatibility and add an option that allows 
> deserializing these types differently? Or should we change the default 
> behavior (with an option to go back to the old way)? 
> I think by default it makes more sense to deserialize them as the larger 
> types so that it's semantically more correct. However, there may be existing 
> users of this library that would be affected by this behavior change. Though, 
> maybe we can justify the change since the function is tagged as 
> `Experimental` (and spark 3.4.0 was only released very recently).
> h2. Precedent
> I believe that unsigned integer types in parquet are deserialized in a 
> similar manner, i.e. put into a larger type so that the unsigned 
> representation natively fits. 
> https://issues.apache.org/jira/browse/SPARK-34817 and 
> https://github.com/apache/spark/pull/31921



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-35198) Add support for calling debugCodegen from Python & Java

2023-05-10 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-35198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17721475#comment-17721475
 ] 

Ignite TC Bot commented on SPARK-35198:
---

User 'juanvisoler' has created a pull request for this issue:
https://github.com/apache/spark/pull/40608

> Add support for calling debugCodegen from Python & Java
> ---
>
> Key: SPARK-35198
> URL: https://issues.apache.org/jira/browse/SPARK-35198
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.0.1, 3.0.2, 3.1.0, 3.1.1, 3.2.0
>Reporter: Holden Karau
>Priority: Minor
>  Labels: starter
>
> Because it is implemented with an implicit conversion, it's a bit complicated 
> to call; we should add a direct method to get debug state for Java & Python 
> users of DataFrames.
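> For context, the current Scala call site goes through the implicit conversion 
> (a sketch of today's usage, not the proposed direct API):
> {code:java}
> import org.apache.spark.sql.execution.debug._   // brings the implicit debug methods into scope
>
> val df = spark.range(10).selectExpr("id * 2 AS doubled")
> df.debugCodegen()   // prints the generated code; awkward to reach from Java or Python
> {code}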



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43267) Support creating data frame from a Postgres table that contains user-defined array column

2023-05-01 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17718376#comment-17718376
 ] 

Ignite TC Bot commented on SPARK-43267:
---

User 'juliuszsompolski' has created a pull request for this issue:
https://github.com/apache/spark/pull/41005

> Support creating data frame from a Postgres table that contains user-defined 
> array column
> -
>
> Key: SPARK-43267
> URL: https://issues.apache.org/jira/browse/SPARK-43267
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.4.0, 3.3.2
>Reporter: Sifan Huang
>Priority: Blocker
>
> Spark SQL currently doesn’t support creating a data frame from a Postgres 
> table that contains a user-defined array column. However, it used to allow 
> such types before the Postgres JDBC commit 
> (https://github.com/pgjdbc/pgjdbc/commit/375cb3795c3330f9434cee9353f0791b86125914).
>  The previous behavior was to handle user-defined array columns as String.
> Given:
>  * Postgres table with user-defined array column
>  * Function: DataFrameReader.jdbc - 
> https://spark.apache.org/docs/2.4.0/api/java/org/apache/spark/sql/DataFrameReader.html#jdbc-java.lang.String-java.lang.String-java.util.Properties-
> Results:
>  * Exception “java.sql.SQLException: Unsupported type ARRAY” is thrown
> Expectation after the change:
>  * Function call succeeds
>  * User-defined array is converted as a string in Spark DataFrame
> Suggested fix:
>  * Update “getCatalystType” function in “PostgresDialect” as
>  ** 
> {code:java}
> // In PostgresDialect#getCatalystType, for array types (typeName starts with "_"):
> // map the element type when it is known, otherwise fall back to StringType,
> // matching the pre-pgjdbc-change handling of user-defined element types.
> val catalystType = toCatalystType(typeName.drop(1), size, 
> scale).map(ArrayType(_))
> if (catalystType.isEmpty) Some(StringType) else catalystType{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43223) KeyValueGroupedDataset#agg

2023-05-01 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17718346#comment-17718346
 ] 

Ignite TC Bot commented on SPARK-43223:
---

User 'zhenlineo' has created a pull request for this issue:
https://github.com/apache/spark/pull/40796

> KeyValueGroupedDataset#agg
> --
>
> Key: SPARK-43223
> URL: https://issues.apache.org/jira/browse/SPARK-43223
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Zhen Li
>Priority: Major
>
> Adding missing agg functions in the KVGDS API



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43321) Impl Dataset#JoinWith

2023-05-01 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17718345#comment-17718345
 ] 

Ignite TC Bot commented on SPARK-43321:
---

User 'zhenlineo' has created a pull request for this issue:
https://github.com/apache/spark/pull/40997

> Impl Dataset#JoinWith
> -
>
> Key: SPARK-43321
> URL: https://issues.apache.org/jira/browse/SPARK-43321
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Zhen Li
>Priority: Major
>
> Implement the missing method joinWith



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43156) Correctness COUNT bug in correlated scalar subselect with `COUNT(*) is null`

2023-04-25 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17716312#comment-17716312
 ] 

Ignite TC Bot commented on SPARK-43156:
---

User 'jchen5' has created a pull request for this issue:
https://github.com/apache/spark/pull/40946

> Correctness COUNT bug in correlated scalar subselect with `COUNT(*) is null`
> 
>
> Key: SPARK-43156
> URL: https://issues.apache.org/jira/browse/SPARK-43156
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Jack Chen
>Priority: Major
>
> Example query:
> {code:java}
> spark.sql("select *, (select (count(1)) is null from t1 where t0.a = t1.c) 
> from t0").collect()
> res6: Array[org.apache.spark.sql.Row] = Array([1,1.0,null], [2,2.0,false])  
> {code}
> In this subquery, count(1) always evaluates to a non-null integer value, so 
> count(1) is null is always false. The correct evaluation of the subquery is 
> always false.
> We incorrectly evaluate it to null for empty groups. The reason is that 
> NullPropagation rewrites Aggregate [c] [isnull(count(1))] to Aggregate [c] 
> [false] - this rewrite would be correct normally, but in the context of a 
> scalar subquery it breaks our count bug handling in 
> RewriteCorrelatedScalarSubquery.constructLeftJoins. By the time we get 
> there, the query appears to not have the count bug - it looks the same as if 
> the original query had a subquery with select any_value(false) from r..., and 
> that case is _not_ subject to the count bug.
>  
> Postgres comparison shows the correct always-false result: 
> [http://sqlfiddle.com/#!17/67822/5]
> DDL for the example:
> {code:java}
> create or replace temp view t0 (a, b)
> as values
>     (1, 1.0),
>     (2, 2.0);
> create or replace temp view t1 (c, d)
> as values
>     (2, 3.0); {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43098) Should not handle the COUNT bug when the GROUP BY clause of a correlated scalar subquery is non-empty

2023-04-25 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17716313#comment-17716313
 ] 

Ignite TC Bot commented on SPARK-43098:
---

User 'jchen5' has created a pull request for this issue:
https://github.com/apache/spark/pull/40946

> Should not handle the COUNT bug when the GROUP BY clause of a correlated 
> scalar subquery is non-empty
> -
>
> Key: SPARK-43098
> URL: https://issues.apache.org/jira/browse/SPARK-43098
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Jack Chen
>Assignee: Jack Chen
>Priority: Major
> Fix For: 3.4.1, 3.5.0
>
>
> From [~allisonwang-db] :
> There is no COUNT bug when the correlated equality predicates are also in the 
> group by clause. However, the current logic for handling the COUNT bug still 
> adds a default aggregate function value and returns incorrect results.
>  
> {code:java}
> create view t1(c1, c2) as values (0, 1), (1, 2);
> create view t2(c1, c2) as values (0, 2), (0, 3);
> select c1, c2, (select count(*) from t2 where t1.c1 = t2.c1 group by c1) from 
> t1;
> -- Correct answer: [(0, 1, 2), (1, 2, null)]
> +---+---+--+
> |c1 |c2 |scalarsubquery(c1)|
> +---+---+--+
> |0  |1  |2 |
> |1  |2  |0 |
> +---+---+--+
>  {code}
>  
> This bug affects scalar subqueries in RewriteCorrelatedScalarSubquery, but 
> lateral subqueries handle it correctly in DecorrelateInnerQuery. Related: 
> https://issues.apache.org/jira/browse/SPARK-36113 
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43196) Replace reflection w/ direct calling for `ContainerLaunchContext#setTokensConf`

2023-04-21 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17715030#comment-17715030
 ] 

Ignite TC Bot commented on SPARK-43196:
---

User 'pan3793' has created a pull request for this issue:
https://github.com/apache/spark/pull/40900

> Replace reflection w/ direct calling for 
> `ContainerLaunchContext#setTokensConf`
> ---
>
> Key: SPARK-43196
> URL: https://issues.apache.org/jira/browse/SPARK-43196
> Project: Spark
>  Issue Type: Sub-task
>  Components: YARN
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43179) Add option for applications to control saving of metadata in the External Shuffle Service LevelDB

2023-04-19 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714149#comment-17714149
 ] 

Ignite TC Bot commented on SPARK-43179:
---

User 'otterc' has created a pull request for this issue:
https://github.com/apache/spark/pull/40843

> Add option for applications to control saving of metadata in the External 
> Shuffle Service LevelDB
> -
>
> Key: SPARK-43179
> URL: https://issues.apache.org/jira/browse/SPARK-43179
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 3.4.0
>Reporter: Chandni Singh
>Priority: Major
>
> Currently, the External Shuffle Service stores application metadata in 
> LevelDB. This is necessary to enable the shuffle server to resume serving 
> shuffle data for an application whose executors registered before the 
> NodeManager restarts. However, the metadata includes the application secret, 
> which is stored in LevelDB without encryption. This is a potential security 
> risk, particularly for applications with high security requirements. While 
> filesystem access control lists (ACLs) can help protect keys and 
> certificates, they may not be sufficient for some use cases. In response, we 
> have decided not to store metadata for these high-security applications in 
> LevelDB. As a result, these applications may experience more failures in the 
> event of a node restart, but we believe this trade-off is acceptable given 
> the increased security risk.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43187) Remove workaround for MiniKdc's BindException

2023-04-19 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714071#comment-17714071
 ] 

Ignite TC Bot commented on SPARK-43187:
---

User 'pan3793' has created a pull request for this issue:
https://github.com/apache/spark/pull/40849

> Remove workaround for MiniKdc's BindException
> -
>
> Key: SPARK-43187
> URL: https://issues.apache.org/jira/browse/SPARK-43187
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.5.0
>Reporter: Cheng Pan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42552) Get ParseException when run sql: "SELECT 1 UNION SELECT 1;"

2023-04-18 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17713489#comment-17713489
 ] 

Ignite TC Bot commented on SPARK-42552:
---

User 'Hisoka-X' has created a pull request for this issue:
https://github.com/apache/spark/pull/40823

> Get ParseException when run sql: "SELECT 1 UNION SELECT 1;"
> ---
>
> Key: SPARK-42552
> URL: https://issues.apache.org/jira/browse/SPARK-42552
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.3
> Environment: Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 
> 1.8.0_345)
> Spark version 3.2.3-SNAPSHOT
>Reporter: jiang13021
>Priority: Major
> Fix For: 3.2.3
>
>
> When I run this SQL
> {code:java}
> scala> spark.sql("SELECT 1 UNION SELECT 1;") {code}
> I get a ParseException:
> {code:java}
> org.apache.spark.sql.catalyst.parser.ParseException:
> mismatched input 'SELECT' expecting {<EOF>, ';'}(line 1, pos 15)
>
> == SQL ==
> SELECT 1 UNION SELECT 1;
> ---------------^^^
>
>   at 
> org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:266)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:127)
>   at 
> org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:51)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:77)
>   at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:616)
>   at 
> org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
>   at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:616)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
>   ... 47 elided
>  {code}
> If I run it with parentheses, it works well:
> {code:java}
> scala> spark.sql("(SELECT 1) UNION (SELECT 1);") 
> res4: org.apache.spark.sql.DataFrame = [1: int]{code}
> This looks like a bug.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43022) protobuf functions

2023-04-14 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17712468#comment-17712468
 ] 

Ignite TC Bot commented on SPARK-43022:
---

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40654

> protobuf functions
> --
>
> Key: SPARK-43022
> URL: https://issues.apache.org/jira/browse/SPARK-43022
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43081) Add torch distributor data loader that loads data from spark partition data

2023-04-10 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17710127#comment-17710127
 ] 

Ignite TC Bot commented on SPARK-43081:
---

User 'WeichenXu123' has created a pull request for this issue:
https://github.com/apache/spark/pull/40724

> Add torch distributor data loader that loads data from spark partition data
> ---
>
> Key: SPARK-43081
> URL: https://issues.apache.org/jira/browse/SPARK-43081
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ML, PySpark
>Affects Versions: 3.5.0
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Major
>
> Add torch distributor data loader that loads data from spark partition data.
>  
> We can add 2 APIs like:
> Adds a `TorchDistributor` method API:
> {code:java}
>      def train_on_dataframe(self, train_function, spark_dataframe, *args, 
> **kwargs):
>         """
>         Runs distributed training using the provided spark DataFrame as input 
> data.
>         You should ensure the input spark DataFrame has evenly divided 
> partitions;
>         this method starts a barrier spark job, and each spark task in 
> the job
>         processes one partition of the input spark DataFrame.
>         Parameters
>         --
>         train_function :
>             Either a PyTorch function or a PyTorch Lightning function that 
> launches distributed
>             training. Note that inside the function, you can call
>             `pyspark.ml.torch.distributor.get_spark_partition_data_loader` 
> API to get a torch
>             data loader, the data loader loads data from the corresponding 
> partition of the
>             input spark DataFrame.
>         spark_dataframe :
>             An input spark DataFrame that can be used in PyTorch 
> `train_function` function.
>             See `train_function` argument doc for details.
>         args :
>             `args` need to be the input parameters to `train_function` 
> function. It would look like
>             >>> model = distributor.run(train, 1e-3, 64)
>             where train is a function and 1e-3 and 64 are regular numeric 
> inputs to the function.
>         kwargs :
>             `kwargs` need to be the keyword input parameters to 
> `train_function` function.
>             It would look like
>             >>> model = distributor.run(train, tol=1e-3, max_iter=64)
>             where train is a function that has 2 arguments `tol` and 
> `max_iter`.
>         Returns
>         ---
>             Returns the output of `train_function` called with args inside 
> spark rank 0 task.
>         """{code}
>  
> Adds a loader API:
>  
> {code:java}
>  def get_spark_partition_data_loader(num_samples, batch_size, prefetch=2):
>     """
>     This function must be called inside the `train_function` where 
> `train_function`
>     is the input argument of `TorchDistributor.train_on_dataframe`.
>     The function returns a pytorch data loader that loads data from
>     the corresponding spark partition data.
>     Parameters
>     --
>     num_samples :
>         Number of samples to generate per epoch. If `num_samples` is less 
> than the number of
>         rows in the spark partition, it generates the first `num_samples` 
> rows of
>         the spark partition; if `num_samples` is greater than the number of
>         rows in the spark partition, then after the iterator has loaded all 
> rows from the partition,
>         it wraps around back to the first row.
>     batch_size:
>         How many samples per batch to load.
>     prefetch:
>         Number of batches loaded in advance.
>     """{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org