[jira] [Commented] (SPARK-47344) Enhance error message for invalid identifiers that need backticks
[ https://issues.apache.org/jira/browse/SPARK-47344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17825757#comment-17825757 ]

Ignite TC Bot commented on SPARK-47344:
---------------------------------------

User 'srielau' has created a pull request for this issue:
https://github.com/apache/spark/pull/45470

> Enhance error message for invalid identifiers that need backticks
> ------------------------------------------------------------------
>
> Key: SPARK-47344
> URL: https://issues.apache.org/jira/browse/SPARK-47344
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Serge Rielau
> Priority: Major
>
> We detect patterns like "my-tab" and raise a meaningful INVALID_IDENTIFIER
> error when it is not surrounded by back quotes.
> In this ticket we want to improve this effort to go beyond dashes.
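For context, a minimal PySpark sketch of the behavior this ticket extends; the table name and the exact error text are illustrative assumptions, not taken from the ticket:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# An identifier containing a dash parses only when quoted with backticks.
spark.sql("CREATE TABLE `my-tab` (id INT) USING parquet")

# Without backticks the parser should raise INVALID_IDENTIFIER and point
# the user at the backtick-quoting fix.
try:
    spark.sql("SELECT * FROM my-tab")
except Exception as e:
    print(e)
{code}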
[jira] [Commented] (SPARK-24203) Make executor's bindAddress configurable
[ https://issues.apache.org/jira/browse/SPARK-24203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763821#comment-17763821 ]

Ignite TC Bot commented on SPARK-24203:
---------------------------------------

User 'gedeh' has created a pull request for this issue:
https://github.com/apache/spark/pull/42870

> Make executor's bindAddress configurable
> -----------------------------------------
>
> Key: SPARK-24203
> URL: https://issues.apache.org/jira/browse/SPARK-24203
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.1.1
> Reporter: Lukas Majercak
> Assignee: Nishchal Venkataramana
> Priority: Major
> Fix For: 3.0.0
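As a usage sketch: the executor key mirrors the long-standing driver key; treat the exact config names as assumptions to verify against the configuration docs:

{code:python}
from pyspark.sql import SparkSession

# Bind the endpoints to all interfaces, e.g. behind NAT or in containers
# where the advertised hostname differs from the local interface.
spark = (
    SparkSession.builder
    .config("spark.driver.bindAddress", "0.0.0.0")    # pre-existing driver analogue
    .config("spark.executor.bindAddress", "0.0.0.0")  # key added by this ticket
    .getOrCreate()
)
{code}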
[jira] [Commented] (SPARK-44732) Port the initial implementation of Spark XML data source
[ https://issues.apache.org/jira/browse/SPARK-44732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17762891#comment-17762891 ]

Ignite TC Bot commented on SPARK-44732:
---------------------------------------

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/42844

> Port the initial implementation of Spark XML data source
> ----------------------------------------------------------
>
> Key: SPARK-44732
> URL: https://issues.apache.org/jira/browse/SPARK-44732
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Fix For: 4.0.0
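A reading sketch, assuming the ported source keeps spark-xml's format name and its rowTag option (file path hypothetical):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Each <book> element becomes one row; nested elements become struct fields.
df = (
    spark.read.format("xml")
    .option("rowTag", "book")
    .load("/tmp/books.xml")
)
df.printSchema()
{code}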
[jira] [Commented] (SPARK-45075) Alter table with invalid default value will not report error
[ https://issues.apache.org/jira/browse/SPARK-45075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17762110#comment-17762110 ]

Ignite TC Bot commented on SPARK-45075:
---------------------------------------

User 'Hisoka-X' has created a pull request for this issue:
https://github.com/apache/spark/pull/42810

> Alter table with invalid default value will not report error
> --------------------------------------------------------------
>
> Key: SPARK-45075
> URL: https://issues.apache.org/jira/browse/SPARK-45075
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.1, 3.5.0
> Reporter: Jia Fan
> Priority: Major
>
> create table t(i boolean, s bigint);
> alter table t alter column s set default badvalue;
>
> The code doesn't report an error on DataSource V2, which is not aligned with V1.
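A repro sketch of the report's SQL from PySpark (table provider chosen for illustration):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("CREATE TABLE t (i BOOLEAN, s BIGINT) USING parquet")
# `badvalue` is not a resolvable constant, so this should fail analysis on
# both V1 and V2 tables; on affected versions V2 silently accepted it.
spark.sql("ALTER TABLE t ALTER COLUMN s SET DEFAULT badvalue")
{code}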
[jira] [Commented] (SPARK-44640) Improve error messages for Python UDTF returning non iterable
[ https://issues.apache.org/jira/browse/SPARK-44640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761038#comment-17761038 ]

Ignite TC Bot commented on SPARK-44640:
---------------------------------------

User 'allisonwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/42726

> Improve error messages for Python UDTF returning non iterable
> ---------------------------------------------------------------
>
> Key: SPARK-44640
> URL: https://issues.apache.org/jira/browse/SPARK-44640
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 3.5.0
> Reporter: Allison Wang
> Assignee: Allison Wang
> Priority: Major
> Fix For: 4.0.0
>
> When the value returned by a UDTF is not iterable, the error message can be
> confusing to users. For example, for this UDTF:
> {code:python}
> @udtf(returnType="x: int")
> class TestUDTF:
>     def eval(self, a):
>         return a
> {code}
> Currently it fails with this error for regular UDTFs:
>
>   return tuple(map(verify_and_convert_result, res))
>   TypeError: 'int' object is not iterable
>
> And with this error for arrow-optimized UDTFs:
>
>   raise ValueError("DataFrame constructor not properly called!")
>   ValueError: DataFrame constructor not properly called!
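For contrast, a corrected version of the same UDTF; eval must yield (or return) an iterable of row tuples:

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit, udtf

spark = SparkSession.builder.getOrCreate()

@udtf(returnType="x: int")
class TestUDTF:
    def eval(self, a):
        yield (a,)   # one output row per yielded tuple

TestUDTF(lit(1)).show()
{code}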
[jira] [Commented] (SPARK-44846) PushFoldableIntoBranches in complex grouping expressions may cause bindReference error
[ https://issues.apache.org/jira/browse/SPARK-44846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761037#comment-17761037 ]

Ignite TC Bot commented on SPARK-44846:
---------------------------------------

User 'zml1206' has created a pull request for this issue:
https://github.com/apache/spark/pull/42633

> PushFoldableIntoBranches in complex grouping expressions may cause
> bindReference error
> -----------------------------------------------------------------------
>
> Key: SPARK-44846
> URL: https://issues.apache.org/jira/browse/SPARK-44846
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.1
> Reporter: zhuml
> Priority: Major
>
> SQL:
> {code:sql}
> select c*2 as d from
>   (select if(b > 1, 1, b) as c from
>     (select if(a < 0, 0, a) as b from t group by b) t1
>    group by c) t2
> {code}
> ERROR:
> {code:java}
> Couldn't find _groupingexpression#15 in [if ((_groupingexpression#15 > 1)) 1 else _groupingexpression#15#16]
> java.lang.IllegalStateException: Couldn't find _groupingexpression#15 in [if ((_groupingexpression#15 > 1)) 1 else _groupingexpression#15#16]
>  at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80)
>  at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:461)
>  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:461)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:466)
>  at org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren(TreeNode.scala:1241)
>  at org.apache.spark.sql.catalyst.trees.BinaryLike.mapChildren$(TreeNode.scala:1240)
>  at org.apache.spark.sql.catalyst.expressions.BinaryExpression.mapChildren(Expression.scala:653)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:466)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:466)
>  at org.apache.spark.sql.catalyst.trees.TernaryLike.mapChildren(TreeNode.scala:1272)
>  at org.apache.spark.sql.catalyst.trees.TernaryLike.mapChildren$(TreeNode.scala:1271)
>  at org.apache.spark.sql.catalyst.expressions.If.mapChildren(conditionalExpressions.scala:41)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:466)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:466)
>  at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1215)
>  at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1214)
>  at org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:533)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:466)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:437)
>  at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:405)
>  at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:73)
>  at org.apache.spark.sql.catalyst.expressions.BindReferences$.$anonfun$bindReferences$1(BoundAttribute.scala:94)
>  at scala.collection.immutable.List.map(List.scala:293)
>  at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReferences(BoundAttribute.scala:94)
>  at org.apache.spark.sql.execution.aggregate.HashAggregateExec.generateResultFunction(HashAggregateExec.scala:360)
>  at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduceWithKeys(HashAggregateExec.scala:538)
>  at org.apache.spark.sql.execution.aggregate.AggregateCodegenSupport.doProduce(AggregateCodegenSupport.scala:69)
>  at org.apache.spark.sql.execution.aggregate.AggregateCodegenSupport.doProduce$(AggregateCodegenSupport.scala:65)
>  at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduce(HashAggregateExec.scala:49)
>  at org.apache.spark.sql.execution.CodegenSupport.$anonfun$produce$1(WholeStageCodegenExec.scala:97)
>  at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246)
>  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:243)
>  at org.apache.spark.sql.execution.CodegenSupport.produce(WholeStageCodegenExec.scala:92)
> {code}
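A repro sketch following the SQL in the report; the temp-view setup here is illustrative only:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.range(10).selectExpr("id - 5 AS a").createOrReplaceTempView("t")

# On affected versions this raised the IllegalStateException above
# during whole-stage codegen.
spark.sql("""
    select c * 2 as d from
      (select if(b > 1, 1, b) as c from
        (select if(a < 0, 0, a) as b from t group by b) t1
       group by c) t2
""").collect()
{code}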
[jira] [Commented] (SPARK-44987) Assign name to the error class _LEGACY_ERROR_TEMP_1100
[ https://issues.apache.org/jira/browse/SPARK-44987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761034#comment-17761034 ]

Ignite TC Bot commented on SPARK-44987:
---------------------------------------

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/42737

> Assign name to the error class _LEGACY_ERROR_TEMP_1100
> --------------------------------------------------------
>
> Key: SPARK-44987
> URL: https://issues.apache.org/jira/browse/SPARK-44987
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Max Gekk
> Assignee: Max Gekk
> Priority: Minor
>
> Assign a name and improve the error message format.
[jira] [Commented] (SPARK-44910) Encoders.bean does not support superclasses with generic type arguments
[ https://issues.apache.org/jira/browse/SPARK-44910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761036#comment-17761036 ]

Ignite TC Bot commented on SPARK-44910:
---------------------------------------

User 'gbloisi-openaire' has created a pull request for this issue:
https://github.com/apache/spark/pull/42634

> Encoders.bean does not support superclasses with generic type arguments
> -------------------------------------------------------------------------
>
> Key: SPARK-44910
> URL: https://issues.apache.org/jira/browse/SPARK-44910
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.1, 3.5.0, 4.0.0
> Reporter: Giambattista Bloisi
> Priority: Major
>
> As per SPARK-44634, another unsupported feature of the bean encoder is when
> the superclass of the bean has generic type arguments. For example:
> {code:java}
> class JavaBeanWithGenericsA<T> {
>     public T getPropertyA() {
>         return null;
>     }
>
>     public void setPropertyA(T a) {
>     }
> }
>
> class JavaBeanWithGenericBase extends JavaBeanWithGenericsA<String> {
> }
>
> Encoders.bean(JavaBeanWithGenericBase.class); // Exception
> {code}
[jira] [Commented] (SPARK-44994) Refine docstring of `DataFrame.filter`
[ https://issues.apache.org/jira/browse/SPARK-44994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761035#comment-17761035 ]

Ignite TC Bot commented on SPARK-44994:
---------------------------------------

User 'allisonwang-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/42708

> Refine docstring of `DataFrame.filter`
> ----------------------------------------
>
> Key: SPARK-44994
> URL: https://issues.apache.org/jira/browse/SPARK-44994
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation, PySpark
> Affects Versions: 4.0.0
> Reporter: Allison Wang
> Priority: Major
>
> Refine the docstring and add more examples for DataFrame.filter.
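Illustrative of the kind of examples the refined docstring adds (toy data):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "v"])

df.filter(df.id > 1).show()   # Column-expression predicate
df.filter("id > 1").show()    # equivalent SQL-string predicate
{code}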
[jira] [Commented] (SPARK-45014) Clean up fileserver when cleaning up files, jars and archives in SparkContext
[ https://issues.apache.org/jira/browse/SPARK-45014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760657#comment-17760657 ]

Ignite TC Bot commented on SPARK-45014:
---------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/42731

> Clean up fileserver when cleaning up files, jars and archives in SparkContext
> -------------------------------------------------------------------------------
>
> Key: SPARK-45014
> URL: https://issues.apache.org/jira/browse/SPARK-45014
> Project: Spark
> Issue Type: Bug
> Components: Connect
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
> In SPARK-44348, we clean up SparkContext's added files, but we don't clean up
> the ones in the file server.
[jira] [Commented] (SPARK-45021) Remove `antlr4-maven-plugin` configuration from `sql/catalyst/pom.xml`
[ https://issues.apache.org/jira/browse/SPARK-45021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760404#comment-17760404 ]

Ignite TC Bot commented on SPARK-45021:
---------------------------------------

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/42739

> Remove `antlr4-maven-plugin` configuration from `sql/catalyst/pom.xml`
> ------------------------------------------------------------------------
>
> Key: SPARK-45021
> URL: https://issues.apache.org/jira/browse/SPARK-45021
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 4.0.0
> Reporter: Yang Jie
> Priority: Minor
>
> SPARK-44475 has already moved the relevant configuration to
> `sql/api/pom.xml`; the configuration in the catalyst module is unused now.
[jira] [Commented] (SPARK-42304) Assign name to _LEGACY_ERROR_TEMP_2189
[ https://issues.apache.org/jira/browse/SPARK-42304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759898#comment-17759898 ]

Ignite TC Bot commented on SPARK-42304:
---------------------------------------

User 'valentinp17' has created a pull request for this issue:
https://github.com/apache/spark/pull/42706

> Assign name to _LEGACY_ERROR_TEMP_2189
> ----------------------------------------
>
> Key: SPARK-42304
> URL: https://issues.apache.org/jira/browse/SPARK-42304
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Haejoon Lee
> Priority: Major
[jira] [Commented] (SPARK-43288) DataSourceV2: CREATE TABLE LIKE
[ https://issues.apache.org/jira/browse/SPARK-43288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756860#comment-17756860 ]

Ignite TC Bot commented on SPARK-43288:
---------------------------------------

User 'Hisoka-X' has created a pull request for this issue:
https://github.com/apache/spark/pull/42586

> DataSourceV2: CREATE TABLE LIKE
> --------------------------------
>
> Key: SPARK-43288
> URL: https://issues.apache.org/jira/browse/SPARK-43288
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: John Zhuge
> Priority: Major
>
> Support CREATE TABLE LIKE in DSv2.
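The statement shape in question, issued from PySpark against a hypothetical V2 catalog:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Creates an empty table with the schema and properties of the source table;
# catalog and table names here are placeholders.
spark.sql("CREATE TABLE my_catalog.db.dst LIKE my_catalog.db.src")
{code}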
[jira] [Commented] (SPARK-44881) Executor stuck on retrying to fetch shuffle data when `java.lang.OutOfMemoryError: unable to create native thread` exception occurred
[ https://issues.apache.org/jira/browse/SPARK-44881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756445#comment-17756445 ]

Ignite TC Bot commented on SPARK-44881:
---------------------------------------

User 'hgs19921112' has created a pull request for this issue:
https://github.com/apache/spark/pull/42572

> Executor stuck on retrying to fetch shuffle data when
> `java.lang.OutOfMemoryError: unable to create native thread` exception
> occurred
> ------------------------------------------------------------------------
>
> Key: SPARK-44881
> URL: https://issues.apache.org/jira/browse/SPARK-44881
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.2.0
> Reporter: hgs
> Priority: Minor
[jira] [Commented] (SPARK-44433) Implement termination of Python process for foreachBatch & streaming listener
[ https://issues.apache.org/jira/browse/SPARK-44433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17756075#comment-17756075 ]

Ignite TC Bot commented on SPARK-44433:
---------------------------------------

User 'rangadi' has created a pull request for this issue:
https://github.com/apache/spark/pull/42555

> Implement termination of Python process for foreachBatch & streaming listener
> -------------------------------------------------------------------------------
>
> Key: SPARK-44433
> URL: https://issues.apache.org/jira/browse/SPARK-44433
> Project: Spark
> Issue Type: Task
> Components: Connect, Structured Streaming
> Affects Versions: 3.4.1
> Reporter: Raghu Angadi
> Assignee: Wei Liu
> Priority: Major
> Fix For: 3.5.0
>
> In the first implementation of Python support for foreachBatch, the Python
> process termination is not handled correctly.
>
> See the long TODO in
> https://github.com/apache/spark/blob/master/connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/StreamingForeachBatchHelper.scala
> for an outline of the feature to terminate the runners by registering
> StreamingQueryListeners.
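For reference, the user-facing foreachBatch pattern whose helper Python process this ticket tears down; the source and sink table names are placeholders:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def process(batch_df, batch_id):
    # Under Spark Connect this function runs in a separate Python process.
    batch_df.write.mode("append").saveAsTable("sink_table")

query = (
    spark.readStream.table("source_table")
    .writeStream.foreachBatch(process)
    .start()
)
query.stop()   # after the fix, the runner process should be terminated too
{code}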
[jira] [Commented] (SPARK-42947) Spark Thriftserver LDAP should not use DN pattern if user contains domain
[ https://issues.apache.org/jira/browse/SPARK-42947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755180#comment-17755180 ]

Ignite TC Bot commented on SPARK-42947:
---------------------------------------

User 'liujiayi771' has created a pull request for this issue:
https://github.com/apache/spark/pull/40577

> Spark Thriftserver LDAP should not use DN pattern if user contains domain
> ---------------------------------------------------------------------------
>
> Key: SPARK-42947
> URL: https://issues.apache.org/jira/browse/SPARK-42947
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Jiayi Liu
> Priority: Major
>
> When the LDAP provider has a domain configuration, such as Active Directory,
> the principal should not be constructed according to the DN pattern;
> instead, the username containing the domain should be passed directly to the
> LDAP provider as the principal. We can refer to the implementation of Hive
> LdapUtils.
> When the username contains a domain, or a domain comes from the
> hive.server2.authentication.ldap.Domain configuration, constructing the
> principal according to the DN pattern (for example,
> uid=user@domain,dc=test,dc=com) produces the following error:
> {code:java}
> 23/03/28 11:01:48 ERROR TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: Error validating the login
>  at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:108) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
>  at org.apache.thrift.transport.TSaslTransport$SaslParticipant.evaluateChallengeOrResponse(TSaslTransport.java:537) ~[libthrift-0.12.0.jar:0.12.0]
>  at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:283) ~[libthrift-0.12.0.jar:0.12.0]
>  at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:43) ~[libthrift-0.12.0.jar:0.12.0]
>  at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:223) ~[libthrift-0.12.0.jar:0.12.0]
>  at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:293) ~[libthrift-0.12.0.jar:0.12.0]
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_352]
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_352]
>  at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_352]
> Caused by: javax.security.sasl.AuthenticationException: Error validating LDAP user
>  at org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:76) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
>  at org.apache.hive.service.auth.PlainSaslHelper$PlainServerCallbackHandler.handle(PlainSaslHelper.java:105) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
>  at org.apache.hive.service.auth.PlainSaslServer.evaluateResponse(PlainSaslServer.java:101) ~[spark-hive-thriftserver_2.12-3.3.1.jar:3.3.1]
>  ... 8 more
> Caused by: javax.naming.AuthenticationException: [LDAP: error code 49 - 80090308: LdapErr: DSID-0C0903D9, comment: AcceptSecurityContext error, data 52e, v2580]
>  at com.sun.jndi.ldap.LdapCtx.mapErrorCode(LdapCtx.java:3261) ~[?:1.8.0_352]
>  at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:3207) ~[?:1.8.0_352]
>  at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:2993) ~[?:1.8.0_352]
>  at com.sun.jndi.ldap.LdapCtx.connect(LdapCtx.java:2907) ~[?:1.8.0_352]
>  at com.sun.jndi.ldap.LdapCtx.<init>(LdapCtx.java:347) ~[?:1.8.0_352]
>  at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxFromUrl(LdapCtxFactory.java:229) ~[?:1.8.0_352]
>  at com.sun.jndi.ldap.LdapCtxFactory.getUsingURL(LdapCtxFactory.java:189) ~[?:1.8.0_352]
>  at com.sun.jndi.ldap.LdapCtxFactory.getUsingURLs(LdapCtxFactory.java:247) ~[?:1.8.0_352]
>  at com.sun.jndi.ldap.LdapCtxFactory.getLdapCtxInstance(LdapCtxFactory.java:154) ~[?:1.8.0_352]
>  at com.sun.jndi.ldap.LdapCtxFactory.getInitialContext(LdapCtxFactory.java:84) ~[?:1.8.0_352]
>  at javax.naming.spi.NamingManager.getInitialContext(NamingManager.java:695) ~[?:1.8.0_352]
>  at javax.naming.InitialContext.getDefaultInitCtx(InitialContext.java:313) ~[?:1.8.0_352]
>  at javax.naming.InitialContext.init(InitialContext.java:244) ~[?:1.8.0_352]
>  at javax.naming.InitialContext.<init>(InitialContext.java:216) ~[?:1.8.0_352]
>  at javax.naming.directory.InitialDirContext.<init>(InitialDirContext.java:101) ~[?:1.8.0_352]
>  at org.apache.hive.service.auth.LdapAuthenticationProviderImpl.Authenticate(LdapAuthenticationProviderImpl.java:73)
> {code}
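A simplified sketch of the selection rule the ticket asks for, modeled loosely on Hive's LdapUtils; the function name and pattern syntax are illustrative, not the patch's code:

{code:python}
from typing import Optional

def choose_principal(user: str, dn_pattern: str, domain: Optional[str]) -> str:
    if "@" in user:
        return user                        # user already carries a domain
    if domain:
        return f"{user}@{domain}"          # hive.server2.authentication.ldap.Domain
    return dn_pattern.replace("%s", user)  # e.g. "uid=%s,dc=test,dc=com"

assert choose_principal("alice@corp.com", "uid=%s,dc=test,dc=com", None) == "alice@corp.com"
assert choose_principal("alice", "uid=%s,dc=test,dc=com", None) == "uid=alice,dc=test,dc=com"
{code}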
[jira] [Commented] (SPARK-44799) Fix outer scopes for Ammonite generated classes
[ https://issues.apache.org/jira/browse/SPARK-44799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755057#comment-17755057 ]

Ignite TC Bot commented on SPARK-44799:
---------------------------------------

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/42489

> Fix outer scopes for Ammonite generated classes
> -------------------------------------------------
>
> Key: SPARK-44799
> URL: https://issues.apache.org/jira/browse/SPARK-44799
> Project: Spark
> Issue Type: New Feature
> Components: Connect
> Affects Versions: 3.5.0
> Reporter: Herman van Hövell
> Assignee: Herman van Hövell
> Priority: Blocker
> Fix For: 3.5.0
[jira] [Commented] (SPARK-44613) Add Encoders.scala to Spark Connect Scala Client
[ https://issues.apache.org/jira/browse/SPARK-44613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17749640#comment-17749640 ]

Ignite TC Bot commented on SPARK-44613:
---------------------------------------

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/42264

> Add Encoders.scala to Spark Connect Scala Client
> --------------------------------------------------
>
> Key: SPARK-44613
> URL: https://issues.apache.org/jira/browse/SPARK-44613
> Project: Spark
> Issue Type: New Feature
> Components: Connect
> Affects Versions: 3.4.1
> Reporter: Herman van Hövell
> Assignee: Herman van Hövell
> Priority: Major
[jira] [Commented] (SPARK-44547) BlockManagerDecommissioner throws exceptions when migrating RDD cached blocks to fallback storage
[ https://issues.apache.org/jira/browse/SPARK-44547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17748327#comment-17748327 ]

Ignite TC Bot commented on SPARK-44547:
---------------------------------------

User 'ukby1234' has created a pull request for this issue:
https://github.com/apache/spark/pull/42155

> BlockManagerDecommissioner throws exceptions when migrating RDD cached blocks
> to fallback storage
> -------------------------------------------------------------------------------
>
> Key: SPARK-44547
> URL: https://issues.apache.org/jira/browse/SPARK-44547
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.4.1
> Reporter: Frank Yin
> Priority: Major
> Attachments: spark-error.log
>
> Looks like the RDD cache doesn't support fallback storage, and we should
> stop the migration if the only viable peer is the fallback storage.
> [^spark-error.log]
> {code:java}
> 23/07/25 05:12:58 WARN BlockManager: Failed to replicate rdd_18_25 to BlockManagerId(fallback, remote, 7337, None), failure #0
> java.io.IOException: Failed to connect to remote:7337
>  at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:288)
>  at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:218)
>  at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:230)
>  at org.apache.spark.network.netty.NettyBlockTransferService.uploadBlock(NettyBlockTransferService.scala:168)
>  at org.apache.spark.network.BlockTransferService.uploadBlockSync(BlockTransferService.scala:121)
>  at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$replicate(BlockManager.scala:1784)
>  at org.apache.spark.storage.BlockManager.$anonfun$replicateBlock$2(BlockManager.scala:1721)
>  at org.apache.spark.storage.BlockManager.$anonfun$replicateBlock$2$adapted(BlockManager.scala:1707)
>  at scala.Option.forall(Option.scala:390)
>  at org.apache.spark.storage.BlockManager.replicateBlock(BlockManager.scala:1707)
>  at org.apache.spark.storage.BlockManagerDecommissioner.migrateBlock(BlockManagerDecommissioner.scala:356)
>  at org.apache.spark.storage.BlockManagerDecommissioner.$anonfun$decommissionRddCacheBlocks$3(BlockManagerDecommissioner.scala:340)
>  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
>  at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>  at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>  at scala.collection.TraversableLike.map(TraversableLike.scala:286)
>  at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
>  at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>  at org.apache.spark.storage.BlockManagerDecommissioner.decommissionRddCacheBlocks(BlockManagerDecommissioner.scala:339)
>  at org.apache.spark.storage.BlockManagerDecommissioner$$anon$1.run(BlockManagerDecommissioner.scala:214)
>  at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>  at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
>  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>  at java.base/java.lang.Thread.run(Unknown Source)
> Caused by: java.net.UnknownHostException: remote
>  at java.base/java.net.InetAddress$CachedAddresses.get(Unknown Source)
>  at java.base/java.net.InetAddress.getAllByName0(Unknown Source)
>  at java.base/java.net.InetAddress.getAllByName(Unknown Source)
>  at java.base/java.net.InetAddress.getAllByName(Unknown Source)
>  at java.base/java.net.InetAddress.getByName(Unknown Source)
>  at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:156)
>  at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:153)
>  at java.base/java.security.AccessController.doPrivileged(Native Method)
>  at io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:153)
>  at io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:41)
>  at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:61)
>  at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:53)
>  at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:55)
>  at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:31)
> {code}
[jira] [Commented] (SPARK-44264) DeepSpeed Distributor
[ https://issues.apache.org/jira/browse/SPARK-44264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747612#comment-17747612 ]

Ignite TC Bot commented on SPARK-44264:
---------------------------------------

User 'mathewjacob1002' has created a pull request for this issue:
https://github.com/apache/spark/pull/42118

> DeepSpeed Distributor
> ----------------------
>
> Key: SPARK-44264
> URL: https://issues.apache.org/jira/browse/SPARK-44264
> Project: Spark
> Issue Type: Improvement
> Components: ML, PySpark
> Affects Versions: 3.4.1
> Reporter: Lu Wang
> Priority: Critical
> Fix For: 3.5.0
> Attachments: Trying to Run Deepspeed Funcs.html
>
> Make it easier for PySpark users to run distributed training and inference
> with DeepSpeed on Spark clusters. This was a project determined by the
> Databricks ML Training Team.
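A hedged usage sketch; the import path follows the 3.5 ML package layout, and the constructor argument names are assumptions to check against the merged API:

{code:python}
from pyspark.ml.deepspeed.deepspeed_distributor import DeepspeedTorchDistributor

def train():
    # user-provided training loop that calls deepspeed.initialize(...), etc.
    pass

dist = DeepspeedTorchDistributor(numGpus=2, useGpu=True)  # argument names assumed
dist.run(train)
{code}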
[jira] [Commented] (SPARK-43402) FileSourceScanExec supports push down data filter with scalar subquery
[ https://issues.apache.org/jira/browse/SPARK-43402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17747067#comment-17747067 ]

Ignite TC Bot commented on SPARK-43402:
---------------------------------------

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/41088

> FileSourceScanExec supports push down data filter with scalar subquery
> ------------------------------------------------------------------------
>
> Key: SPARK-43402
> URL: https://issues.apache.org/jira/browse/SPARK-43402
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: XiDuo You
> Priority: Major
>
> A scalar subquery can be pushed down as a data filter at runtime, since we
> always execute the subquery first.
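The query shape that benefits, sketched with hypothetical tables: once the scalar subquery is evaluated, its result can act as a data filter for the file scan:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The max(ts) subquery runs first; its value can then be pushed down as a
# data filter on the scan of `events`.
spark.sql("""
    SELECT * FROM events
    WHERE ts > (SELECT max(ts) FROM checkpoints)
""").explain()
{code}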
[jira] [Commented] (SPARK-44216) Make assertSchemaEqual API public
[ https://issues.apache.org/jira/browse/SPARK-44216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17743223#comment-17743223 ]

Ignite TC Bot commented on SPARK-44216:
---------------------------------------

User 'asl3' has created a pull request for this issue:
https://github.com/apache/spark/pull/41927

> Make assertSchemaEqual API public
> ----------------------------------
>
> Key: SPARK-44216
> URL: https://issues.apache.org/jira/browse/SPARK-44216
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 3.5.0
> Reporter: Amanda Liu
> Priority: Major
>
> SPIP:
> https://docs.google.com/document/d/1OkyBn3JbEHkkQgSQ45Lq82esXjr9rm2Vj7Ih_4zycRc/edit#heading=h.f5f0u2riv07v
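Once public, usage looks like this (module path per PySpark 3.5's testing utilities):

{code:python}
from pyspark.testing import assertSchemaEqual
from pyspark.sql.types import LongType, StructField, StructType

s1 = StructType([StructField("id", LongType())])
s2 = StructType([StructField("id", LongType())])

assertSchemaEqual(s1, s2)   # passes silently; raises with a diff on mismatch
{code}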
[jira] [Commented] (SPARK-44431) Wrong semantics for null IN (empty list)
[ https://issues.apache.org/jira/browse/SPARK-44431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17743222#comment-17743222 ]

Ignite TC Bot commented on SPARK-44431:
---------------------------------------

User 'jchen5' has created a pull request for this issue:
https://github.com/apache/spark/pull/42007

> Wrong semantics for null IN (empty list)
> ------------------------------------------
>
> Key: SPARK-44431
> URL: https://issues.apache.org/jira/browse/SPARK-44431
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.4.0
> Reporter: Jack Chen
> Priority: Major
>
> null IN (empty list) incorrectly evaluates to null, when it should evaluate
> to false. (The reason it should be false is because a IN (b1, b2) is defined
> as a = b1 OR a = b2, and an empty IN list is treated as an empty OR which is
> false. This is specified by ANSI SQL.)
> Many places in Spark execution (In, InSet, InSubquery) and optimization
> (OptimizeIn, NullPropagation) implemented this wrong behavior. Also note
> that the Spark behavior for null IN (empty list) is inconsistent in some
> places - literal IN lists generally return null (incorrect), while IN/NOT IN
> subqueries mostly return false/true, respectively (correct) in this case.
> This is a longstanding correctness issue which has existed since null
> support for IN expressions was first added to Spark.
> Doc with more details:
> https://docs.google.com/document/d/15ttcB3OjGx5_WFKHB2COjQUbFHj5LrfNQv_d26o-wmI/edit
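A small demonstration of the intended semantics; the empty IN list is produced via an empty subquery here, since a literal empty list isn't valid SQL:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# a IN (b1, b2) is defined as (a = b1 OR a = b2); an empty IN list is an
# empty OR, which is false -- even when a is NULL.
spark.sql("""
    SELECT CAST(NULL AS INT) IN (SELECT 1 WHERE 1 = 0) AS null_in_empty
""").show()   # correct result: false
{code}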
[jira] [Commented] (SPARK-44295) Upgrade scala-parser-combinators to 2.3
[ https://issues.apache.org/jira/browse/SPARK-44295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739838#comment-17739838 ]

Ignite TC Bot commented on SPARK-44295:
---------------------------------------

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/41848

> Upgrade scala-parser-combinators to 2.3
> -----------------------------------------
>
> Key: SPARK-44295
> URL: https://issues.apache.org/jira/browse/SPARK-44295
> Project: Spark
> Issue Type: Improvement
> Components: Build
> Affects Versions: 3.5.0
> Reporter: Yang Jie
> Priority: Minor
>
> https://github.com/scala/scala-parser-combinators/releases/tag/v2.3.0
>
> New in this version:
> * Drop support for Scala 2.11.x
> * Fix {{Parsers.Parser.|||}}
[jira] [Commented] (SPARK-44250) Implement classification evaluator
[ https://issues.apache.org/jira/browse/SPARK-44250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739571#comment-17739571 ]

Ignite TC Bot commented on SPARK-44250:
---------------------------------------

User 'WeichenXu123' has created a pull request for this issue:
https://github.com/apache/spark/pull/41793

> Implement classification evaluator
> ------------------------------------
>
> Key: SPARK-44250
> URL: https://issues.apache.org/jira/browse/SPARK-44250
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, ML
> Affects Versions: 3.5.0
> Reporter: Weichen Xu
> Assignee: Weichen Xu
> Priority: Major
>
> Implement classification evaluator.
[jira] [Commented] (SPARK-44268) Add tests to ensure error-classes.json and docs are in sync
[ https://issues.apache.org/jira/browse/SPARK-44268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739570#comment-17739570 ]

Ignite TC Bot commented on SPARK-44268:
---------------------------------------

User 'Hisoka-X' has created a pull request for this issue:
https://github.com/apache/spark/pull/41813

> Add tests to ensure error-classes.json and docs are in sync
> -------------------------------------------------------------
>
> Key: SPARK-44268
> URL: https://issues.apache.org/jira/browse/SPARK-44268
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.4.1
> Reporter: Jia Fan
> Assignee: Jia Fan
> Priority: Major
> Fix For: 3.5.0
>
> We should add tests to ensure error-classes.json and the docs are in sync,
> so that both are always up to date before a PR is committed.
[jira] [Commented] (SPARK-43851) Support LCA in grouping expressions
[ https://issues.apache.org/jira/browse/SPARK-43851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739280#comment-17739280 ]

Ignite TC Bot commented on SPARK-43851:
---------------------------------------

User 'Hisoka-X' has created a pull request for this issue:
https://github.com/apache/spark/pull/41804

> Support LCA in grouping expressions
> -------------------------------------
>
> Key: SPARK-43851
> URL: https://issues.apache.org/jira/browse/SPARK-43851
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Yuming Wang
> Assignee: Jia Fan
> Priority: Major
> Fix For: 3.5.0
>
> Teradata supports it:
> {code:sql}
> create table t1(a int) using parquet;
> select a + 1 as a1, a1 + 1 as a2 from t1 group by a1, a2;
> {code}
> {noformat}
> [UNSUPPORTED_FEATURE.LATERAL_COLUMN_ALIAS_IN_GROUP_BY] The feature is not
> supported: Referencing a lateral column alias via GROUP BY alias/ALL is not
> supported yet.
> {noformat}
[jira] [Commented] (SPARK-44199) CacheManager refreshes the fileIndex unnecessarily
[ https://issues.apache.org/jira/browse/SPARK-44199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738673#comment-17738673 ]

Ignite TC Bot commented on SPARK-44199:
---------------------------------------

User 'vihangk1' has created a pull request for this issue:
https://github.com/apache/spark/pull/41749

> CacheManager refreshes the fileIndex unnecessarily
> ----------------------------------------------------
>
> Key: SPARK-44199
> URL: https://issues.apache.org/jira/browse/SPARK-44199
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.4.1
> Reporter: Vihang Karajgaonkar
> Priority: Major
>
> The CacheManager on this line
> https://github.com/apache/spark/blob/680ca2e56f2c8fc759743ad6755f6e3b1a19c629/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala#L372
> uses prefix-based matching to decide which file index needs to be refreshed.
> However, that can be incorrect if users have paths which are not
> subdirectories but share prefixes. For example, in the function below:
> {code:java}
> private def refreshFileIndexIfNecessary(
>     fileIndex: FileIndex,
>     fs: FileSystem,
>     qualifiedPath: Path): Boolean = {
>   val prefixToInvalidate = qualifiedPath.toString
>   val needToRefresh = fileIndex.rootPaths
>     .map(_.makeQualified(fs.getUri, fs.getWorkingDirectory).toString)
>     .exists(_.startsWith(prefixToInvalidate))
>   if (needToRefresh) fileIndex.refresh()
>   needToRefresh
> }
> {code}
> If the prefixToInvalidate is s3://bucket/mypath/table_dir and the file index
> has one of the root paths as s3://bucket/mypath/table_dir_2/part=1, then
> needToRefresh will be true and the file index gets refreshed unnecessarily.
> This is not just wasted CPU cycles but can cause query failures as well, if
> there are access restrictions to the path being refreshed.
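The failure mode, distilled into plain Python; a component-aware comparison avoids the false positive (illustrative only, not the patch's code):

{code:python}
prefix = "s3://bucket/mypath/table_dir"
root = "s3://bucket/mypath/table_dir_2/part=1"

# Bare prefix matching wrongly treats table_dir_2 as living under table_dir.
print(root.startswith(prefix))                          # True (spurious refresh)

# Path-component matching: require equality or a separator after the prefix.
print(root == prefix or root.startswith(prefix + "/"))  # False (correct)
{code}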
[jira] [Commented] (SPARK-44165) Exception when reading parquet file with TIME fields
[ https://issues.apache.org/jira/browse/SPARK-44165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738669#comment-17738669 ]

Ignite TC Bot commented on SPARK-44165:
---------------------------------------

User 'ramon-garcia' has created a pull request for this issue:
https://github.com/apache/spark/pull/41717

> Exception when reading parquet file with TIME fields
> ------------------------------------------------------
>
> Key: SPARK-44165
> URL: https://issues.apache.org/jira/browse/SPARK-44165
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 3.4.0, 3.4.1
> Environment: Spark 3.4.0 downloaded from apache.spark.org.
> Also reproduced with the latest build.
> Reporter: Ramón García Fernández
> Priority: Major
> Attachments: timeonly.parquet
>
> When one reads a parquet file containing TIME fields (with either INT32 or
> INT64 storage), an exception is thrown. From the Spark shell:
> {code:none}
> > val df = spark.read.parquet("timeonly.parquet")
> {code}
> {noformat}
> 23/06/24 13:24:54 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
> org.apache.spark.sql.AnalysisException: Illegal Parquet type: INT32 (TIME(MILLIS,true)).
>  at org.apache.spark.sql.errors.QueryCompilationErrors$.illegalParquetTypeError(QueryCompilationErrors.scala:1762)
>  at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.illegalType$1(ParquetSchemaConverter.scala:206)
>  at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.$anonfun$convertPrimitiveField$2(ParquetSchemaConverter.scala:252)
>  at scala.Option.getOrElse(Option.scala:189)
>  at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertPrimitiveField(ParquetSchemaConverter.scala:224)
>  at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertField(ParquetSchemaConverter.scala:187)
>  at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.$anonfun$convertInternal$3(ParquetSchemaConverter.scala:147)
> {noformat}
[jira] [Commented] (SPARK-44131) Add call_function and deprecate call_udf for Scala API
[ https://issues.apache.org/jira/browse/SPARK-44131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738672#comment-17738672 ]

Ignite TC Bot commented on SPARK-44131:
---------------------------------------

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/41687

> Add call_function and deprecate call_udf for Scala API
> --------------------------------------------------------
>
> Key: SPARK-44131
> URL: https://issues.apache.org/jira/browse/SPARK-44131
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: jiaan.geng
> Priority: Major
>
> The Scala API for SQL has a method call_udf used to call user-defined
> functions. In fact, call_udf can also call builtin functions.
> This behavior is confusing for users.
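For illustration, the Python counterpart (pyspark.sql.functions.call_function, assumed available alongside the Scala API in 3.5):

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.functions import call_function, col

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("spark",)], ["name"])

# call_function resolves builtin *and* user-registered functions by name,
# removing the ambiguity of the old call_udf.
df.select(call_function("upper", col("name"))).show()
{code}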
[jira] [Commented] (SPARK-44200) Support TABLE argument parser rule for TableValuedFunction
[ https://issues.apache.org/jira/browse/SPARK-44200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738671#comment-17738671 ]

Ignite TC Bot commented on SPARK-44200:
---------------------------------------

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/41750

> Support TABLE argument parser rule for TableValuedFunction
> ------------------------------------------------------------
>
> Key: SPARK-44200
> URL: https://issues.apache.org/jira/browse/SPARK-44200
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Takuya Ueshin
> Priority: Major
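The parser rule being added supports syntax of this shape; `my_tvf` is a hypothetical registered table function:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.sql("CREATE OR REPLACE TEMP VIEW v AS SELECT 1 AS id")

# TABLE(...) passes a whole relation as a table argument to the function;
# this would run once my_tvf is actually registered.
spark.sql("SELECT * FROM my_tvf(TABLE(v))")
{code}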
[jira] [Commented] (SPARK-44195) Add JobTag APIs to SparkR SparkContext
[ https://issues.apache.org/jira/browse/SPARK-44195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738668#comment-17738668 ]

Ignite TC Bot commented on SPARK-44195:
---------------------------------------

User 'juliuszsompolski' has created a pull request for this issue:
https://github.com/apache/spark/pull/41742

> Add JobTag APIs to SparkR SparkContext
> ----------------------------------------
>
> Key: SPARK-44195
> URL: https://issues.apache.org/jira/browse/SPARK-44195
> Project: Spark
> Issue Type: New Feature
> Components: SparkR
> Affects Versions: 3.5.0
> Reporter: Juliusz Sompolski
> Priority: Major
>
> Add the APIs introduced in https://issues.apache.org/jira/browse/SPARK-43952
> to SparkR:
> * {{SparkContext.addJobTag(tag: String): Unit}}
> * {{SparkContext.removeJobTag(tag: String): Unit}}
> * {{SparkContext.getJobTags(): Set[String]}}
> * {{SparkContext.clearJobTags(): Unit}}
> * {{SparkContext.cancelJobsWithTag(tag: String): Unit}}
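For comparison, the PySpark counterparts SPARK-43952 introduced, which model the surface SparkR gets here (tag value hypothetical):

{code:python}
from pyspark.sql import SparkSession

sc = SparkSession.builder.getOrCreate().sparkContext

sc.addJobTag("nightly-etl")
print(sc.getJobTags())           # {'nightly-etl'}
sc.cancelJobsWithTag("nightly-etl")
sc.removeJobTag("nightly-etl")
sc.clearJobTags()
{code}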
[jira] [Commented] (SPARK-35564) Support subexpression elimination for non-common branches of conditional expressions
[ https://issues.apache.org/jira/browse/SPARK-35564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738670#comment-17738670 ]

Ignite TC Bot commented on SPARK-35564:
---------------------------------------

User 'peter-toth' has created a pull request for this issue:
https://github.com/apache/spark/pull/41677

> Support subexpression elimination for non-common branches of conditional
> expressions
> ---------------------------------------------------------------------------
>
> Key: SPARK-35564
> URL: https://issues.apache.org/jira/browse/SPARK-35564
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.1
> Reporter: Adam Binford
> Priority: Major
>
> https://issues.apache.org/jira/browse/SPARK-7 added support for pulling
> subexpressions out of branches of conditional expressions for expressions
> present in all branches. We should be able to take this a step further and
> pull out subexpressions for any branch, as long as that expression will
> definitely be evaluated at least once.
> Consider a common data validation example:
> {code:python}
> from pyspark.sql.functions import *
> df = spark.createDataFrame([['word'], ['1234']])
> col = regexp_replace('_1', r'\d', '')
> df = df.withColumn('numbers_removed', when(length(col) > 0, col))
> {code}
> We only want to keep the value if it's non-empty with numbers removed,
> otherwise we want it to be null.
> Because we have no otherwise value, `col` is not a candidate for
> subexpression elimination (you can see two regular expression replacements
> in the codegen). But whenever the length is greater than 0, we will have to
> execute the regular expression replacement twice. Since we know we will
> always calculate `col` at least once, it makes sense to consider that as a
> subexpression since we might need it again in the branch value. So we can
> update the logic from:
> Create a subexpression if an expression will always be evaluated at least
> twice
> To:
> Create a subexpression if an expression will always be evaluated at least
> once AND will either always or conditionally be evaluated at least twice.
> The trade off is potentially another subexpression function call (for split
> subexpressions) if the second evaluation doesn't happen, but this seems like
> it would be worth it for when it is evaluated the second time.
[jira] [Commented] (SPARK-44137) Change handling of iterable objects for on field in joins
[ https://issues.apache.org/jira/browse/SPARK-44137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17737266#comment-17737266 ]

Ignite TC Bot commented on SPARK-44137:
---------------------------------------

User 'jhaberstroh-sharethis' has created a pull request for this issue:
https://github.com/apache/spark/pull/41686

> Change handling of iterable objects for on field in joins
> -----------------------------------------------------------
>
> Key: SPARK-44137
> URL: https://issues.apache.org/jira/browse/SPARK-44137
> Project: Spark
> Issue Type: Improvement
> Components: PySpark
> Affects Versions: 3.5.0
> Reporter: John Haberstroh
> Priority: Minor
>
> The {{on}} field complained when I passed it a tuple. That's because the
> code checks for {{list}} exactly, and so wrapped the tuple into a list like
> {{[on]}}, leading to immediate failure. This was surprising -- typically,
> tuple and list should be interchangeable, and typically tuple is the more
> readily accepted type. I have proposed a change that moves towards the
> principle of least surprise for this situation.
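After the change, both spellings below should behave identically (toy data):

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df1 = spark.createDataFrame([(1, 2)], ["a", "b"])
df2 = spark.createDataFrame([(1, 3)], ["a", "c"])

df1.join(df2, on=["a"]).show()   # always worked
df1.join(df2, on=("a",)).show()  # previously wrapped into [("a",)] and failed
{code}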
[jira] [Commented] (SPARK-44082) Generate operator does not update reference set properly
[ https://issues.apache.org/jira/browse/SPARK-44082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17737265#comment-17737265 ]

Ignite TC Bot commented on SPARK-44082:
---------------------------------------

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/41633

> Generate operator does not update reference set properly
> ----------------------------------------------------------
>
> Key: SPARK-44082
> URL: https://issues.apache.org/jira/browse/SPARK-44082
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.0
> Reporter: Rui Wang
> Assignee: Rui Wang
> Priority: Major
>
> Before:
> ```
> == Optimized Logical Plan ==
> Project [col1#2, col2#19]
> +- Generate replicaterows(sum#17L, col1#2, col2#3), [2], false, [col1#2, col2#3]
>    +- Filter (isnotnull(sum#17L) AND (sum#17L > 0))
>       +- Aggregate [col1#2, col2#19], [col1#2, col2#19, sum(vcol#14L) AS sum#17L]
>          +- Union false, false
>             :- Aggregate [col1#2], [1 AS vcol#14L, col1#2, first(col2#3, false) AS col2#19]
>             :  +- LogicalRDD [col1#2, col2#3], false
>             +- Project [-1 AS vcol#15L, col1#8, col2#9]
>                +- LogicalRDD [col1#8, col2#9], false
> ```
> which fails with: Couldn't find col2#3 in [col1#2,col2#19,sum#17L]
> After:
> ```
> == Optimized Logical Plan ==
> Project [col1#2, col2#19]
> +- Generate replicaterows(sum#17L, col1#2, col2#19), [2], false, [col1#2, col2#19]
>    +- Filter (isnotnull(sum#17L) AND (sum#17L > 0))
>       +- Aggregate [col1#2, col2#19], [col1#2, col2#19, sum(vcol#14L) AS sum#17L]
>          +- Union false, false
>             :- Aggregate [col1#2], [1 AS vcol#14L, col1#2, first(col2#3, false) AS col2#19]
>             :  +- LogicalRDD [col1#2, col2#3], false
>             +- Project [-1 AS vcol#15L, col1#8, col2#9]
>                +- LogicalRDD [col1#8, col2#9], false
> ```
[jira] [Commented] (SPARK-43924) Add misc functions to Scala and Python
[ https://issues.apache.org/jira/browse/SPARK-43924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17735789#comment-17735789 ]

Ignite TC Bot commented on SPARK-43924:
---------------------------------------

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/41689

> Add misc functions to Scala and Python
> ----------------------------------------
>
> Key: SPARK-43924
> URL: https://issues.apache.org/jira/browse/SPARK-43924
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, PySpark, SQL
> Affects Versions: 3.5.0
> Reporter: Ruifeng Zheng
> Priority: Major
>
> Add the following functions:
> * uuid
> * aes_encrypt
> * aes_decrypt
> * sha
> * input_file_block_length
> * input_file_block_start
> * reflect
> * java_method
> * version
> * typeof
> * stack
> * random
>
> to:
> * Scala API
> * Python API
> * Spark Connect Scala Client
> * Spark Connect Python Client
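A few of these, exercised through SQL expressions from Python; the SQL builtins already exist, and the ticket adds first-class API bindings for them:

{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.range(1).selectExpr(
    "uuid()",      # random UUID string
    "sha('abc')",  # sha1 digest of the input
    "typeof(1)",   # 'int'
    "version()",   # Spark version and revision
).show(truncate=False)
{code}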
[jira] [Commented] (SPARK-44052) Add util to get proper Column or DataFrame class for Spark Connect.
[ https://issues.apache.org/jira/browse/SPARK-44052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732941#comment-17732941 ]

Ignite TC Bot commented on SPARK-44052:
---------------------------------------

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/41570

> Add util to get proper Column or DataFrame class for Spark Connect.
> ---------------------------------------------------------------------
>
> Key: SPARK-44052
> URL: https://issues.apache.org/jira/browse/SPARK-44052
> Project: Spark
> Issue Type: Sub-task
> Components: Connect, Pandas API on Spark
> Affects Versions: 3.5.0
> Reporter: Haejoon Lee
> Priority: Major
>
> A lot of code is duplicated to get the proper PySpark Column or
> PySpark DataFrame class, so it would be great to have a util function to
> deduplicate it.
[jira] [Commented] (SPARK-44039) Improve for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite
[ https://issues.apache.org/jira/browse/SPARK-44039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732112#comment-17732112 ]

Ignite TC Bot commented on SPARK-44039:
---------------------------------------

User 'panbingkun' has created a pull request for this issue:
https://github.com/apache/spark/pull/41572

> Improve for PlanGenerationTestSuite & ProtoToParsedPlanTestSuite
> ------------------------------------------------------------------
>
> Key: SPARK-44039
> URL: https://issues.apache.org/jira/browse/SPARK-44039
> Project: Spark
> Issue Type: Improvement
> Components: Connect, Tests
> Affects Versions: 3.5.0
> Reporter: BingKun Pan
> Priority: Minor
[jira] [Commented] (SPARK-38477) Use error classes in org.apache.spark.storage
[ https://issues.apache.org/jira/browse/SPARK-38477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732111#comment-17732111 ]

Ignite TC Bot commented on SPARK-38477:
---------------------------------------

User 'bozhang2820' has created a pull request for this issue:
https://github.com/apache/spark/pull/41575

> Use error classes in org.apache.spark.storage
> -----------------------------------------------
>
> Key: SPARK-38477
> URL: https://issues.apache.org/jira/browse/SPARK-38477
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 3.3.0
> Reporter: Bo Zhang
> Priority: Major
[jira] [Commented] (SPARK-43534) Add log4j-1.2-api and log4j-slf4j2-impl to classpath if active hadoop-provided
[ https://issues.apache.org/jira/browse/SPARK-43534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17724373#comment-17724373 ]

Ignite TC Bot commented on SPARK-43534:
---------------------------------------

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/41195

> Add log4j-1.2-api and log4j-slf4j2-impl to classpath if active hadoop-provided
> --------------------------------------------------------------------------------
>
> Key: SPARK-43534
> URL: https://issues.apache.org/jira/browse/SPARK-43534
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 3.4.0
> Reporter: Yuming Wang
> Priority: Major
> Attachments: hadoop log jars.png, log4j-1.2-api-2.20.0.jar,
> log4j-slf4j2-impl-2.20.0.jar
>
> Build Spark:
> {code:sh}
> ./dev/make-distribution.sh --name default --tgz -Phive -Phive-thriftserver -Pyarn -Phadoop-provided
> tar -zxf spark-3.5.0-SNAPSHOT-bin-default.tgz
> {code}
> Remove the following jars from spark-3.5.0-SNAPSHOT-bin-default:
> {noformat}
> jars/log4j-1.2-api-2.20.0.jar
> jars/log4j-slf4j2-impl-2.20.0.jar
> {noformat}
> Add a new log4j2.properties to spark-3.5.0-SNAPSHOT-bin-default/conf:
> {code:none}
> rootLogger.level = info
> rootLogger.appenderRef.file.ref = File
> rootLogger.appenderRef.stderr.ref = console
>
> appender.console.type = Console
> appender.console.name = console
> appender.console.target = SYSTEM_ERR
> appender.console.layout.type = PatternLayout
> appender.console.layout.pattern = %d{yy/MM/dd HH:mm:ss,SSS} %p [%t] %c{2}:%L : %m%n
>
> appender.file.type = RollingFile
> appender.file.name = File
> appender.file.fileName = /tmp/spark/logs/spark.log
> appender.file.filePattern = /tmp/spark/logs/spark.%d{MMdd-HH}.log
> appender.file.append = true
> appender.file.layout.type = PatternLayout
> appender.file.layout.pattern = %d{yy/MM/dd HH:mm:ss,SSS} %p [%t] %c{2}:%L : %m%n
> appender.file.policies.type = Policies
> appender.file.policies.time.type = TimeBasedTriggeringPolicy
> appender.file.policies.time.interval = 1
> appender.file.policies.time.modulate = true
> appender.file.policies.size.type = SizeBasedTriggeringPolicy
> appender.file.policies.size.size = 256M
> appender.file.strategy.type = DefaultRolloverStrategy
> appender.file.strategy.max = 100
> {code}
> Start the Spark Thrift Server:
> {code:sh}
> sbin/start-thriftserver.sh
> {code}
> Check the log:
> {code:sh}
> cat /tmp/spark/logs/spark.log
> {code}
[jira] [Commented] (SPARK-43509) Support creating multiple sessions for Spark Connect in PySpark
[ https://issues.apache.org/jira/browse/SPARK-43509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17723755#comment-17723755 ] Ignite TC Bot commented on SPARK-43509: --- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/41206 > Support creating multiple sessions for Spark Connect in PySpark > --- > > Key: SPARK-43509 > URL: https://issues.apache.org/jira/browse/SPARK-43509 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Martin Grund >Assignee: Martin Grund >Priority: Major > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
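As a rough sketch of what this enables from PySpark, assuming the `SparkSession.builder.remote(...).create()` builder path for Spark Connect; the `sc://` endpoints are placeholders:

{code:python}
# Two independent Spark Connect sessions in one Python process; the
# endpoints below are placeholders for real Spark Connect servers.
from pyspark.sql import SparkSession

session_a = SparkSession.builder.remote("sc://host-a:15002").create()
session_b = SparkSession.builder.remote("sc://host-b:15002").create()

# Each session keeps its own server-side state, e.g. temp views.
session_a.range(5).createOrReplaceTempView("t")
assert not session_b.catalog.tableExists("t")
{code}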
[jira] [Commented] (SPARK-43536) Statsd sink reporter reports incorrect counter metrics.
[ https://issues.apache.org/jira/browse/SPARK-43536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17723415#comment-17723415 ] Ignite TC Bot commented on SPARK-43536: --- User 'venkateshbalaji99' has created a pull request for this issue: https://github.com/apache/spark/pull/41199 > Statsd sink reporter reports incorrect counter metrics. > --- > > Key: SPARK-43536 > URL: https://issues.apache.org/jira/browse/SPARK-43536 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.3 >Reporter: Abhishek Modi >Priority: Major > > There is a mismatch in the definition of counter metrics between > Dropwizard (which Spark uses) and StatsD. While Dropwizard interprets > counters as cumulative metrics, StatsD interprets them as delta metrics. This > causes double aggregation in StatsD, producing inconsistent metrics. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
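To see why a cumulative counter double-counts, consider this minimal sketch; it is not Spark's StatsdSink code, and the helper name and localhost target are made up. StatsD's `|c` type adds every reported value to its own running counter, so a reporter must send the delta since the last flush rather than the Dropwizard running total.

{code:python}
# Minimal sketch of the cumulative-vs-delta mismatch; hypothetical helper,
# not Spark's StatsdSink implementation.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_statsd(name, value):
    # StatsD treats a "|c" value as an increment to add, not an absolute total.
    sock.sendto(f"{name}:{value}|c".encode(), ("127.0.0.1", 8125))

last_seen = {}

def report(name, cumulative_count):
    # Wrong: sending the Dropwizard running total on each flush double-counts;
    # totals 10, 25, 40 would accumulate to 75 on the StatsD side, not 40.
    # Right: send only the delta accumulated since the previous flush.
    delta = cumulative_count - last_seen.get(name, 0)
    last_seen[name] = cumulative_count
    send_statsd(name, delta)
{code}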
[jira] [Commented] (SPARK-40887) Allow Spark on K8s to integrate w/ Log Service
[ https://issues.apache.org/jira/browse/SPARK-40887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722105#comment-17722105 ] Ignite TC Bot commented on SPARK-40887: --- User 'turboFei' has created a pull request for this issue: https://github.com/apache/spark/pull/41139 > Allow Spark on K8s to integrate w/ Log Service > -- > > Key: SPARK-40887 > URL: https://issues.apache.org/jira/browse/SPARK-40887 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 3.4.0 >Reporter: Cheng Pan >Assignee: Apache Spark >Priority: Major > > https://docs.google.com/document/d/1MfB39LD4B4Rp7MDRxZbMKMbdNSe6V6mBmMQ-gkCnM-0/edit?usp=sharing -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43461) Skip compiling javadoc.jar, sources.jar and test-jar when making distribution
[ https://issues.apache.org/jira/browse/SPARK-43461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17721844#comment-17721844 ] Ignite TC Bot commented on SPARK-43461: --- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/41141 > Skip compiling javadoc.jar, sources.jar and test-jar when making distribution > - > > Key: SPARK-43461 > URL: https://issues.apache.org/jira/browse/SPARK-43461 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Priority: Major > > -Dmaven.javadoc.skip=true to skip javadoc > -Dskip=true to skip scaladoc. Please see: > https://davidb.github.io/scala-maven-plugin/doc-jar-mojo.html#skip > -Dmaven.source.skip to skip building sources.jar > -Dmaven.test.skip to skip building the test-jar -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43427) Unsigned integer types are deserialized as signed numeric equivalents
[ https://issues.apache.org/jira/browse/SPARK-43427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17721698#comment-17721698 ] Ignite TC Bot commented on SPARK-43427: --- User 'justaparth' has created a pull request for this issue: https://github.com/apache/spark/pull/41108 > Unsigned integer types are deserialized as signed numeric equivalents > - > > Key: SPARK-43427 > URL: https://issues.apache.org/jira/browse/SPARK-43427 > Project: Spark > Issue Type: Bug > Components: Protobuf >Affects Versions: 3.4.0 >Reporter: Parth Upadhyay >Priority: Major > > I'm not sure if "bug" is the correct tag for this jira, but I've tagged it > like that for now since the behavior seems odd; happy to update to > "improvement" or something else based on the conversation! > h2. Issue > Protobuf supports unsigned integer types, including `uint32` and `uint64`. > When deserializing protobuf values with fields of these types, uint32 is > converted to `IntegerType` and uint64 is converted to `LongType` in the > resulting spark struct. `IntegerType` and `LongType` are > [signed|https://spark.apache.org/docs/latest/sql-ref-datatypes.html] integer > types, so this can lead to confusing results. > Namely, if a uint32 value in a stored proto is above 2^31 or a uint64 value > is above 2^63, their representation in binary will contain a 1 in the highest > bit, which when interpreted as a signed integer will come out as negative > (i.e. overflow). > I propose that we deserialize unsigned integer types into a type that can > contain them correctly, e.g. > uint32 => `LongType` > uint64 => `Decimal(20, 0)` > h2. Backwards Compatibility / Default Behavior > Should we maintain backwards compatibility and add an option that allows > deserializing these types differently? Or should we change the default > behavior (with an option to go back to the old way)? > I think by default it makes more sense to deserialize them as the larger > types so that it's semantically more correct. However, there may be existing > users of this library that would be affected by this behavior change. Though, > maybe we can justify the change since the function is tagged as > `Experimental` (and spark 3.4.0 was only released very recently). > h2. Precedent > I believe that unsigned integer types in parquet are deserialized in a > similar manner, i.e. put into a larger type so that the unsigned > representation natively fits. > https://issues.apache.org/jira/browse/SPARK-34817 and > https://github.com/apache/spark/pull/31921 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
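The overflow described above is plain two's-complement reinterpretation, which can be shown with a standalone Python snippet that is independent of the Protobuf connector:

{code:python}
# Reinterpreting a uint32 above 2^31 as a signed int32 flips it negative;
# widening to a 64-bit signed type preserves the value, which is why the
# proposal maps uint32 => LongType (and uint64 => Decimal(20, 0)).
import struct

value = 2**31 + 5                       # a valid uint32: 2147483653
raw = struct.pack("<I", value)          # its 4-byte little-endian encoding
print(struct.unpack("<i", raw)[0])      # -2147483643 when read as signed int32
print(struct.unpack("<q", raw + b"\x00" * 4)[0])  # 2147483653 as signed int64
{code}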
[jira] [Commented] (SPARK-35198) Add support for calling debugCodegen from Python & Java
[ https://issues.apache.org/jira/browse/SPARK-35198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17721475#comment-17721475 ] Ignite TC Bot commented on SPARK-35198: --- User 'juanvisoler' has created a pull request for this issue: https://github.com/apache/spark/pull/40608 > Add support for calling debugCodegen from Python & Java > --- > > Key: SPARK-35198 > URL: https://issues.apache.org/jira/browse/SPARK-35198 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 3.0.1, 3.0.2, 3.1.0, 3.1.1, 3.2.0 >Reporter: Holden Karau >Priority: Minor > Labels: starter > > Because it is implemented with an implicit conversion, it's a bit complicated > to call; we should add a direct method to get debug state for Java & Python > users of Dataframes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
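Until such a direct method exists, PySpark can already surface the generated code through `DataFrame.explain`, which has accepted a codegen mode since Spark 3.0. A small sketch, assuming an active `spark` session:

{code:python}
# Existing workaround: explain(mode="codegen") prints the whole-stage-codegen
# output for the plan without going through the Scala implicit conversion.
df = spark.range(10).selectExpr("id * 2 AS doubled")
df.explain(mode="codegen")  # prints the generated Java code per codegen subtree
{code}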
[jira] [Commented] (SPARK-43267) Support creating data frame from a Postgres table that contains user-defined array column
[ https://issues.apache.org/jira/browse/SPARK-43267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17718376#comment-17718376 ] Ignite TC Bot commented on SPARK-43267: --- User 'juliuszsompolski' has created a pull request for this issue: https://github.com/apache/spark/pull/41005 > Support creating data frame from a Postgres table that contains user-defined > array column > - > > Key: SPARK-43267 > URL: https://issues.apache.org/jira/browse/SPARK-43267 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.4.0, 3.3.2 >Reporter: Sifan Huang >Priority: Blocker > > Spark SQL currently doesn’t support creating a data frame from a Postgres table that > contains a user-defined array column. However, it used to allow such types > before the Postgres JDBC commit > (https://github.com/pgjdbc/pgjdbc/commit/375cb3795c3330f9434cee9353f0791b86125914). > The previous behavior was to handle user-defined array columns as String. > Given: > * Postgres table with a user-defined array column > * Function: DataFrameReader.jdbc - > https://spark.apache.org/docs/2.4.0/api/java/org/apache/spark/sql/DataFrameReader.html#jdbc-java.lang.String-java.lang.String-java.util.Properties- > Results: > * Exception “java.sql.SQLException: Unsupported type ARRAY” is thrown > Expectation after the change: > * Function call succeeds > * The user-defined array is converted to a string in the Spark DataFrame > Suggested fix: > * Update the “getCatalystType” function in “PostgresDialect” as > ** > {code:java} > val catalystType = toCatalystType(typeName.drop(1), size, > scale).map(ArrayType(_)) > if (catalystType.isEmpty) Some(StringType) else catalystType{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
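For reference, the failure surfaces through the plain JDBC read path. A minimal PySpark sketch, where the connection details and table name are placeholders and `spark` is an active session:

{code:python}
# Hypothetical reproduction: read a Postgres table whose schema includes a
# user-defined array column; all connection details below are placeholders.
df = spark.read.jdbc(
    url="jdbc:postgresql://localhost:5432/testdb",
    table="table_with_udt_array",
    properties={"user": "postgres", "password": "secret"},
)
# Before the suggested PostgresDialect fix: java.sql.SQLException: Unsupported type ARRAY
# After the fix: the user-defined array column falls back to StringType.
df.printSchema()
{code}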
[jira] [Commented] (SPARK-43223) KeyValueGroupedDataset#agg
[ https://issues.apache.org/jira/browse/SPARK-43223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17718346#comment-17718346 ] Ignite TC Bot commented on SPARK-43223: --- User 'zhenlineo' has created a pull request for this issue: https://github.com/apache/spark/pull/40796 > KeyValueGroupedDataset#agg > -- > > Key: SPARK-43223 > URL: https://issues.apache.org/jira/browse/SPARK-43223 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Zhen Li >Priority: Major > > Add the missing agg functions in the KeyValueGroupedDataset (KVGDS) API -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43321) Impl Dataset#JoinWith
[ https://issues.apache.org/jira/browse/SPARK-43321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17718345#comment-17718345 ] Ignite TC Bot commented on SPARK-43321: --- User 'zhenlineo' has created a pull request for this issue: https://github.com/apache/spark/pull/40997 > Impl Dataset#JoinWith > - > > Key: SPARK-43321 > URL: https://issues.apache.org/jira/browse/SPARK-43321 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Zhen Li >Priority: Major > > Implement the missing `Dataset#joinWith` method for Spark Connect. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43156) Correctness COUNT bug in correlated scalar subselect with `COUNT(*) is null`
[ https://issues.apache.org/jira/browse/SPARK-43156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17716312#comment-17716312 ] Ignite TC Bot commented on SPARK-43156: --- User 'jchen5' has created a pull request for this issue: https://github.com/apache/spark/pull/40946 > Correctness COUNT bug in correlated scalar subselect with `COUNT(*) is null` > > > Key: SPARK-43156 > URL: https://issues.apache.org/jira/browse/SPARK-43156 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Jack Chen >Priority: Major > > Example query: > {code:java} > spark.sql("select *, (select (count(1)) is null from t1 where t0.a = t1.c) > from t0").collect() > res6: Array[org.apache.spark.sql.Row] = Array([1,1.0,null], [2,2.0,false]) > {code} > In this subquery, count(1) always evaluates to a non-null integer value, so > count(1) is null is always false. The correct evaluation of the subquery is > always false. > We incorrectly evaluate it to null for empty groups. The reason is that > NullPropagation rewrites Aggregate [c] [isnull(count(1))] to Aggregate [c] > [false] - this rewrite would be correct normally, but in the context of a > scalar subquery it breaks our count bug handling in > RewriteCorrelatedScalarSubquery.constructLeftJoins. By the time we get > there, the query appears to not have the count bug - it looks the same as if > the original query had a subquery with select any_value(false) from r..., and > that case is _not_ subject to the count bug. > > Postgres comparison shows the correct always-false result: > [http://sqlfiddle.com/#!17/67822/5] > DDL for the example: > {code:java} > create or replace temp view t0 (a, b) > as values > (1, 1.0), > (2, 2.0); > create or replace temp view t1 (c, d) > as values > (2, 3.0); {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43098) Should not handle the COUNT bug when the GROUP BY clause of a correlated scalar subquery is non-empty
[ https://issues.apache.org/jira/browse/SPARK-43098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17716313#comment-17716313 ] Ignite TC Bot commented on SPARK-43098: --- User 'jchen5' has created a pull request for this issue: https://github.com/apache/spark/pull/40946 > Should not handle the COUNT bug when the GROUP BY clause of a correlated > scalar subquery is non-empty > - > > Key: SPARK-43098 > URL: https://issues.apache.org/jira/browse/SPARK-43098 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Jack Chen >Assignee: Jack Chen >Priority: Major > Fix For: 3.4.1, 3.5.0 > > > From [~allisonwang-db]: > There is no COUNT bug when the correlated equality predicates are also in the > group by clause. However, the current logic to handle the COUNT bug still > adds a default aggregate function value and returns incorrect results. > > {code:java} > create view t1(c1, c2) as values (0, 1), (1, 2); > create view t2(c1, c2) as values (0, 2), (0, 3); > select c1, c2, (select count(*) from t2 where t1.c1 = t2.c1 group by c1) from > t1; > -- Correct answer: [(0, 1, 2), (1, 2, null)] > +---+---+------------------+ > |c1 |c2 |scalarsubquery(c1)| > +---+---+------------------+ > |0  |1  |2                 | > |1  |2  |0                 | > +---+---+------------------+ > {code} > > This bug affects scalar subqueries in RewriteCorrelatedScalarSubquery, but > lateral subqueries handle it correctly in DecorrelateInnerQuery. Related: > https://issues.apache.org/jira/browse/SPARK-36113 > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43196) Replace reflection w/ direct calling for `ContainerLaunchContext#setTokensConf`
[ https://issues.apache.org/jira/browse/SPARK-43196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17715030#comment-17715030 ] Ignite TC Bot commented on SPARK-43196: --- User 'pan3793' has created a pull request for this issue: https://github.com/apache/spark/pull/40900 > Replace reflection w/ direct calling for > `ContainerLaunchContext#setTokensConf` > --- > > Key: SPARK-43196 > URL: https://issues.apache.org/jira/browse/SPARK-43196 > Project: Spark > Issue Type: Sub-task > Components: YARN >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43179) Add option for applications to control saving of metadata in the External Shuffle Service LevelDB
[ https://issues.apache.org/jira/browse/SPARK-43179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714149#comment-17714149 ] Ignite TC Bot commented on SPARK-43179: --- User 'otterc' has created a pull request for this issue: https://github.com/apache/spark/pull/40843 > Add option for applications to control saving of metadata in the External > Shuffle Service LevelDB > - > > Key: SPARK-43179 > URL: https://issues.apache.org/jira/browse/SPARK-43179 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 3.4.0 >Reporter: Chandni Singh >Priority: Major > > Currently, the External Shuffle Service stores application metadata in > LevelDB. This is necessary to enable the shuffle server to resume serving > shuffle data for an application whose executors registered before the > NodeManager restarts. However, the metadata includes the application secret, > which is stored in LevelDB without encryption. This is a potential security > risk, particularly for applications with high security requirements. While > filesystem access control lists (ACLs) can help protect keys and > certificates, they may not be sufficient for some use cases. In response, we > have decided not to store metadata for these high-security applications in > LevelDB. As a result, these applications may experience more failures in the > event of a node restart, but we believe this trade-off is acceptable given > the increased security risk. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43187) Remove workaround for MiniKdc's BindException
[ https://issues.apache.org/jira/browse/SPARK-43187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714071#comment-17714071 ] Ignite TC Bot commented on SPARK-43187: --- User 'pan3793' has created a pull request for this issue: https://github.com/apache/spark/pull/40849 > Remove workaround for MiniKdc's BindException > - > > Key: SPARK-43187 > URL: https://issues.apache.org/jira/browse/SPARK-43187 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.5.0 >Reporter: Cheng Pan >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42552) Get ParseException when run sql: "SELECT 1 UNION SELECT 1;"
[ https://issues.apache.org/jira/browse/SPARK-42552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17713489#comment-17713489 ] Ignite TC Bot commented on SPARK-42552: --- User 'Hisoka-X' has created a pull request for this issue: https://github.com/apache/spark/pull/40823 > Get ParseException when run sql: "SELECT 1 UNION SELECT 1;" > --- > > Key: SPARK-42552 > URL: https://issues.apache.org/jira/browse/SPARK-42552 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.3 > Environment: Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java > 1.8.0_345) > Spark version 3.2.3-SNAPSHOT >Reporter: jiang13021 >Priority: Major > Fix For: 3.2.3 > > > When I run sql > {code:java} > scala> spark.sql("SELECT 1 UNION SELECT 1;") {code} > I get ParseException: > {code:java} > org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input 'SELECT' expecting {<EOF>, ';'}(line 1, pos 15) == SQL == > SELECT 1 UNION SELECT 1; > ---------------^^^ at > org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:266) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:127) > at > org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:51) > at > org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:77) > at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:616) > at > org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111) > at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:616) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613) > ... 47 elided > {code} > If I run it with parentheses, it works well: > {code:java} > scala> spark.sql("(SELECT 1) UNION (SELECT 1);") > res4: org.apache.spark.sql.DataFrame = [1: int]{code} > This should be a bug. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43022) protobuf functions
[ https://issues.apache.org/jira/browse/SPARK-43022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17712468#comment-17712468 ] Ignite TC Bot commented on SPARK-43022: --- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40654 > protobuf functions > -- > > Key: SPARK-43022 > URL: https://issues.apache.org/jira/browse/SPARK-43022 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43081) Add torch distributor data loader that loads data from spark partition data
[ https://issues.apache.org/jira/browse/SPARK-43081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17710127#comment-17710127 ] Ignite TC Bot commented on SPARK-43081: --- User 'WeichenXu123' has created a pull request for this issue: https://github.com/apache/spark/pull/40724 > Add torch distributor data loader that loads data from spark partition data > --- > > Key: SPARK-43081 > URL: https://issues.apache.org/jira/browse/SPARK-43081 > Project: Spark > Issue Type: Sub-task > Components: Connect, ML, PySpark >Affects Versions: 3.5.0 >Reporter: Weichen Xu >Assignee: Weichen Xu >Priority: Major > > Add a torch distributor data loader that loads data from spark partition data. > > We can add 2 APIs like: > Adds a `TorchDistributor` method API: > {code:python} > def train_on_dataframe(self, train_function, spark_dataframe, *args, > **kwargs): > """ > Runs distributed training using the provided spark DataFrame as input > data. > You should ensure the input spark DataFrame has evenly divided > partitions, > and this method starts a barrier spark job in which each spark task in > the job > processes one partition of the input spark DataFrame. > Parameters > -- > train_function : > Either a PyTorch function or a PyTorch Lightning function that > launches distributed > training. Note that inside the function, you can call > `pyspark.ml.torch.distributor.get_spark_partition_data_loader` > API to get a torch > data loader; the data loader loads data from the corresponding > partition of the > input spark DataFrame. > spark_dataframe : > An input spark DataFrame that can be used in the PyTorch > `train_function` function. > See the `train_function` argument doc for details. > args : > `args` need to be the input parameters to the `train_function` > function. It would look like > >>> model = distributor.run(train, 1e-3, 64) > where train is a function and 1e-3 and 64 are regular numeric > inputs to the function. > kwargs : > `kwargs` need to be the keyword input parameters to the > `train_function` function. > It would look like > >>> model = distributor.run(train, tol=1e-3, max_iter=64) > where train is a function that has 2 arguments `tol` and > `max_iter`. > Returns > --- > Returns the output of `train_function` called with args inside the > spark rank 0 task. > """{code} > > Adds a loader API: > > {code:python} > def get_spark_partition_data_loader(num_samples, batch_size, prefetch=2): > """ > This function must be called inside the `train_function` where > `train_function` > is the input argument of `TorchDistributor.train_on_dataframe`. > The function returns a pytorch data loader that loads data from > the corresponding spark partition. > Parameters > -- > num_samples : > Number of samples to generate per epoch. If `num_samples` is less > than the number of > rows in the spark partition, it generates the first `num_samples` rows > of > the spark partition; if `num_samples` is greater than the number of > rows in the spark partition, then after the iterator has loaded all rows > from the partition, > it wraps around back to the first row. > batch_size: > How many samples per batch to load. > prefetch: > Number of batches loaded in advance. > """{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
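To make the proposal concrete, here is a hedged usage sketch built from the two APIs quoted above; `train_on_dataframe` and `get_spark_partition_data_loader` are the proposed names from this ticket rather than a released API, and the model, feature shape, and DataFrame `df` are made up for illustration:

{code:python}
# Usage sketch for the proposed APIs; assumes the signatures quoted in the
# ticket and an input DataFrame `df` whose rows yield (features, label) pairs.
import torch
from pyspark.ml.torch.distributor import TorchDistributor

def train(learning_rate, batch_size):
    # Proposed API: must be called inside train_function; it loads this
    # barrier task's partition of the input DataFrame.
    from pyspark.ml.torch.distributor import get_spark_partition_data_loader

    loader = get_spark_partition_data_loader(num_samples=1024, batch_size=batch_size)
    model = torch.nn.Linear(8, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
    for features, label in loader:  # assumes the loader yields tensor pairs
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(features), label)
        loss.backward()
        optimizer.step()
    return model

distributor = TorchDistributor(num_processes=2, use_gpu=False)
# Proposed API: runs `train` as a barrier job, one task per partition of `df`,
# and returns the rank-0 task's output.
model = distributor.train_on_dataframe(train, df, 1e-3, 64)
{code}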