[jira] [Created] (SPARK-42535) The HA support of Spark Thrift Server
WangHL created SPARK-42535: -- Summary: The HA support of Spark Thrift Server Key: SPARK-42535 URL: https://issues.apache.org/jira/browse/SPARK-42535 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.2 Reporter: WangHL When there are many Spark SQL connections on the Spark Thrift Server and the Thrift Server goes down, those connections lose service. We therefore need to consider High Availability (HA) support for the Spark Thrift Server. We want to adopt the HiveServer2 HA pattern to provide Thrift Server HA: register instances the way HiveServer2's 'addServerInstanceToZooKeeper' method does, and use ZooKeeper to choose the active Thrift Server when a client connects. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
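The HiveServer2-style HA described in this ticket works by having each server register an ephemeral sequential znode under a shared ZooKeeper namespace; clients then list that namespace and pick an instance. The following is only a hedged illustration of the selection convention (not Spark or Hive code; the znode names and the "smallest sequence number wins" rule are illustrative assumptions):

```python
# Illustrative sketch of ZooKeeper-style instance selection. ZooKeeper
# appends a zero-padded, monotonically increasing counter to ephemeral
# sequential znodes; picking the smallest counter gives every client a
# stable, agreed-upon "active" instance.

def pick_active_instance(znodes):
    """Return the znode with the smallest ephemeral-sequential counter."""
    return min(znodes, key=lambda node: int(node.rsplit("-", 1)[-1]))

servers = [
    "thriftserver-0000000012",
    "thriftserver-0000000003",
    "thriftserver-0000000007",
]
print(pick_active_instance(servers))  # thriftserver-0000000003
```

In a real deployment the znode list would come from a ZooKeeper client watching the registration namespace, so a crashed server's ephemeral node disappears and clients automatically fail over to the next instance.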
[jira] [Commented] (SPARK-42286) Fix internal error for valid CASE WHEN expression with CAST when inserting into a table
[ https://issues.apache.org/jira/browse/SPARK-42286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692512#comment-17692512 ] Apache Spark commented on SPARK-42286: -- User 'RunyaoChen' has created a pull request for this issue: https://github.com/apache/spark/pull/40140 > Fix internal error for valid CASE WHEN expression with CAST when inserting > into a table > --- > > Key: SPARK-42286 > URL: https://issues.apache.org/jira/browse/SPARK-42286 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Runyao.Chen >Assignee: Runyao.Chen >Priority: Major > Fix For: 3.4.0 > > > ``` > spark-sql> create or replace table es570639t1 as select x FROM values (1), > (2), (3) as tab(x); > spark-sql> create or replace table es570639t2 (x Decimal(9, 0)); > spark-sql> insert into es570639t2 select 0 - (case when x = 1 then 1 else x > end) from es570639t1 where x = 1; > ``` > hits the following internal error > org.apache.spark.SparkException: [INTERNAL_ERROR] Child is not Cast or > ExpressionProxy of Cast > > Stack trace: > org.apache.spark.SparkException: [INTERNAL_ERROR] Child is not Cast or > ExpressionProxy of Cast at > org.apache.spark.SparkException$.internalError(SparkException.scala:78) at > org.apache.spark.SparkException$.internalError(SparkException.scala:82) at > org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.checkChild(Cast.scala:2693) > at > org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2697) > at > org.apache.spark.sql.catalyst.expressions.CheckOverflowInTableInsert.withNewChildInternal(Cast.scala:2683) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.$anonfun$mapChildren$5(TreeNode.scala:1315) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:106) > at > org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1314) > at > 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1309) > at > org.apache.spark.sql.catalyst.expressions.UnaryExpression.mapChildren(Expression.scala:636) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUpWithPruning(TreeNode.scala:570) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformUpWithPruning$1(TreeNode.scala:570) > > This internal error comes from `CheckOverflowInTableInsert`'s `checkChild`, > which covered only the `Cast` expr and the `ExpressionProxy` expr, but not the > `CaseWhen` expr.
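The root cause described in the issue above is an exhaustiveness gap: the check handled `Cast` and an `ExpressionProxy` wrapping a `Cast`, but raised an internal error for any other child such as `CaseWhen`. A minimal sketch of that dispatch pattern (class and function names are hypothetical stand-ins for the Catalyst expressions, not Spark's actual API):

```python
# Hypothetical mirror of the checkChild dispatch: accepting only two node
# kinds and erroring on everything else is what turned a valid CaseWhen
# child into an INTERNAL_ERROR.

class Cast:
    pass

class ExpressionProxy:
    def __init__(self, child):
        self.child = child

class CaseWhen:
    def __init__(self, branch_values):
        self.branch_values = branch_values

def check_child(expr):
    if isinstance(expr, Cast):
        return "ok"
    if isinstance(expr, ExpressionProxy) and isinstance(expr.child, Cast):
        return "ok"
    # the previously missing branch: a CaseWhen whose branch values are
    # Casts is just as valid as a bare Cast child
    if isinstance(expr, CaseWhen) and all(
        isinstance(v, Cast) for v in expr.branch_values
    ):
        return "ok"
    raise RuntimeError(
        "[INTERNAL_ERROR] Child is not Cast or ExpressionProxy of Cast"
    )

print(check_child(CaseWhen([Cast(), Cast()])))  # ok
```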
[jira] [Commented] (SPARK-39859) Support v2 `DESCRIBE TABLE EXTENDED` for columns
[ https://issues.apache.org/jira/browse/SPARK-39859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692503#comment-17692503 ] Apache Spark commented on SPARK-39859: -- User 'huaxingao' has created a pull request for this issue: https://github.com/apache/spark/pull/40139 > Support v2 `DESCRIBE TABLE EXTENDED` for columns > > > Key: SPARK-39859 > URL: https://issues.apache.org/jira/browse/SPARK-39859 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Huaxin Gao >Priority: Major > Fix For: 3.4.0 > >
[jira] [Resolved] (SPARK-42484) Better logging for UnsafeRowUtils
[ https://issues.apache.org/jira/browse/SPARK-42484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-42484. - Fix Version/s: 3.5.0 Resolution: Fixed > Better logging for UnsafeRowUtils > - > > Key: SPARK-42484 > URL: https://issues.apache.org/jira/browse/SPARK-42484 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.3.3 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Major > Fix For: 3.5.0 > > > Currently, `UnsafeRowUtils.validateStructuralIntegrity` returns only a boolean, > making it hard to track exactly where the problem is.
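The improvement requested above — returning diagnostic detail rather than a bare boolean — can be sketched as a validator that returns `None` on success and an error message on failure. This is illustrative only; the function name and parameters are hypothetical and do not reflect the actual `UnsafeRowUtils` signature:

```python
from typing import Optional

def validate_structural_integrity(
    num_fields: int, expected_fields: int
) -> Optional[str]:
    """Return None if the row looks valid, otherwise a message naming the
    failed check, so the caller can log exactly where the problem is."""
    if num_fields != expected_fields:
        return (
            f"field count mismatch: got {num_fields}, "
            f"expected {expected_fields}"
        )
    return None

print(validate_structural_integrity(3, 5))
# field count mismatch: got 3, expected 5
```

Compared with a boolean, the `Optional[str]` shape costs callers nothing on the success path (`None` is still falsy) while carrying the failure reason when it matters.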
[jira] [Assigned] (SPARK-42484) Better logging for UnsafeRowUtils
[ https://issues.apache.org/jira/browse/SPARK-42484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-42484: --- Assignee: Wei Liu > Better logging for UnsafeRowUtils > - > > Key: SPARK-42484 > URL: https://issues.apache.org/jira/browse/SPARK-42484 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.3.3 >Reporter: Wei Liu >Assignee: Wei Liu >Priority: Major > > > Currently, `UnsafeRowUtils.validateStructuralIntegrity` returns only a boolean, > making it hard to track exactly where the problem is.
[jira] [Commented] (SPARK-42049) Improve AliasAwareOutputExpression
[ https://issues.apache.org/jira/browse/SPARK-42049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692481#comment-17692481 ] Apache Spark commented on SPARK-42049: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/40137 > Improve AliasAwareOutputExpression > -- > > Key: SPARK-42049 > URL: https://issues.apache.org/jira/browse/SPARK-42049 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: XiDuo You >Assignee: Peter Toth >Priority: Major > Fix For: 3.4.0 > > > AliasAwareOutputExpression currently does not support attributes that have more than > one alias. > AliasAwareOutputExpression should also work for LogicalPlan.
[jira] [Assigned] (SPARK-41793) Incorrect result for window frames defined by a range clause on large decimals
[ https://issues.apache.org/jira/browse/SPARK-41793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41793: Assignee: (was: Apache Spark) > Incorrect result for window frames defined by a range clause on large > decimals > --- > > Key: SPARK-41793 > URL: https://issues.apache.org/jira/browse/SPARK-41793 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gera Shegalov >Priority: Blocker > Labels: correctness > > Context > https://github.com/NVIDIA/spark-rapids/issues/7429#issuecomment-1368040686 > The following windowing query on a simple two-row input should produce two > non-empty windows as a result > {code} > from pprint import pprint > data = [ > ('9223372036854775807', '11342371013783243717493546650944543.47'), > ('9223372036854775807', '.99') > ] > df1 = spark.createDataFrame(data, 'a STRING, b STRING') > df2 = df1.select(df1.a.cast('LONG'), df1.b.cast('DECIMAL(38,2)')) > df2.createOrReplaceTempView('test_table') > df = sql(''' > SELECT > COUNT(1) OVER ( > PARTITION BY a > ORDER BY b ASC > RANGE BETWEEN 10.2345 PRECEDING AND 6.7890 FOLLOWING > ) AS CNT_1 > FROM > test_table > ''') > res = df.collect() > df.explain(True) > pprint(res) > {code} > Spark 3.4.0-SNAPSHOT output: > {code} > [Row(CNT_1=1), Row(CNT_1=0)] > {code} > Spark 3.3.1 output as expected: > {code} > [Row(CNT_1=1), Row(CNT_1=1)] > {code}
[jira] [Commented] (SPARK-41793) Incorrect result for window frames defined by a range clause on large decimals
[ https://issues.apache.org/jira/browse/SPARK-41793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692480#comment-17692480 ] Apache Spark commented on SPARK-41793: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/40138 > Incorrect result for window frames defined by a range clause on large > decimals > --- > > Key: SPARK-41793 > URL: https://issues.apache.org/jira/browse/SPARK-41793 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gera Shegalov >Priority: Blocker > Labels: correctness > > Context > https://github.com/NVIDIA/spark-rapids/issues/7429#issuecomment-1368040686 > The following windowing query on a simple two-row input should produce two > non-empty windows as a result > {code} > from pprint import pprint > data = [ > ('9223372036854775807', '11342371013783243717493546650944543.47'), > ('9223372036854775807', '.99') > ] > df1 = spark.createDataFrame(data, 'a STRING, b STRING') > df2 = df1.select(df1.a.cast('LONG'), df1.b.cast('DECIMAL(38,2)')) > df2.createOrReplaceTempView('test_table') > df = sql(''' > SELECT > COUNT(1) OVER ( > PARTITION BY a > ORDER BY b ASC > RANGE BETWEEN 10.2345 PRECEDING AND 6.7890 FOLLOWING > ) AS CNT_1 > FROM > test_table > ''') > res = df.collect() > df.explain(True) > pprint(res) > {code} > Spark 3.4.0-SNAPSHOT output: > {code} > [Row(CNT_1=1), Row(CNT_1=0)] > {code} > Spark 3.3.1 output as expected: > {code} > [Row(CNT_1=1), Row(CNT_1=1)] > {code}
[jira] [Assigned] (SPARK-41793) Incorrect result for window frames defined by a range clause on large decimals
[ https://issues.apache.org/jira/browse/SPARK-41793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-41793: Assignee: Apache Spark > Incorrect result for window frames defined by a range clause on large > decimals > --- > > Key: SPARK-41793 > URL: https://issues.apache.org/jira/browse/SPARK-41793 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gera Shegalov >Assignee: Apache Spark >Priority: Blocker > Labels: correctness > > Context > https://github.com/NVIDIA/spark-rapids/issues/7429#issuecomment-1368040686 > The following windowing query on a simple two-row input should produce two > non-empty windows as a result > {code} > from pprint import pprint > data = [ > ('9223372036854775807', '11342371013783243717493546650944543.47'), > ('9223372036854775807', '.99') > ] > df1 = spark.createDataFrame(data, 'a STRING, b STRING') > df2 = df1.select(df1.a.cast('LONG'), df1.b.cast('DECIMAL(38,2)')) > df2.createOrReplaceTempView('test_table') > df = sql(''' > SELECT > COUNT(1) OVER ( > PARTITION BY a > ORDER BY b ASC > RANGE BETWEEN 10.2345 PRECEDING AND 6.7890 FOLLOWING > ) AS CNT_1 > FROM > test_table > ''') > res = df.collect() > df.explain(True) > pprint(res) > {code} > Spark 3.4.0-SNAPSHOT output: > {code} > [Row(CNT_1=1), Row(CNT_1=0)] > {code} > Spark 3.3.1 output as expected: > {code} > [Row(CNT_1=1), Row(CNT_1=1)] > {code}
[jira] [Resolved] (SPARK-42529) Support Cube and Rollup
[ https://issues.apache.org/jira/browse/SPARK-42529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-42529. --- Fix Version/s: 3.4.1 Resolution: Fixed > Support Cube and Rollup > --- > > Key: SPARK-42529 > URL: https://issues.apache.org/jira/browse/SPARK-42529 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: 3.4.1 > >
[jira] [Commented] (SPARK-42515) ClientE2ETestSuite local test failed
[ https://issues.apache.org/jira/browse/SPARK-42515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692469#comment-17692469 ] Apache Spark commented on SPARK-42515: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/40136 > ClientE2ETestSuite local test failed > > > Key: SPARK-42515 > URL: https://issues.apache.org/jira/browse/SPARK-42515 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Priority: Minor > > > local run `build/sbt clean "connect-client-jvm/test"`, > `ClientE2ETestSuite#write table` failed, GA not failed. > > {code:java} > [info] - rite table *** FAILED *** (41 milliseconds) > [info] io.grpc.StatusRuntimeException: UNKNOWN: > org/apache/parquet/hadoop/api/ReadSupport > [info] at io.grpc.Status.asRuntimeException(Status.java:535) > [info] at > io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660) > [info] at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at scala.collection.Iterator.foreach$(Iterator.scala:943) > [info] at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) > [info] at org.apache.spark.sql.SparkSession.execute(SparkSession.scala:169) > [info] at > org.apache.spark.sql.DataFrameWriter.executeWriteOperation(DataFrameWriter.scala:255) > [info] at > org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:338) > [info] at > org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$12(ClientE2ETestSuite.scala:145) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) 
> [info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > [info] at org.scalatest.TestSuite.withFixture(TestSuite.scala:196) > [info] at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195) > [info] at > org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > [info] at org.scalatest.funsuite.AnyFunSuite.runTest(AnyFunSuite.scala:1564) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) > [info] at scala.collection.immutable.List.foreach(List.scala:431) > [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > [info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) > [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268) > [info] at > org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564) > [info] at org.scalatest.Suite.run(Suite.scala:1114) > [info] at org.scalatest.Suite.run$(Suite.scala:1096) > [info] at > org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273) > [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:535) > [info] at > 
org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272) > [info] at > org.apache.spark.sql.ClientE2ETestSuite.org$scalatest$BeforeAndAfterAll$$super$run(ClientE2ETestSuite.scala:33) > [info] at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) > [info] at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > [info] at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > [info] at > org.apache.spark.sql.ClientE2ETestSuite.run(ClientE2ETestSuite.scala:33) > [info] at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321) > [i
[jira] [Assigned] (SPARK-42515) ClientE2ETestSuite local test failed
[ https://issues.apache.org/jira/browse/SPARK-42515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42515: Assignee: (was: Apache Spark) > ClientE2ETestSuite local test failed > > > Key: SPARK-42515 > URL: https://issues.apache.org/jira/browse/SPARK-42515 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Priority: Minor > > > local run `build/sbt clean "connect-client-jvm/test"`, > `ClientE2ETestSuite#write table` failed, GA not failed. > > {code:java} > [info] - rite table *** FAILED *** (41 milliseconds) > [info] io.grpc.StatusRuntimeException: UNKNOWN: > org/apache/parquet/hadoop/api/ReadSupport > [info] at io.grpc.Status.asRuntimeException(Status.java:535) > [info] at > io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660) > [info] at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at scala.collection.Iterator.foreach$(Iterator.scala:943) > [info] at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) > [info] at org.apache.spark.sql.SparkSession.execute(SparkSession.scala:169) > [info] at > org.apache.spark.sql.DataFrameWriter.executeWriteOperation(DataFrameWriter.scala:255) > [info] at > org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:338) > [info] at > org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$12(ClientE2ETestSuite.scala:145) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > [info] at 
org.scalatest.TestSuite.withFixture(TestSuite.scala:196) > [info] at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195) > [info] at > org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > [info] at org.scalatest.funsuite.AnyFunSuite.runTest(AnyFunSuite.scala:1564) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) > [info] at scala.collection.immutable.List.foreach(List.scala:431) > [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > [info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) > [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268) > [info] at > org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564) > [info] at org.scalatest.Suite.run(Suite.scala:1114) > [info] at org.scalatest.Suite.run$(Suite.scala:1096) > [info] at > org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273) > [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:535) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273) > [info] at > 
org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272) > [info] at > org.apache.spark.sql.ClientE2ETestSuite.org$scalatest$BeforeAndAfterAll$$super$run(ClientE2ETestSuite.scala:33) > [info] at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) > [info] at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > [info] at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > [info] at > org.apache.spark.sql.ClientE2ETestSuite.run(ClientE2ETestSuite.scala:33) > [info] at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321) > [info] at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517) > [info] at sbt.ForkMain$Run.l
[jira] [Assigned] (SPARK-42515) ClientE2ETestSuite local test failed
[ https://issues.apache.org/jira/browse/SPARK-42515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42515: Assignee: Apache Spark > ClientE2ETestSuite local test failed > > > Key: SPARK-42515 > URL: https://issues.apache.org/jira/browse/SPARK-42515 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.4.0, 3.5.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > > > local run `build/sbt clean "connect-client-jvm/test"`, > `ClientE2ETestSuite#write table` failed, GA not failed. > > {code:java} > [info] - rite table *** FAILED *** (41 milliseconds) > [info] io.grpc.StatusRuntimeException: UNKNOWN: > org/apache/parquet/hadoop/api/ReadSupport > [info] at io.grpc.Status.asRuntimeException(Status.java:535) > [info] at > io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660) > [info] at > scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at scala.collection.Iterator.foreach$(Iterator.scala:943) > [info] at scala.collection.AbstractIterator.foreach(Iterator.scala:1431) > [info] at org.apache.spark.sql.SparkSession.execute(SparkSession.scala:169) > [info] at > org.apache.spark.sql.DataFrameWriter.executeWriteOperation(DataFrameWriter.scala:255) > [info] at > org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:338) > [info] at > org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$12(ClientE2ETestSuite.scala:145) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226) > [info] at 
org.scalatest.TestSuite.withFixture(TestSuite.scala:196) > [info] at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195) > [info] at > org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236) > [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218) > [info] at org.scalatest.funsuite.AnyFunSuite.runTest(AnyFunSuite.scala:1564) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413) > [info] at scala.collection.immutable.List.foreach(List.scala:431) > [info] at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > [info] at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396) > [info] at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268) > [info] at > org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564) > [info] at org.scalatest.Suite.run(Suite.scala:1114) > [info] at org.scalatest.Suite.run$(Suite.scala:1096) > [info] at > org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273) > [info] at org.scalatest.SuperEngine.runImpl(Engine.scala:535) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273) > [info] at > 
org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272) > [info] at > org.apache.spark.sql.ClientE2ETestSuite.org$scalatest$BeforeAndAfterAll$$super$run(ClientE2ETestSuite.scala:33) > [info] at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213) > [info] at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > [info] at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > [info] at > org.apache.spark.sql.ClientE2ETestSuite.run(ClientE2ETestSuite.scala:33) > [info] at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321) > [info] at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517) > [info
[jira] [Assigned] (SPARK-42444) DataFrame.drop should handle multi columns properly
[ https://issues.apache.org/jira/browse/SPARK-42444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42444: Assignee: (was: Apache Spark) > DataFrame.drop should handle multi columns properly > --- > > Key: SPARK-42444 > URL: https://issues.apache.org/jira/browse/SPARK-42444 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Blocker > > {code:java} > from pyspark.sql import Row > df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], > ["age", "name"]) > df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, > name="Bob")]) > df1.join(df2, df1.name == df2.name, 'inner').drop('name', 'age').show() > {code} > This works in 3.3 > {code:java} > +--+ > |height| > +--+ > |85| > |80| > +--+ > {code} > but fails in 3.4 > {code:java} > --- > AnalysisException Traceback (most recent call last) > Cell In[1], line 4 > 2 df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, > "Bob")], ["age", "name"]) > 3 df2 = spark.createDataFrame([Row(height=80, name="Tom"), > Row(height=85, name="Bob")]) > > 4 df1.join(df2, df1.name == df2.name, 'inner').drop('name', > 'age').show() > File ~/Dev/spark/python/pyspark/sql/dataframe.py:4913, in > DataFrame.drop(self, *cols) >4911 jcols = [_to_java_column(c) for c in cols] >4912 first_column, *remaining_columns = jcols > -> 4913 jdf = self._jdf.drop(first_column, self._jseq(remaining_columns)) >4915 return DataFrame(jdf, self.sparkSession) > File ~/Dev/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, > in JavaMember.__call__(self, *args) >1316 command = proto.CALL_COMMAND_NAME +\ >1317 self.command_header +\ >1318 args_command +\ >1319 proto.END_COMMAND_PART >1321 answer = self.gateway_client.send_command(command) > -> 1322 return_value = get_return_value( >1323 answer, self.gateway_client, self.target_id, self.name) >1325 for temp_arg in temp_args: >1326 if 
hasattr(temp_arg, "_detach"): > File ~/Dev/spark/python/pyspark/errors/exceptions/captured.py:159, in > capture_sql_exception..deco(*a, **kw) > 155 converted = convert_exception(e.java_exception) > 156 if not isinstance(converted, UnknownException): > 157 # Hide where the exception came from that shows a non-Pythonic > 158 # JVM exception message. > --> 159 raise converted from None > 160 else: > 161 raise > AnalysisException: [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could > be: [`name`, `name`]. > {code}
[jira] [Commented] (SPARK-42444) DataFrame.drop should handle multi columns properly
[ https://issues.apache.org/jira/browse/SPARK-42444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692467#comment-17692467 ] Apache Spark commented on SPARK-42444: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/40135 > DataFrame.drop should handle multi columns properly > --- > > Key: SPARK-42444 > URL: https://issues.apache.org/jira/browse/SPARK-42444 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Blocker > > {code:java} > from pyspark.sql import Row > df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], > ["age", "name"]) > df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, > name="Bob")]) > df1.join(df2, df1.name == df2.name, 'inner').drop('name', 'age').show() > {code} > This works in 3.3 > {code:java} > +--+ > |height| > +--+ > |85| > |80| > +--+ > {code} > but fails in 3.4 > {code:java} > --- > AnalysisException Traceback (most recent call last) > Cell In[1], line 4 > 2 df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, > "Bob")], ["age", "name"]) > 3 df2 = spark.createDataFrame([Row(height=80, name="Tom"), > Row(height=85, name="Bob")]) > > 4 df1.join(df2, df1.name == df2.name, 'inner').drop('name', > 'age').show() > File ~/Dev/spark/python/pyspark/sql/dataframe.py:4913, in > DataFrame.drop(self, *cols) >4911 jcols = [_to_java_column(c) for c in cols] >4912 first_column, *remaining_columns = jcols > -> 4913 jdf = self._jdf.drop(first_column, self._jseq(remaining_columns)) >4915 return DataFrame(jdf, self.sparkSession) > File ~/Dev/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, > in JavaMember.__call__(self, *args) >1316 command = proto.CALL_COMMAND_NAME +\ >1317 self.command_header +\ >1318 args_command +\ >1319 proto.END_COMMAND_PART >1321 answer = self.gateway_client.send_command(command) > -> 1322 return_value = 
get_return_value( >1323 answer, self.gateway_client, self.target_id, self.name) >1325 for temp_arg in temp_args: >1326 if hasattr(temp_arg, "_detach"): > File ~/Dev/spark/python/pyspark/errors/exceptions/captured.py:159, in > capture_sql_exception..deco(*a, **kw) > 155 converted = convert_exception(e.java_exception) > 156 if not isinstance(converted, UnknownException): > 157 # Hide where the exception came from that shows a non-Pythonic > 158 # JVM exception message. > --> 159 raise converted from None > 160 else: > 161 raise > AnalysisException: [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could > be: [`name`, `name`]. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42444) DataFrame.drop should handle multi columns properly
[ https://issues.apache.org/jira/browse/SPARK-42444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692466#comment-17692466 ] Apache Spark commented on SPARK-42444: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/40135 > DataFrame.drop should handle multi columns properly > --- > > Key: SPARK-42444 > URL: https://issues.apache.org/jira/browse/SPARK-42444 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Blocker > > {code:java} > from pyspark.sql import Row > df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], > ["age", "name"]) > df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, > name="Bob")]) > df1.join(df2, df1.name == df2.name, 'inner').drop('name', 'age').show() > {code} > This works in 3.3 > {code:java} > +--+ > |height| > +--+ > |85| > |80| > +--+ > {code} > but fails in 3.4 > {code:java} > --- > AnalysisException Traceback (most recent call last) > Cell In[1], line 4 > 2 df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, > "Bob")], ["age", "name"]) > 3 df2 = spark.createDataFrame([Row(height=80, name="Tom"), > Row(height=85, name="Bob")]) > > 4 df1.join(df2, df1.name == df2.name, 'inner').drop('name', > 'age').show() > File ~/Dev/spark/python/pyspark/sql/dataframe.py:4913, in > DataFrame.drop(self, *cols) >4911 jcols = [_to_java_column(c) for c in cols] >4912 first_column, *remaining_columns = jcols > -> 4913 jdf = self._jdf.drop(first_column, self._jseq(remaining_columns)) >4915 return DataFrame(jdf, self.sparkSession) > File ~/Dev/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, > in JavaMember.__call__(self, *args) >1316 command = proto.CALL_COMMAND_NAME +\ >1317 self.command_header +\ >1318 args_command +\ >1319 proto.END_COMMAND_PART >1321 answer = self.gateway_client.send_command(command) > -> 1322 return_value = 
get_return_value( >1323 answer, self.gateway_client, self.target_id, self.name) >1325 for temp_arg in temp_args: >1326 if hasattr(temp_arg, "_detach"): > File ~/Dev/spark/python/pyspark/errors/exceptions/captured.py:159, in > capture_sql_exception..deco(*a, **kw) > 155 converted = convert_exception(e.java_exception) > 156 if not isinstance(converted, UnknownException): > 157 # Hide where the exception came from that shows a non-Pythonic > 158 # JVM exception message. > --> 159 raise converted from None > 160 else: > 161 raise > AnalysisException: [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could > be: [`name`, `name`]. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42444) DataFrame.drop should handle multi columns properly
[ https://issues.apache.org/jira/browse/SPARK-42444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42444: Assignee: Apache Spark > DataFrame.drop should handle multi columns properly > --- > > Key: SPARK-42444 > URL: https://issues.apache.org/jira/browse/SPARK-42444 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Apache Spark >Priority: Blocker > > {code:java} > from pyspark.sql import Row > df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], > ["age", "name"]) > df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, > name="Bob")]) > df1.join(df2, df1.name == df2.name, 'inner').drop('name', 'age').show() > {code} > This works in 3.3 > {code:java} > +--+ > |height| > +--+ > |85| > |80| > +--+ > {code} > but fails in 3.4 > {code:java} > --- > AnalysisException Traceback (most recent call last) > Cell In[1], line 4 > 2 df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, > "Bob")], ["age", "name"]) > 3 df2 = spark.createDataFrame([Row(height=80, name="Tom"), > Row(height=85, name="Bob")]) > > 4 df1.join(df2, df1.name == df2.name, 'inner').drop('name', > 'age').show() > File ~/Dev/spark/python/pyspark/sql/dataframe.py:4913, in > DataFrame.drop(self, *cols) >4911 jcols = [_to_java_column(c) for c in cols] >4912 first_column, *remaining_columns = jcols > -> 4913 jdf = self._jdf.drop(first_column, self._jseq(remaining_columns)) >4915 return DataFrame(jdf, self.sparkSession) > File ~/Dev/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, > in JavaMember.__call__(self, *args) >1316 command = proto.CALL_COMMAND_NAME +\ >1317 self.command_header +\ >1318 args_command +\ >1319 proto.END_COMMAND_PART >1321 answer = self.gateway_client.send_command(command) > -> 1322 return_value = get_return_value( >1323 answer, self.gateway_client, self.target_id, self.name) >1325 for temp_arg in temp_args: >1326 
if hasattr(temp_arg, "_detach"): > File ~/Dev/spark/python/pyspark/errors/exceptions/captured.py:159, in > capture_sql_exception..deco(*a, **kw) > 155 converted = convert_exception(e.java_exception) > 156 if not isinstance(converted, UnknownException): > 157 # Hide where the exception came from that shows a non-Pythonic > 158 # JVM exception message. > --> 159 raise converted from None > 160 else: > 161 raise > AnalysisException: [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could > be: [`name`, `name`]. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
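[Editor's note] The AMBIGUOUS_REFERENCE failure in SPARK-42444 arises because the 3.4 `drop` path resolved each string argument as a column expression against a join output carrying two `name` columns, whereas name-based drop is lenient and removes every match without resolving. A minimal pure-Python model of the two behaviors — this is an illustration, not Spark's actual analyzer:

```python
# Toy model of column lookup in a joined DataFrame. Columns are
# (source, name) pairs; names below are illustrative, not Spark internals.

class AmbiguousReference(Exception):
    pass

def resolve(columns, name):
    """Expression-style resolution: an unqualified name must match exactly one column."""
    matches = [c for c in columns if c[1] == name]
    if len(matches) > 1:
        raise AmbiguousReference(f"Reference `{name}` is ambiguous")
    return matches[0] if matches else None

def drop_by_name(columns, *names):
    """Lenient name-based drop: remove every matching column, never raise."""
    return [c for c in columns if c[1] not in names]

joined = [("df1", "age"), ("df1", "name"), ("df2", "height"), ("df2", "name")]

# The 3.4 regression path: each string was resolved as an expression first.
try:
    resolve(joined, "name")
except AmbiguousReference:
    pass  # AMBIGUOUS_REFERENCE, as in the traceback above

# The 3.3 behavior the fix restores: both `name` columns and `age` drop away.
print(drop_by_name(joined, "name", "age"))  # [('df2', 'height')]
```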
[jira] [Commented] (SPARK-41793) Incorrect result for window frames defined by a range clause on large decimals
[ https://issues.apache.org/jira/browse/SPARK-41793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692451#comment-17692451 ] Wenchen Fan commented on SPARK-41793: - When we are doing window operator computing per partition, it's local decimal calculations and we can temporarily go beyond the decimal precision limitation, because `Decimal` is backed by `java.math.BigDecimal`. We should only check overflow before writing out decimal values. There is an expression `DecimalAddNoOverflowCheck` and we should use it in the window operator. [~ulysses] can you help to fix it? This is the same idea we use in Sum/Average. > Incorrect result for window frames defined by a range clause on large > decimals > --- > > Key: SPARK-41793 > URL: https://issues.apache.org/jira/browse/SPARK-41793 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gera Shegalov >Priority: Blocker > Labels: correctness > > Context > https://github.com/NVIDIA/spark-rapids/issues/7429#issuecomment-1368040686 > The following windowing query on a simple two-row input should produce two > non-empty windows as a result > {code} > from pprint import pprint > data = [ > ('9223372036854775807', '11342371013783243717493546650944543.47'), > ('9223372036854775807', '.99') > ] > df1 = spark.createDataFrame(data, 'a STRING, b STRING') > df2 = df1.select(df1.a.cast('LONG'), df1.b.cast('DECIMAL(38,2)')) > df2.createOrReplaceTempView('test_table') > df = sql(''' > SELECT > COUNT(1) OVER ( > PARTITION BY a > ORDER BY b ASC > RANGE BETWEEN 10.2345 PRECEDING AND 6.7890 FOLLOWING > ) AS CNT_1 > FROM > test_table > ''') > res = df.collect() > df.explain(True) > pprint(res) > {code} > Spark 3.4.0-SNAPSHOT output: > {code} > [Row(CNT_1=1), Row(CNT_1=0)] > {code} > Spark 3.3.1 output as expected: > {code} > Row(CNT_1=1), Row(CNT_1=1)] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: 
issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42444) DataFrame.drop should handle multi columns properly
[ https://issues.apache.org/jira/browse/SPARK-42444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692442#comment-17692442 ] Ruifeng Zheng commented on SPARK-42444: --- I am going to fix this one > DataFrame.drop should handle multi columns properly > --- > > Key: SPARK-42444 > URL: https://issues.apache.org/jira/browse/SPARK-42444 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Blocker > > {code:java} > from pyspark.sql import Row > df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], > ["age", "name"]) > df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, > name="Bob")]) > df1.join(df2, df1.name == df2.name, 'inner').drop('name', 'age').show() > {code} > This works in 3.3 > {code:java} > +--+ > |height| > +--+ > |85| > |80| > +--+ > {code} > but fails in 3.4 > {code:java} > --- > AnalysisException Traceback (most recent call last) > Cell In[1], line 4 > 2 df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, > "Bob")], ["age", "name"]) > 3 df2 = spark.createDataFrame([Row(height=80, name="Tom"), > Row(height=85, name="Bob")]) > > 4 df1.join(df2, df1.name == df2.name, 'inner').drop('name', > 'age').show() > File ~/Dev/spark/python/pyspark/sql/dataframe.py:4913, in > DataFrame.drop(self, *cols) >4911 jcols = [_to_java_column(c) for c in cols] >4912 first_column, *remaining_columns = jcols > -> 4913 jdf = self._jdf.drop(first_column, self._jseq(remaining_columns)) >4915 return DataFrame(jdf, self.sparkSession) > File ~/Dev/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, > in JavaMember.__call__(self, *args) >1316 command = proto.CALL_COMMAND_NAME +\ >1317 self.command_header +\ >1318 args_command +\ >1319 proto.END_COMMAND_PART >1321 answer = self.gateway_client.send_command(command) > -> 1322 return_value = get_return_value( >1323 answer, self.gateway_client, self.target_id, self.name) >1325 
for temp_arg in temp_args: >1326 if hasattr(temp_arg, "_detach"): > File ~/Dev/spark/python/pyspark/errors/exceptions/captured.py:159, in > capture_sql_exception..deco(*a, **kw) > 155 converted = convert_exception(e.java_exception) > 156 if not isinstance(converted, UnknownException): > 157 # Hide where the exception came from that shows a non-Pythonic > 158 # JVM exception message. > --> 159 raise converted from None > 160 else: > 161 raise > AnalysisException: [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could > be: [`name`, `name`]. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42534) Fix DB2 Limit clause
[ https://issues.apache.org/jira/browse/SPARK-42534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692440#comment-17692440 ] Apache Spark commented on SPARK-42534: -- User 'sadikovi' has created a pull request for this issue: https://github.com/apache/spark/pull/40134 > Fix DB2 Limit clause > > > Key: SPARK-42534 > URL: https://issues.apache.org/jira/browse/SPARK-42534 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Ivan Sadikov >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42534) Fix DB2 Limit clause
[ https://issues.apache.org/jira/browse/SPARK-42534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42534: Assignee: (was: Apache Spark) > Fix DB2 Limit clause > > > Key: SPARK-42534 > URL: https://issues.apache.org/jira/browse/SPARK-42534 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Ivan Sadikov >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42534) Fix DB2 Limit clause
[ https://issues.apache.org/jira/browse/SPARK-42534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692439#comment-17692439 ] Apache Spark commented on SPARK-42534: -- User 'sadikovi' has created a pull request for this issue: https://github.com/apache/spark/pull/40134 > Fix DB2 Limit clause > > > Key: SPARK-42534 > URL: https://issues.apache.org/jira/browse/SPARK-42534 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Ivan Sadikov >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42534) Fix DB2 Limit clause
[ https://issues.apache.org/jira/browse/SPARK-42534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42534: Assignee: Apache Spark > Fix DB2 Limit clause > > > Key: SPARK-42534 > URL: https://issues.apache.org/jira/browse/SPARK-42534 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Ivan Sadikov >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42534) Fix DB2 Limit clause
[ https://issues.apache.org/jira/browse/SPARK-42534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692438#comment-17692438 ] Ivan Sadikov commented on SPARK-42534: -- I am going to open a PR to fix this. > Fix DB2 Limit clause > > > Key: SPARK-42534 > URL: https://issues.apache.org/jira/browse/SPARK-42534 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Ivan Sadikov >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42534) Fix DB2 Limit clause
Ivan Sadikov created SPARK-42534: Summary: Fix DB2 Limit clause Key: SPARK-42534 URL: https://issues.apache.org/jira/browse/SPARK-42534 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.0 Reporter: Ivan Sadikov -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
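[Editor's note] The SPARK-42534 tickets above don't quote the patch, but the underlying incompatibility is that DB2 rejects MySQL-style `LIMIT n` and expects the standard `FETCH FIRST n ROWS ONLY` clause. A hedged sketch of dialect-specific limit rendering — names here are illustrative, not Spark's `JdbcDialect` API:

```python
def limit_clause(dialect: str, limit: int) -> str:
    """Render a row-limit clause for a few SQL dialects.

    Dialect coverage is illustrative, not Spark's dialect registry.
    A negative limit means "no limit requested".
    """
    if limit < 0:
        return ""
    if dialect == "db2":
        # DB2 does not accept MySQL-style LIMIT; use the standard fetch clause.
        return f"FETCH FIRST {limit} ROWS ONLY"
    return f"LIMIT {limit}"

print(limit_clause("db2", 10))    # FETCH FIRST 10 ROWS ONLY
print(limit_clause("mysql", 10))  # LIMIT 10
```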
[jira] [Resolved] (SPARK-42530) Remove Hadoop 2 from PySpark installation guide
[ https://issues.apache.org/jira/browse/SPARK-42530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-42530. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 40127 [https://github.com/apache/spark/pull/40127] > Remove Hadoop 2 from PySpark installation guide > --- > > Key: SPARK-42530 > URL: https://issues.apache.org/jira/browse/SPARK-42530 > Project: Spark > Issue Type: Documentation > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42533) SSL support for Scala Client
[ https://issues.apache.org/jira/browse/SPARK-42533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692420#comment-17692420 ] Apache Spark commented on SPARK-42533: -- User 'zhenlineo' has created a pull request for this issue: https://github.com/apache/spark/pull/40133 > SSL support for Scala Client > > > Key: SPARK-42533 > URL: https://issues.apache.org/jira/browse/SPARK-42533 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Priority: Major > > Add the basic encryption support for scala client. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42533) SSL support for Scala Client
[ https://issues.apache.org/jira/browse/SPARK-42533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42533: Assignee: (was: Apache Spark) > SSL support for Scala Client > > > Key: SPARK-42533 > URL: https://issues.apache.org/jira/browse/SPARK-42533 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Priority: Major > > Add the basic encryption support for scala client. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42533) SSL support for Scala Client
[ https://issues.apache.org/jira/browse/SPARK-42533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42533: Assignee: Apache Spark > SSL support for Scala Client > > > Key: SPARK-42533 > URL: https://issues.apache.org/jira/browse/SPARK-42533 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Assignee: Apache Spark >Priority: Major > > Add the basic encryption support for scala client. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42533) SSL support for Scala Client
[ https://issues.apache.org/jira/browse/SPARK-42533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692417#comment-17692417 ] Apache Spark commented on SPARK-42533: -- User 'zhenlineo' has created a pull request for this issue: https://github.com/apache/spark/pull/40133 > SSL support for Scala Client > > > Key: SPARK-42533 > URL: https://issues.apache.org/jira/browse/SPARK-42533 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Priority: Major > > Add the basic encryption support for scala client. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42533) SSL support for Scala Client
Zhen Li created SPARK-42533: --- Summary: SSL support for Scala Client Key: SPARK-42533 URL: https://issues.apache.org/jira/browse/SPARK-42533 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.4.0 Reporter: Zhen Li Add the basic encryption support for scala client. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
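[Editor's note] Client-side encryption of the kind SPARK-42533 proposes usually reduces to a connect-time choice between a plaintext channel and a TLS-backed one with certificate verification on. A minimal stdlib sketch of that toggle — illustrative only; the Scala Connect client configures TLS through gRPC channel builders, not Python's `ssl` module:

```python
import ssl

def make_client_context(use_ssl: bool, ca_file=None):
    """Return None for a plaintext connection, or a verifying TLS context."""
    if not use_ssl:
        return None
    # The default context enables certificate and hostname verification;
    # ca_file optionally pins a custom trust root.
    return ssl.create_default_context(cafile=ca_file)

print(make_client_context(False))  # None (plaintext)
ctx = make_client_context(True)
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True
```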
[jira] [Closed] (SPARK-42518) Scala client Write API V2
[ https://issues.apache.org/jira/browse/SPARK-42518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhen Li closed SPARK-42518. --- > Scala client Write API V2 > - > > Key: SPARK-42518 > URL: https://issues.apache.org/jira/browse/SPARK-42518 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Assignee: Zhen Li >Priority: Major > Fix For: 3.4.0 > > > Impl the Dataset#writeTo method. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Deleted] (SPARK-40378) What is React Native.
[ https://issues.apache.org/jira/browse/SPARK-40378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen deleted SPARK-40378: - > What is React Native. > - > > Key: SPARK-40378 > URL: https://issues.apache.org/jira/browse/SPARK-40378 > Project: Spark > Issue Type: Bug >Reporter: Nikhil Sharma >Priority: Major > > Content deleted as spam -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40378) What is React Native.
[ https://issues.apache.org/jira/browse/SPARK-40378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-40378: - > What is React Native. > - > > Key: SPARK-40378 > URL: https://issues.apache.org/jira/browse/SPARK-40378 > Project: Spark > Issue Type: Bug > Components: Java API >Affects Versions: 3.0.3 >Reporter: Nikhil Sharma >Priority: Major > > Content deleted as spam -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-35563) [SQL] Window operations with over Int.MaxValue + 1 rows can silently drop rows
[ https://issues.apache.org/jira/browse/SPARK-35563 ] Sean R. Owen deleted comment on SPARK-35563: -- was (Author: JIRAUSER295436): Thank you for sharing such good information. Very informative and effective post. [Rails Course|https://www.igmguru.com/digital-marketing-programming/ruby-on-rails-certification-training/] > [SQL] Window operations with over Int.MaxValue + 1 rows can silently drop rows > -- > > Key: SPARK-35563 > URL: https://issues.apache.org/jira/browse/SPARK-35563 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.2 >Reporter: Robert Joseph Evans >Priority: Major > Labels: data-loss > > I think this impacts a lot more versions of Spark, but I don't know for sure > because it takes a long time to test. As a part of doing corner case > validation testing for spark rapids I found that if a window function has > more than {{Int.MaxValue + 1}} rows the result is silently truncated to that > many rows. I have only tested this on 3.0.2 with {{row_number}}, but I > suspect it will impact others as well. This is a really rare corner case, but > because it is silent data corruption I personally think it is quite serious. 
> {code:scala} > import org.apache.spark.sql.expressions.Window > val windowSpec = Window.partitionBy("a").orderBy("b") > val df = spark.range(Int.MaxValue.toLong + 100).selectExpr(s"1 as a", "id as > b") > spark.time(df.select(col("a"), col("b"), > row_number().over(windowSpec).alias("rn")).orderBy(desc("a"), > desc("b")).select((col("rn") < 0).alias("dir")).groupBy("dir").count.show(20)) > +-+--+ > > | dir| count| > +-+--+ > |false|2147483647| > | true| 1| > +-+--+ > Time taken: 1139089 ms > Int.MaxValue.toLong + 100 > res15: Long = 2147483747 > 2147483647L + 1 > res16: Long = 2147483648 > {code} > I had to make sure that I ran the above with at least 64GiB of heap for the > executor (I did it in local mode and it worked, but took forever to run) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
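[Editor's note] The silent truncation reported in SPARK-35563 is consistent with a row counter held in a 32-bit signed int: one row past `Int.MaxValue` wraps negative, which is exactly what the `rn < 0` probe in the repro detects (one `true` row). A pure-Python sketch of the wraparound — not Spark's implementation, just the JVM `Int` arithmetic:

```python
def to_int32(n: int) -> int:
    """Interpret an arbitrary integer as a 32-bit signed value, as a JVM Int would."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n

INT_MAX = 2147483647

print(to_int32(INT_MAX))      # 2147483647  (last representable row number)
print(to_int32(INT_MAX + 1))  # -2147483648 (wraps negative, so `rn < 0` holds)
```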
[jira] [Updated] (SPARK-40378) What is React Native.
[ https://issues.apache.org/jira/browse/SPARK-40378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-40378: - > What is React Native. > - > > Key: SPARK-40378 > URL: https://issues.apache.org/jira/browse/SPARK-40378 > Project: Spark > Issue Type: Bug > Components: Java API >Affects Versions: 3.0.3 >Reporter: Nikhil Sharma >Priority: Major > > Content deleted as spam -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-40819) Parquet INT64 (TIMESTAMP(NANOS,true)) now throwing Illegal Parquet type instead of automatically converting to LongType
[ https://issues.apache.org/jira/browse/SPARK-40819 ] Sean R. Owen deleted comment on SPARK-40819: -- was (Author: JIRAUSER295436): Thank you for sharing such good information. Very informative and effective post. [https://www.igmguru.com/digital-marketing-programming/react-native-training/] > Parquet INT64 (TIMESTAMP(NANOS,true)) now throwing Illegal Parquet type > instead of automatically converting to LongType > > > Key: SPARK-40819 > URL: https://issues.apache.org/jira/browse/SPARK-40819 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.3.2, 3.4.0 >Reporter: Alfred Davidson >Assignee: Alfred Davidson >Priority: Critical > Labels: regression > Fix For: 3.2.4, 3.3.2, 3.4.0 > > > Since 3.2 parquet files containing attributes with type "INT64 > (TIMESTAMP(NANOS, true))" are no longer readable and attempting to read > throws: > > {code:java} > Caused by: org.apache.spark.sql.AnalysisException: Illegal Parquet type: > INT64 (TIMESTAMP(NANOS,true)) > at > org.apache.spark.sql.errors.QueryCompilationErrors$.illegalParquetTypeError(QueryCompilationErrors.scala:1284) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.illegalType$1(ParquetSchemaConverter.scala:105) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertPrimitiveField(ParquetSchemaConverter.scala:174) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convertField(ParquetSchemaConverter.scala:90) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.$anonfun$convert$1(ParquetSchemaConverter.scala:72) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.Iterator.foreach(Iterator.scala:941) > at scala.collection.Iterator.foreach$(Iterator.scala:941) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1429) > at 
scala.collection.IterableLike.foreach(IterableLike.scala:74) > at scala.collection.IterableLike.foreach$(IterableLike.scala:73) > at scala.collection.AbstractIterable.foreach(Iterable.scala:56) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convert(ParquetSchemaConverter.scala:66) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.convert(ParquetSchemaConverter.scala:63) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$readSchemaFromFooter$2(ParquetFileFormat.scala:548) > at scala.Option.getOrElse(Option.scala:189) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.readSchemaFromFooter(ParquetFileFormat.scala:548) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$mergeSchemasInParallel$2(ParquetFileFormat.scala:528) > at scala.collection.immutable.Stream.map(Stream.scala:418) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$mergeSchemasInParallel$1(ParquetFileFormat.scala:528) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$.$anonfun$mergeSchemasInParallel$1$adapted(ParquetFileFormat.scala:521) > at > org.apache.spark.sql.execution.datasources.SchemaMergeUtils$.$anonfun$mergeSchemasInParallel$2(SchemaMergeUtils.scala:76) > {code} > Prior to 3.2 successfully reads the parquet automatically converting to a > LongType. 
> I believe work part of https://issues.apache.org/jira/browse/SPARK-34661 > introduced the change in behaviour, more specifically here: > [https://github.com/apache/spark/pull/31776/files#diff-3730a913c4b95edf09fb78f8739c538bae53f7269555b6226efe7ccee1901b39R154] > which throws the QueryCompilationErrors.illegalParquetTypeError -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
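[Editor's note] The SPARK-40819 regression comes down to the converter's primitive-field branch: before the SPARK-34661 change, an unrecognized INT64 annotation such as `TIMESTAMP(NANOS,true)` fell through and was read as a long, while afterwards it raises `illegalParquetTypeError`. A sketch of the two behaviors behind a flag — names are illustrative, not Spark's `ParquetToSparkSchemaConverter`:

```python
def convert_int64(logical_type, fallback_to_long=False):
    """Map a Parquet INT64 logical annotation to a Spark SQL type name (toy)."""
    known = {
        None: "LongType",
        "TIMESTAMP(MILLIS,true)": "TimestampType",
        "TIMESTAMP(MICROS,true)": "TimestampType",
    }
    if logical_type in known:
        return known[logical_type]
    if fallback_to_long:
        # Pre-3.2 behavior: unknown INT64 annotations were read as plain longs.
        return "LongType"
    raise ValueError(f"Illegal Parquet type: INT64 ({logical_type})")

print(convert_int64("TIMESTAMP(NANOS,true)", fallback_to_long=True))  # LongType
# convert_int64("TIMESTAMP(NANOS,true)")  # raises, as in the stack trace above
```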
[jira] [Updated] (SPARK-22588) SPARK: Load Data from Dataframe or RDD to DynamoDB / dealing with null values
[ https://issues.apache.org/jira/browse/SPARK-22588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-22588: - > SPARK: Load Data from Dataframe or RDD to DynamoDB / dealing with null values > - > > Key: SPARK-22588 > URL: https://issues.apache.org/jira/browse/SPARK-22588 > Project: Spark > Issue Type: Question > Components: Deploy >Affects Versions: 2.1.1 >Reporter: Saanvi Sharma >Priority: Minor > Labels: dynamodb, spark > Original Estimate: 24h > Remaining Estimate: 24h > > Content deleted as spam -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Deleted] (SPARK-22588) SPARK: Load Data from Dataframe or RDD to DynamoDB / dealing with null values
[ https://issues.apache.org/jira/browse/SPARK-22588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen deleted SPARK-22588: - > SPARK: Load Data from Dataframe or RDD to DynamoDB / dealing with null values > - > > Key: SPARK-22588 > URL: https://issues.apache.org/jira/browse/SPARK-22588 > Project: Spark > Issue Type: Question >Reporter: Saanvi Sharma >Priority: Minor > Labels: dynamodb, spark > Original Estimate: 24h > Remaining Estimate: 24h > > Content deleted as spam -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-22588) SPARK: Load Data from Dataframe or RDD to DynamoDB / dealing with null values
[ https://issues.apache.org/jira/browse/SPARK-22588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-22588:
- External issue URL: (was: https://mindmajix.com/scala-training)

> SPARK: Load Data from Dataframe or RDD to DynamoDB / dealing with null values
> -
>
> Key: SPARK-22588
> URL: https://issues.apache.org/jira/browse/SPARK-22588
> Project: Spark
> Issue Type: Question
> Components: Deploy
> Affects Versions: 2.1.1
> Reporter: Saanvi Sharma
> Priority: Minor
> Labels: dynamodb, spark
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> I am using Spark 2.1 on EMR and I have a dataframe like this:
>
> ClientNum | Value_1 | Value_2 | Value_3 | Value_4
> 14        | A       | B       | C       | null
> 19        | X       | Y       | null    | null
> 21        | R       | null    | null    | null
>
> I want to load this data into a DynamoDB table with ClientNum as the key, following
> "Analyze Your Data on Amazon DynamoDB with Apache Spark" and "Using Spark SQL for ETL".
>
> Here is the code I tried:
>
> var jobConf = new JobConf(sc.hadoopConfiguration)
> jobConf.set("dynamodb.servicename", "dynamodb")
> jobConf.set("dynamodb.input.tableName", "table_name")
> jobConf.set("dynamodb.output.tableName", "table_name")
> jobConf.set("dynamodb.endpoint", "dynamodb.eu-west-1.amazonaws.com")
> jobConf.set("dynamodb.regionid", "eu-west-1")
> jobConf.set("dynamodb.throughput.read", "1")
> jobConf.set("dynamodb.throughput.read.percent", "1")
> jobConf.set("dynamodb.throughput.write", "1")
> jobConf.set("dynamodb.throughput.write.percent", "1")
> jobConf.set("mapred.output.format.class", "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat")
> jobConf.set("mapred.input.format.class", "org.apache.hadoop.dynamodb.read.DynamoDBInputFormat")
>
> // Import data
> val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load(path)
>
> I performed a transformation to get an RDD that matches the types the DynamoDB custom output format knows how to write. The custom output format expects a tuple containing the Text and DynamoDBItemWritable types. I create a new RDD with those types in the following map call:
>
> // Convert the dataframe to an RDD
> val df_rdd = df.rdd
> df_rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = MapPartitionsRDD[10] at rdd at :41
>
> // Print the first row
> df_rdd.take(1)
> res12: Array[org.apache.spark.sql.Row] = Array([14,A,B,C,null])
>
> var ddbInsertFormattedRDD = df_rdd.map(a => {
>   var ddbMap = new HashMap[String, AttributeValue]()
>   var ClientNum = new AttributeValue()
>   ClientNum.setN(a.get(0).toString)
>   ddbMap.put("ClientNum", ClientNum)
>   var Value_1 = new AttributeValue()
>   Value_1.setS(a.get(1).toString)
>   ddbMap.put("Value_1", Value_1)
>   var Value_2 = new AttributeValue()
>   Value_2.setS(a.get(2).toString)
>   ddbMap.put("Value_2", Value_2)
>   var Value_3 = new AttributeValue()
>   Value_3.setS(a.get(3).toString)
>   ddbMap.put("Value_3", Value_3)
>   var Value_4 = new AttributeValue()
>   Value_4.setS(a.get(4).toString)
>   ddbMap.put("Value_4", Value_4)
>   var item = new DynamoDBItemWritable()
>   item.setItem(ddbMap)
>   (new Text(""), item)
> })
>
> The last call uses the job configuration that defines the EMR-DDB connector to write out the new RDD in the expected format:
>
> ddbInsertFormattedRDD.saveAsHadoopDataset(jobConf)
>
> This fails with the following error:
>
> Caused by: java.lang.NullPointerException
>
> The null values cause the error; if I try with only ClientNum and Value_1 it works and the data is correctly inserted into the DynamoDB table.
> Thanks for your help!!

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
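[Editorial note: the NullPointerException above is consistent with calling `setS(null)` for the null columns — a DynamoDB string AttributeValue cannot hold null, and DynamoDB has no concept of an empty attribute; the usual fix is to skip null cells entirely. A minimal, untested sketch of that null-safe map, assuming the same EMR-DDB connector and AWS SDK classes as the snippet above (the `columns` list and index offsets are illustrative):]

```scala
import java.util.HashMap
import org.apache.hadoop.io.Text
import org.apache.hadoop.dynamodb.DynamoDBItemWritable
import com.amazonaws.services.dynamodbv2.model.AttributeValue

// Non-key string columns, in the same order as in the dataframe.
val columns = Seq("Value_1", "Value_2", "Value_3", "Value_4")

val ddbInsertFormattedRDD = df.rdd.map { row =>
  val ddbMap = new HashMap[String, AttributeValue]()
  // The key column is numeric and assumed never null.
  ddbMap.put("ClientNum", new AttributeValue().withN(row.get(0).toString))
  // Only add an attribute when the cell is non-null; putting a null
  // string into AttributeValue is what triggered the NPE.
  columns.zipWithIndex.foreach { case (name, i) =>
    if (!row.isNullAt(i + 1)) {
      ddbMap.put(name, new AttributeValue().withS(row.getString(i + 1)))
    }
  }
  val item = new DynamoDBItemWritable()
  item.setItem(ddbMap)
  (new Text(""), item)
}
```

[Rows then simply omit their null attributes, which matches DynamoDB's sparse-item model and lets `saveAsHadoopDataset(jobConf)` run over the same RDD as before.]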
[jira] (SPARK-22588) SPARK: Load Data from Dataframe or RDD to DynamoDB / dealing with null values
[ https://issues.apache.org/jira/browse/SPARK-22588 ] Sean R. Owen deleted comment on SPARK-22588:
-- was (Author: JIRAUSER294516): We offer comprehensive [Splunk online training|https://www.igmguru.com/big-data/splunk-training/] that also covers a variety of administrative and support options.

> SPARK: Load Data from Dataframe or RDD to DynamoDB / dealing with null values
[jira] [Updated] (SPARK-40847) SPARK: Load Data from Dataframe or RDD to DynamoDB
[ https://issues.apache.org/jira/browse/SPARK-40847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-40847:
-
> SPARK: Load Data from Dataframe or RDD to DynamoDB
> -
>
> Key: SPARK-40847
> URL: https://issues.apache.org/jira/browse/SPARK-40847
> Project: Spark
> Issue Type: Question
> Components: Deploy
> Affects Versions: 2.1.1
> Reporter: Vivek Garg
> Priority: Major
> Labels: spark
[jira] [Updated] (SPARK-40847) SPARK: Load Data from Dataframe or RDD to DynamoDB
[ https://issues.apache.org/jira/browse/SPARK-40847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-40847:
-
> SPARK: Load Data from Dataframe or RDD to DynamoDB
> -
>
> Key: SPARK-40847
> URL: https://issues.apache.org/jira/browse/SPARK-40847
> Project: Spark
> Issue Type: Question
> Components: Deploy
> Affects Versions: 2.1.1
> Reporter: Vivek Garg
> Priority: Major
> Labels: spark
>
> Content deleted as spam
[jira] (SPARK-23521) SPIP: Standardize SQL logical plans with DataSourceV2
[ https://issues.apache.org/jira/browse/SPARK-23521 ] Sean R. Owen deleted comment on SPARK-23521: -- was (Author: JIRAUSER294516): IgmGuru [Mulesoft Online Training|https://www.igmguru.com/digital-marketing-programming/mulesoft-training/] is created with the Mulesoft certification exam in mind to ensure that the applicant passes the test on their first try. > SPIP: Standardize SQL logical plans with DataSourceV2 > - > > Key: SPARK-23521 > URL: https://issues.apache.org/jira/browse/SPARK-23521 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Ryan Blue >Priority: Major > Labels: SPIP > Attachments: SPIP_ Standardize logical plans.pdf > > > Executive Summary: This SPIP is based on [discussion about the DataSourceV2 > implementation|https://lists.apache.org/thread.html/55676ec1f5039d3deaf347d391cf82fe8574b8fa4eeab70110ed5b2b@%3Cdev.spark.apache.org%3E] > on the dev list. The proposal is to standardize the logical plans used for > write operations to make the planner more maintainable and to make Spark's > write behavior predictable and reliable. It proposes the following principles: > # Use well-defined logical plan nodes for all high-level operations: insert, > create, CTAS, overwrite table, etc. > # Use planner rules that match on these high-level nodes, so that it isn’t > necessary to create rules to match each eventual code path individually. > # Clearly define Spark’s behavior for these logical plan nodes. Physical > nodes should implement that behavior so that all code paths eventually make > the same guarantees. > # Specialize implementation when creating a physical plan, not logical > plans. This will avoid behavior drift and ensure planner code is shared > across physical implementations. > The SPIP doc presents a small but complete set of those high-level logical > operations, most of which are already defined in SQL or implemented by some > write path in Spark. 
[jira] [Deleted] (SPARK-40847) SPARK: Load Data from Dataframe or RDD to DynamoDB
[ https://issues.apache.org/jira/browse/SPARK-40847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen deleted SPARK-40847:
-
> SPARK: Load Data from Dataframe or RDD to DynamoDB
> -
>
> Key: SPARK-40847
> URL: https://issues.apache.org/jira/browse/SPARK-40847
> Project: Spark
> Issue Type: Question
> Reporter: Vivek Garg
> Priority: Major
> Labels: spark
>
> Content deleted as spam
[jira] (SPARK-40993) Migrate markdown style README to python/docs/development/testing.rst
[ https://issues.apache.org/jira/browse/SPARK-40993 ] Sean R. Owen deleted comment on SPARK-40993:
-- was (Author: JIRAUSER294516): Hii, I think you got the answer. [Salesforce Marketing Cloud Certification|https://www.igmguru.com/salesforce/salesforce-marketing-cloud-training/]

> Migrate markdown style README to python/docs/development/testing.rst
> -
>
> Key: SPARK-40993
> URL: https://issues.apache.org/jira/browse/SPARK-40993
> Project: Spark
> Issue Type: Sub-task
> Components: Connect
> Affects Versions: 3.4.0
> Reporter: Rui Wang
> Assignee: Hyukjin Kwon
> Priority: Major
[jira] (SPARK-22588) SPARK: Load Data from Dataframe or RDD to DynamoDB / dealing with null values
[ https://issues.apache.org/jira/browse/SPARK-22588 ] Sean R. Owen deleted comment on SPARK-22588:
-- was (Author: JIRAUSER295436): Thank you for sharing the information. [Best Machine Learning Course|https://www.igmguru.com/machine-learning-ai/machine-learning-certification-training/] worldwide. Machine Learning Training Online program is designed after consulting people from the industry and academia.

> SPARK: Load Data from Dataframe or RDD to DynamoDB / dealing with null values
[jira] (SPARK-22588) SPARK: Load Data from Dataframe or RDD to DynamoDB / dealing with null values
[ https://issues.apache.org/jira/browse/SPARK-22588 ] Sean R. Owen deleted comment on SPARK-22588:
-- was (Author: JIRAUSER295436): Thank you for sharing the information. [Rails training|https://www.igmguru.com/digital-marketing-programming/ruby-on-rails-certification-training/] {*}provides in-depth knowledge on all the core fundamentals of Ruby and MVC design patterns through real-time use cases and projects{*}.

> SPARK: Load Data from Dataframe or RDD to DynamoDB / dealing with null values
[jira] (SPARK-22588) SPARK: Load Data from Dataframe or RDD to DynamoDB / dealing with null values
[ https://issues.apache.org/jira/browse/SPARK-22588 ] Sean R. Owen deleted comment on SPARK-22588:
-- was (Author: JIRAUSER295436): Thank you for sharing the information. The [AWS DevOps Professional certification|https://www.igmguru.com/cloud-computing/aws-devops-training/] is a professional-level certification offered by Amazon Web Services (AWS) that validates a candidate's ability to design, implement, and maintain a software development process on the AWS platform using DevOps practices. The certification is intended for individuals with at least one year of experience working with AWS and at least two years of experience in a DevOps role.

> SPARK: Load Data from Dataframe or RDD to DynamoDB / dealing with null values
[jira] (SPARK-22588) SPARK: Load Data from Dataframe or RDD to DynamoDB / dealing with null values
[ https://issues.apache.org/jira/browse/SPARK-22588 ] Sean R. Owen deleted comment on SPARK-22588:
-- was (Author: JIRAUSER294516): The industry standard-setting Machine Learning Operations or MLOps training offered by IgmGuru. The carefully selected training module includes the most recent syllabus to meet the demands of numerous sectors throughout the world. The [MLOps Certification|https://www.igmguru.com/machine-learning-ai/mlops-course-certification/] course was developed using the extensive knowledge and skills of industry leaders. The MLOps training gives people an advantage over the competition because it makes a wide range of profitable career prospects available to them.

> SPARK: Load Data from Dataframe or RDD to DynamoDB / dealing with null values
[jira] (SPARK-22588) SPARK: Load Data from Dataframe or RDD to DynamoDB / dealing with null values
[ https://issues.apache.org/jira/browse/SPARK-22588 ] Sean R. Owen deleted comment on SPARK-22588: -- was (Author: JIRAUSER295111): Thank you for sharing the information. [Vlocity Salesforce Certification|https://www.igmguru.com/salesforce/salesforce-vlocity-training-certification/] enhances CPQ and guided selling as well. Salesforce Vlocity is the pioneer assisting many tops and arising companies obtain their wanted progress utilizing its Omnichannel procedures. > SPARK: Load Data from Dataframe or RDD to DynamoDB / dealing with null values > - > > Key: SPARK-22588 > URL: https://issues.apache.org/jira/browse/SPARK-22588 > Project: Spark > Issue Type: Question > Components: Deploy >Affects Versions: 2.1.1 >Reporter: Saanvi Sharma >Priority: Minor > Labels: dynamodb, spark > Original Estimate: 24h > Remaining Estimate: 24h > > I am using spark 2.1 on EMR and i have a dataframe like this: > ClientNum | Value_1 | Value_2 | Value_3 | Value_4 > 14 |A |B| C | null > 19 |X |Y| null| null > 21 |R | null | null| null > I want to load data into DynamoDB table with ClientNum as key fetching: > Analyze Your Data on Amazon DynamoDB with apche Spark11 > Using Spark SQL for ETL3 > here is my code that I tried to solve: > var jobConf = new JobConf(sc.hadoopConfiguration) > jobConf.set("dynamodb.servicename", "dynamodb") > jobConf.set("dynamodb.input.tableName", "table_name") > jobConf.set("dynamodb.output.tableName", "table_name") > jobConf.set("dynamodb.endpoint", "dynamodb.eu-west-1.amazonaws.com") > jobConf.set("dynamodb.regionid", "eu-west-1") > jobConf.set("dynamodb.throughput.read", "1") > jobConf.set("dynamodb.throughput.read.percent", "1") > jobConf.set("dynamodb.throughput.write", "1") > jobConf.set("dynamodb.throughput.write.percent", "1") > > jobConf.set("mapred.output.format.class", > "org.apache.hadoop.dynamodb.write.DynamoDBOutputFormat") > jobConf.set("mapred.input.format.class", > "org.apache.hadoop.dynamodb.read.DynamoDBInputFormat") > #Import Data > val df = > 
sqlContext.read.format("com.databricks.spark.csv").option("header", > "true").option("inferSchema", "true").load(path) > I performed a transformation to have an RDD that matches the types that the > DynamoDB custom output format knows how to write. The custom output format > expects a tuple containing the Text and DynamoDBItemWritable types. > Create a new RDD with those types in it, in the following map call: > #Convert the dataframe to rdd > val df_rdd = df.rdd > > df_rdd: org.apache.spark.rdd.RDD[org.apache.spark.sql.Row] = > MapPartitionsRDD[10] at rdd at :41 > > #Print first rdd > df_rdd.take(1) > > res12: Array[org.apache.spark.sql.Row] = Array([14,A,B,C,null]) > var ddbInsertFormattedRDD = df_rdd.map(a => { > var ddbMap = new HashMap[String, AttributeValue]() > var ClientNum = new AttributeValue() > ClientNum.setN(a.get(0).toString) > ddbMap.put("ClientNum", ClientNum) > var Value_1 = new AttributeValue() > Value_1.setS(a.get(1).toString) > ddbMap.put("Value_1", Value_1) > var Value_2 = new AttributeValue() > Value_2.setS(a.get(2).toString) > ddbMap.put("Value_2", Value_2) > var Value_3 = new AttributeValue() > Value_3.setS(a.get(3).toString) > ddbMap.put("Value_3", Value_3) > var Value_4 = new AttributeValue() > Value_4.setS(a.get(4).toString) > ddbMap.put("Value_4", Value_4) > var item = new DynamoDBItemWritable() > item.setItem(ddbMap) > (new Text(""), item) > }) > This last call uses the job configuration that defines the EMR-DDB connector > to write out the new RDD you created in the expected format: > ddbInsertFormattedRDD.saveAsHadoopDataset(jobConf) > It fails with the following error: > Caused by: java.lang.NullPointerException > The null values caused the error; if I try with only ClientNum and Value_1, it works and the > data is correctly inserted into the DynamoDB table. > Thanks for your help !! 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
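[Editor's note] The NullPointerException in the question above almost certainly comes from calling `setS(a.get(i).toString)` on a null column: `toString` on a null reference throws, and this API has no notion of a null string attribute anyway. The usual fix is to skip null columns when building each item, since a DynamoDB attribute is simply absent rather than null. A minimal, language-neutral sketch of the idea in plain Python, with a hypothetical `build_item` helper (not the EMR-DDB connector API; the original code is Scala):

```python
def build_item(row, columns):
    """Build a DynamoDB-style attribute map, omitting columns whose value is None.

    This mirrors the ddbMap-building loop in the question, but guards each
    column instead of unconditionally calling setS on possibly-null values.
    All attributes use the string ("S") type here purely for illustration.
    """
    item = {}
    for name, value in zip(columns, row):
        if value is not None:  # skip nulls instead of setS(null)
            item[name] = {"S": str(value)}
    return item

# First sample row from the question: Value_4 is null, so it is omitted.
print(build_item((14, "A", "B", "C", None),
                 ["ClientNum", "Value_1", "Value_2", "Value_3", "Value_4"]))
```

Applied inside the `df_rdd.map` call, the same guard (only `put` an attribute when `a.get(i)` is non-null) makes the `saveAsHadoopDataset` write succeed for rows with missing values.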
[jira] (SPARK-22588) SPARK: Load Data from Dataframe or RDD to DynamoDB / dealing with null values
[ https://issues.apache.org/jira/browse/SPARK-22588 ] Sean R. Owen deleted comment on SPARK-22588: -- was (Author: JIRAUSER294516): According to the [Salesforce CPQ Certification|https://www.igmguru.com/salesforce/salesforce-cpq-training/] Exam, our Salesforce CPQ Certification Training program has been created. The core abilities needed for effectively implementing Salesforce CPQ solutions are developed in this course on Salesforce CPQ. Through instruction using practical examples, this course will go deeper into developing a quoting process, pricing strategies, configuration, CPQ object data model, and more. This online Salesforce CPQ training course includes practical projects that will aid you in passing the Salesforce CPQ Certification test. > SPARK: Load Data from Dataframe or RDD to DynamoDB / dealing with null values > - -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-22588) SPARK: Load Data from Dataframe or RDD to DynamoDB / dealing with null values
[ https://issues.apache.org/jira/browse/SPARK-22588 ] Sean R. Owen deleted comment on SPARK-22588: -- was (Author: JIRAUSER295436): Thank you for sharing the information. [React Native Online Course|https://www.igmguru.com/digital-marketing-programming/react-native-training/] is an integrated professional course aimed at providing learners with the skills and knowledge of React Native, a mobile application framework used for the development of mobile applications for Android, iOS, UWP (Universal Windows Platform), and the web. > SPARK: Load Data from Dataframe or RDD to DynamoDB / dealing with null values > - -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-34827) Support fetching shuffle blocks in batch with i/o encryption
[ https://issues.apache.org/jira/browse/SPARK-34827 ] Sean R. Owen deleted comment on SPARK-34827: -- was (Author: JIRAUSER295436): Thank you for sharing such good information. Very informative and effective post. +[https://www.igmguru.com/digital-marketing-programming/golang-training/]+ > Support fetching shuffle blocks in batch with i/o encryption > > > Key: SPARK-34827 > URL: https://issues.apache.org/jira/browse/SPARK-34827 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Dongjoon Hyun >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-2258) Worker UI displays zombie executors
[ https://issues.apache.org/jira/browse/SPARK-2258 ] Sean R. Owen deleted comment on SPARK-2258: - was (Author: JIRAUSER295111): The website is so easy to use – I am impressed with it. Thank you for Sharing. Salesforce Vlocity Training focuses on producing experts who aren't just able to handle the platform but build solutions to keep their respective companies as well as their careers way ahead of the competition. Go through this link:- [https://www.igmguru.com/salesforce/salesforce-vlocity-training-certification/] > Worker UI displays zombie executors > --- > > Key: SPARK-2258 > URL: https://issues.apache.org/jira/browse/SPARK-2258 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Andrew Or >Assignee: Andrew Or >Priority: Major > Fix For: 1.1.0 > > Attachments: Screen Shot 2014-06-24 at 9.23.18 AM.png > > > See attached. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-34827) Support fetching shuffle blocks in batch with i/o encryption
[ https://issues.apache.org/jira/browse/SPARK-34827 ] Sean R. Owen deleted comment on SPARK-34827: -- was (Author: JIRAUSER297361): I like your content. If anyone wants to learn a new course like Vlocity platform developer certification focuses on producing experts who aren't just ready to handle the platform but build solutions to keep their respective companies and their careers ahead of the competition. Go through this link:[https://www.igmguru.com/salesforce/salesforce-vlocity-training-certification|https://www.igmguru.com/salesforce/salesforce-vlocity-training-certification/] > Support fetching shuffle blocks in batch with i/o encryption > > > Key: SPARK-34827 > URL: https://issues.apache.org/jira/browse/SPARK-34827 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Dongjoon Hyun >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-34827) Support fetching shuffle blocks in batch with i/o encryption
[ https://issues.apache.org/jira/browse/SPARK-34827 ] Sean R. Owen deleted comment on SPARK-34827: -- was (Author: JIRAUSER295111): Thank you for sharing such good information. Very informative and effective post. [Msbi Training|https://www.igmguru.com/data-science-bi/msbi-certification-training/] offers the best solutions for Business Intelligence and data mining. MSBI uses Visual Studio data tools and SQL servers to make great decisions in our business activities. > Support fetching shuffle blocks in batch with i/o encryption > > > Key: SPARK-34827 > URL: https://issues.apache.org/jira/browse/SPARK-34827 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Dongjoon Hyun >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-34827) Support fetching shuffle blocks in batch with i/o encryption
[ https://issues.apache.org/jira/browse/SPARK-34827 ] Sean R. Owen deleted comment on SPARK-34827: -- was (Author: JIRAUSER294516): I appreciate you sharing this useful information. Very useful and interesting post. [Uipath training|https://www.igmguru.com/machine-learning-ai/rpa-uipath-certification-training/]. > Support fetching shuffle blocks in batch with i/o encryption > > > Key: SPARK-34827 > URL: https://issues.apache.org/jira/browse/SPARK-34827 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Dongjoon Hyun >Priority: Blocker > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-42444) DataFrame.drop should handle multi columns properly
[ https://issues.apache.org/jira/browse/SPARK-42444 ] Sean R. Owen deleted comment on SPARK-42444: -- was (Author: JIRAUSER295111): Thank you for sharing. [Azure Solution Architect Training |https://www.igmguru.com/cloud-computing/microsoft-azure-solution-architect-az-300-training/]has been designed for software developers who are keen on developing best-in-class applications using this open and advanced platform of Windows Azure. > DataFrame.drop should handle multi columns properly > --- > > Key: SPARK-42444 > URL: https://issues.apache.org/jira/browse/SPARK-42444 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Priority: Blocker > > {code:java} > from pyspark.sql import Row > df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], > ["age", "name"]) > df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, > name="Bob")]) > df1.join(df2, df1.name == df2.name, 'inner').drop('name', 'age').show() > {code} > This works in 3.3 > {code:java} > +--+ > |height| > +--+ > |85| > |80| > +--+ > {code} > but fails in 3.4 > {code:java} > --- > AnalysisException Traceback (most recent call last) > Cell In[1], line 4 > 2 df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, > "Bob")], ["age", "name"]) > 3 df2 = spark.createDataFrame([Row(height=80, name="Tom"), > Row(height=85, name="Bob")]) > > 4 df1.join(df2, df1.name == df2.name, 'inner').drop('name', > 'age').show() > File ~/Dev/spark/python/pyspark/sql/dataframe.py:4913, in > DataFrame.drop(self, *cols) >4911 jcols = [_to_java_column(c) for c in cols] >4912 first_column, *remaining_columns = jcols > -> 4913 jdf = self._jdf.drop(first_column, self._jseq(remaining_columns)) >4915 return DataFrame(jdf, self.sparkSession) > File ~/Dev/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, > in JavaMember.__call__(self, *args) >1316 command = proto.CALL_COMMAND_NAME +\ >1317 self.command_header +\ >1318 args_command +\ 
>1319 proto.END_COMMAND_PART >1321 answer = self.gateway_client.send_command(command) > -> 1322 return_value = get_return_value( >1323 answer, self.gateway_client, self.target_id, self.name) >1325 for temp_arg in temp_args: >1326 if hasattr(temp_arg, "_detach"): > File ~/Dev/spark/python/pyspark/errors/exceptions/captured.py:159, in > capture_sql_exception..deco(*a, **kw) > 155 converted = convert_exception(e.java_exception) > 156 if not isinstance(converted, UnknownException): > 157 # Hide where the exception came from that shows a non-Pythonic > 158 # JVM exception message. > --> 159 raise converted from None > 160 else: > 161 raise > AnalysisException: [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could > be: [`name`, `name`]. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
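[Editor's note] Until the multi-column `drop` regression is fixed, one possible workaround sketch (untested here; behavior may vary by version) is to drop the ambiguous columns one at a time via DataFrame-qualified `Column` objects, instead of passing both names to a single `drop()` call:

```python
# Hypothetical workaround for the AMBIGUOUS_REFERENCE error above: each
# drop() receives either a fully qualified Column or a name that is already
# unambiguous after the previous drops. Assumes a running SparkSession `spark`.
from pyspark.sql import Row

df1 = spark.createDataFrame([(14, "Tom"), (23, "Alice"), (16, "Bob")], ["age", "name"])
df2 = spark.createDataFrame([Row(height=80, name="Tom"), Row(height=85, name="Bob")])

joined = df1.join(df2, df1.name == df2.name, "inner")
result = joined.drop(df1.name).drop(df2.name).drop("age")  # only `height` remains
result.show()
```

`DataFrame.drop` accepts a single `Column`, and a qualified reference like `df2.name` sidesteps the by-name ambiguity that the 3.4 code path hits.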
[jira] [Updated] (SPARK-42033) Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline
[ https://issues.apache.org/jira/browse/SPARK-42033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-42033: - > Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline > > > Key: SPARK-42033 > URL: https://issues.apache.org/jira/browse/SPARK-42033 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 2.1.1 >Reporter: Pankaj Nagla >Priority: Major > > Content deleted as spam -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Deleted] (SPARK-42033) Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline
[ https://issues.apache.org/jira/browse/SPARK-42033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen deleted SPARK-42033: - > Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline > > > Key: SPARK-42033 > URL: https://issues.apache.org/jira/browse/SPARK-42033 > Project: Spark > Issue Type: Bug >Reporter: Pankaj Nagla >Priority: Major > > Content deleted as spam -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42033) Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline
[ https://issues.apache.org/jira/browse/SPARK-42033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692409#comment-17692409 ] Sean R. Owen commented on SPARK-42033: -- It's spam. This guy is injecting links to some course. I'm deleting this and spam comments > Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline > > > Key: SPARK-42033 > URL: https://issues.apache.org/jira/browse/SPARK-42033 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 2.1.1 >Reporter: Pankaj Nagla >Priority: Major > > I'm going through the "Scalable FastAPI Application on AWS" course. My > gitlab-ci.yml file is below. > stages: > - docker > variables: > DOCKER_DRIVER: overlay2 > DOCKER_TLS_CERTDIR: "/certs" > cache: > key: ${CI_JOB_NAME} > paths: > - ${CI_PROJECT_DIR}/services/talk_booking/.venv/ > build-python-ci-image: > image: docker:19.03.0 > services: > - docker:19.03.0-dind > stage: docker > before_script: > - cd ci_cd/python/ > script: > - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" > $CI_REGISTRY > - docker build -t > registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim . > - docker push registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim > My Pipeline fails with this error: > See > [https://docs.docker.com/engine/reference/commandline/login/#credentials-store] > Login Succeeded > $ docker build -t registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim > . > invalid argument > "registry.gitlab.com/chris_/talk-booking:cicd-python3.9-slim" for "-t, --tag" > flag: invalid reference format > See 'docker build --help'. > Cleaning up project directory and file based variables > ERROR: Job failed: exit code 125 > It may or may not be relevant but the Container Registry for the GitLab > project says there's a Docker connection error. 
All these problems have been > discussed in this [Aws Sysops Training > |https://www.igmguru.com/cloud-computing/aws-sysops-certification-training/]follow > the page. > Thanks > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
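[Editor's note] The `invalid reference format` error quoted above is most likely caused by the trailing underscore in the `chris_` namespace: Docker repository path components must begin and end with an alphanumeric character. A simplified, hypothetical sketch of that rule (condensed from the distribution/reference grammar, not the actual Docker source; registry host names follow a different rule):

```python
import re

# One path component of a Docker repository name: lowercase alphanumerics,
# with '.', '_', '__', or runs of '-' allowed only *between* alphanumerics.
COMPONENT = re.compile(r"^[a-z0-9]+(?:(?:\.|__|_|-+)[a-z0-9]+)*$")

def valid_component(component: str) -> bool:
    """Return True if a repository path component is syntactically valid."""
    return COMPONENT.match(component) is not None

print(valid_component("chris_"))        # False: ends with '_'
print(valid_component("chris"))         # True
print(valid_component("talk-booking"))  # True
```

Renaming the GitLab namespace, or overriding the image path so it does not end a component with `_`, should make `docker build -t ...` accept the tag.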
[jira] [Resolved] (SPARK-42033) Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline
[ https://issues.apache.org/jira/browse/SPARK-42033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-42033. -- Resolution: Invalid > Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42033) Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline
[ https://issues.apache.org/jira/browse/SPARK-42033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-42033: - External issue URL: (was: https://www.igmguru.com/cloud-computing/aws-sysops-certification-training/) > Docker Tag Error 25 on gitlab-ci.yml trying to start GitLab Pipeline > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-40149) Star expansion after outer join asymmetrically includes joining key
[ https://issues.apache.org/jira/browse/SPARK-40149 ] Sean R. Owen deleted comment on SPARK-40149: -- was (Author: JIRAUSER295111): Thank you for sharing the information. [Vlocity Training|https://www.igmguru.com/salesforce/salesforce-vlocity-training-certification/] enhances CPQ and guided selling as well. Salesforce Vlocity is the pioneer assisting many tops and arising companies obtain their wanted progress utilizing its Omnichannel procedures. > Star expansion after outer join asymmetrically includes joining key > --- > > Key: SPARK-40149 > URL: https://issues.apache.org/jira/browse/SPARK-40149 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2 >Reporter: Otakar Truněček >Assignee: Wenchen Fan >Priority: Blocker > Fix For: 3.3.1, 3.2.3, 3.4.0 > > > When star expansion is used on left side of a join, the result will include > joining key, while on the right side of join it doesn't. I would expect the > behaviour to be symmetric (either include on both sides or on neither). > Example: > {code:python} > from pyspark.sql import SparkSession > import pyspark.sql.functions as f > spark = SparkSession.builder.getOrCreate() > df_left = spark.range(5).withColumn('val', f.lit('left')) > df_right = spark.range(3, 7).withColumn('val', f.lit('right')) > df_merged = ( > df_left > .alias('left') > .join(df_right.alias('right'), on='id', how='full_outer') > .withColumn('left_all', f.struct('left.*')) > .withColumn('right_all', f.struct('right.*')) > ) > df_merged.show() > {code} > result: > {code:java} > +---++-++-+ > | id| val| val|left_all|right_all| > +---++-++-+ > | 0|left| null| {0, left}| {null}| > | 1|left| null| {1, left}| {null}| > | 2|left| null| {2, left}| {null}| > | 3|left|right| {3, left}| {right}| > | 4|left|right| {4, left}| {right}| > | 5|null|right|{null, null}| {right}| > | 6|null|right|{null, null}| {right}| > +---++-++-+ > {code} > This behaviour started with release 3.2.0. 
Previously the key was not > included on either side. > Result from Spark 3.1.3 > {code:java} > +---++-++-+ > | id| val| val|left_all|right_all| > +---++-++-+ > | 0|left| null| {left}| {null}| > | 6|null|right| {null}| {right}| > | 5|null|right| {null}| {right}| > | 1|left| null| {left}| {null}| > | 3|left|right| {left}| {right}| > | 2|left| null| {left}| {null}| > | 4|left|right| {left}| {right}| > +---++-++-+ {code} > I have a gut feeling this is related to these issues: > https://issues.apache.org/jira/browse/SPARK-39376 > https://issues.apache.org/jira/browse/SPARK-34527 > https://issues.apache.org/jira/browse/SPARK-38603 > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-41793) Incorrect result for window frames defined by a range clause on large decimals
[ https://issues.apache.org/jira/browse/SPARK-41793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692408#comment-17692408 ] Thomas Graves commented on SPARK-41793: --- [~ulysses] [~cloud_fan] [~xinrong] We need to decide what we are doing with this for 3.4 before doing any release. > Incorrect result for window frames defined by a range clause on large > decimals > --- > > Key: SPARK-41793 > URL: https://issues.apache.org/jira/browse/SPARK-41793 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Gera Shegalov >Priority: Blocker > Labels: correctness > > Context > https://github.com/NVIDIA/spark-rapids/issues/7429#issuecomment-1368040686 > The following windowing query on a simple two-row input should produce two > non-empty windows as a result > {code} > from pprint import pprint > data = [ > ('9223372036854775807', '11342371013783243717493546650944543.47'), > ('9223372036854775807', '.99') > ] > df1 = spark.createDataFrame(data, 'a STRING, b STRING') > df2 = df1.select(df1.a.cast('LONG'), df1.b.cast('DECIMAL(38,2)')) > df2.createOrReplaceTempView('test_table') > df = sql(''' > SELECT > COUNT(1) OVER ( > PARTITION BY a > ORDER BY b ASC > RANGE BETWEEN 10.2345 PRECEDING AND 6.7890 FOLLOWING > ) AS CNT_1 > FROM > test_table > ''') > res = df.collect() > df.explain(True) > pprint(res) > {code} > Spark 3.4.0-SNAPSHOT output: > {code} > [Row(CNT_1=1), Row(CNT_1=0)] > {code} > Spark 3.3.1 output as expected: > {code} > [Row(CNT_1=1), Row(CNT_1=1)] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42532) Update YuniKorn documentation with v1.2
[ https://issues.apache.org/jira/browse/SPARK-42532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-42532. --- Fix Version/s: 3.4.0 Assignee: Dongjoon Hyun Resolution: Fixed > Update YuniKorn documentation with v1.2 > --- > > Key: SPARK-42532 > URL: https://issues.apache.org/jira/browse/SPARK-42532 > Project: Spark > Issue Type: Documentation > Components: Documentation, Kubernetes >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42532) Update YuniKorn documentation with v1.2
[ https://issues.apache.org/jira/browse/SPARK-42532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42532: Assignee: Apache Spark > Update YuniKorn documentation with v1.2 > --- > > Key: SPARK-42532 > URL: https://issues.apache.org/jira/browse/SPARK-42532 > Project: Spark > Issue Type: Documentation > Components: Documentation, Kubernetes >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42532) Update YuniKorn documentation with v1.2
[ https://issues.apache.org/jira/browse/SPARK-42532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42532: Assignee: (was: Apache Spark) > Update YuniKorn documentation with v1.2 > --- > > Key: SPARK-42532 > URL: https://issues.apache.org/jira/browse/SPARK-42532 > Project: Spark > Issue Type: Documentation > Components: Documentation, Kubernetes >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42532) Update YuniKorn documentation with v1.2
[ https://issues.apache.org/jira/browse/SPARK-42532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692401#comment-17692401 ] Apache Spark commented on SPARK-42532: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/40132 > Update YuniKorn documentation with v1.2 > --- > > Key: SPARK-42532 > URL: https://issues.apache.org/jira/browse/SPARK-42532 > Project: Spark > Issue Type: Documentation > Components: Documentation, Kubernetes >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42532) Update YuniKorn documentation with v1.2
Dongjoon Hyun created SPARK-42532: - Summary: Update YuniKorn documentation with v1.2 Key: SPARK-42532 URL: https://issues.apache.org/jira/browse/SPARK-42532 Project: Spark Issue Type: Documentation Components: Documentation, Kubernetes Affects Versions: 3.4.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42150) Upgrade Volcano to 1.7.0
[ https://issues.apache.org/jira/browse/SPARK-42150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692384#comment-17692384 ] Apache Spark commented on SPARK-42150: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/40131 > Upgrade Volcano to 1.7.0 > > > Key: SPARK-42150 > URL: https://issues.apache.org/jira/browse/SPARK-42150 > Project: Spark > Issue Type: Improvement > Components: Documentation, Kubernetes >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42522) Fix DataFrameWriterV2 to find the default source
[ https://issues.apache.org/jira/browse/SPARK-42522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-42522. --- Fix Version/s: 3.4.0 Assignee: Takuya Ueshin Resolution: Fixed > Fix DataFrameWriterV2 to find the default source > > > Key: SPARK-42522 > URL: https://issues.apache.org/jira/browse/SPARK-42522 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Fix For: 3.4.0 > > > {code:python} > df.writeTo("test_table").create() > {code} > throws: > {noformat} > pyspark.errors.exceptions.connect.SparkConnectGrpcException: > (org.apache.spark.SparkClassNotFoundException) [DATA_SOURCE_NOT_FOUND] Failed > to find the data source: . Please find packages at > `https://spark.apache.org/third-party-projects.html`. > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42518) Scala client Write API V2
[ https://issues.apache.org/jira/browse/SPARK-42518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-42518. --- Fix Version/s: 3.4.0 Assignee: Zhen Li Resolution: Fixed > Scala client Write API V2 > - > > Key: SPARK-42518 > URL: https://issues.apache.org/jira/browse/SPARK-42518 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.0 >Reporter: Zhen Li >Assignee: Zhen Li >Priority: Major > Fix For: 3.4.0 > > > Implement the Dataset#writeTo method. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42531) Scala Client Add Collection Functions
[ https://issues.apache.org/jira/browse/SPARK-42531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42531: Assignee: (was: Apache Spark) > Scala Client Add Collection Functions > - > > Key: SPARK-42531 > URL: https://issues.apache.org/jira/browse/SPARK-42531 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42531) Scala Client Add Collection Functions
[ https://issues.apache.org/jira/browse/SPARK-42531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42531: Assignee: Apache Spark > Scala Client Add Collection Functions > - > > Key: SPARK-42531 > URL: https://issues.apache.org/jira/browse/SPARK-42531 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42531) Scala Client Add Collection Functions
[ https://issues.apache.org/jira/browse/SPARK-42531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692367#comment-17692367 ] Apache Spark commented on SPARK-42531: -- User 'hvanhovell' has created a pull request for this issue: https://github.com/apache/spark/pull/40130 > Scala Client Add Collection Functions > - > > Key: SPARK-42531 > URL: https://issues.apache.org/jira/browse/SPARK-42531 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Herman van Hövell >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42531) Scala Client Add Collection Functions
Herman van Hövell created SPARK-42531: - Summary: Scala Client Add Collection Functions Key: SPARK-42531 URL: https://issues.apache.org/jira/browse/SPARK-42531 Project: Spark Issue Type: Task Components: Connect Affects Versions: 3.4.0 Reporter: Herman van Hövell -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42529) Support Cube and Rollup
[ https://issues.apache.org/jira/browse/SPARK-42529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42529: Assignee: Apache Spark (was: Rui Wang) > Support Cube and Rollup > --- > > Key: SPARK-42529 > URL: https://issues.apache.org/jira/browse/SPARK-42529 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42529) Support Cube and Rollup
[ https://issues.apache.org/jira/browse/SPARK-42529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692349#comment-17692349 ] Apache Spark commented on SPARK-42529: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/40129 > Support Cube and Rollup > --- > > Key: SPARK-42529 > URL: https://issues.apache.org/jira/browse/SPARK-42529 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42529) Support Cube and Rollup
[ https://issues.apache.org/jira/browse/SPARK-42529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42529: Assignee: Rui Wang (was: Apache Spark) > Support Cube and Rollup > --- > > Key: SPARK-42529 > URL: https://issues.apache.org/jira/browse/SPARK-42529 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42529) Support Cube and Rollup
[ https://issues.apache.org/jira/browse/SPARK-42529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang updated SPARK-42529: - Summary: Support Cube and Rollup (was: Support Cube,Rollup,Pivot) > Support Cube and Rollup > --- > > Key: SPARK-42529 > URL: https://issues.apache.org/jira/browse/SPARK-42529 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42466) spark.kubernetes.file.upload.path not deleting files under HDFS after job completes
[ https://issues.apache.org/jira/browse/SPARK-42466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692340#comment-17692340 ] Apache Spark commented on SPARK-42466: -- User 'shrprasa' has created a pull request for this issue: https://github.com/apache/spark/pull/40128 > spark.kubernetes.file.upload.path not deleting files under HDFS after job > completes > --- > > Key: SPARK-42466 > URL: https://issues.apache.org/jira/browse/SPARK-42466 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: Jagadeeswara Rao >Priority: Major > > In cluster mode, files uploaded to the HDFS location set via the > spark.kubernetes.file.upload.path property are not cleared after the job completes. > Each file is uploaded to an HDFS directory of the form > spark-upload-[randomUUID] when {{KubernetesUtils}} is requested to > uploadFileUri. > [https://github.com/apache/spark/blob/76a134ade60a9f354aca01eaca0b2e2477c6bd43/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala#L310] > The following is the driver log: the driver completed successfully, but the > shutdown hook did not clear the HDFS files. > {code:java} > 23/02/16 18:06:56 INFO KubernetesClusterSchedulerBackend: Shutting down all > executors > 23/02/16 18:06:56 INFO > KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each > executor to shut down > 23/02/16 18:06:56 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has > been closed. > 23/02/16 18:06:57 INFO MapOutputTrackerMasterEndpoint: > MapOutputTrackerMasterEndpoint stopped! > 23/02/16 18:06:57 INFO MemoryStore: MemoryStore cleared > 23/02/16 18:06:57 INFO BlockManager: BlockManager stopped > 23/02/16 18:06:57 INFO BlockManagerMaster: BlockManagerMaster stopped > 23/02/16 18:06:57 INFO > OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: > OutputCommitCoordinator stopped! 
> 23/02/16 18:06:57 INFO SparkContext: Successfully stopped SparkContext > 23/02/16 18:06:57 INFO ShutdownHookManager: Shutdown hook called > 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory > /tmp/spark-efb8f725-4ead-4729-a8e0-f478280121b7 > 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory > /spark-local2/spark-66dbf7e6-fe7e-4655-8724-69d76d93fc1f > 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory > /spark-local1/spark-53aefaee-58a5-4fce-b5b0-5e29f42e337f{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
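The missing cleanup described in SPARK-42466 can be illustrated with a local-filesystem sketch (hypothetical helper names; the actual fix would need to delete the HDFS `spark-upload-*` directory, not a local temp dir): create a staging directory and register its removal to run at process exit, the way the driver's shutdown hook already handles its local `/tmp/spark-*` directories.

```python
import atexit
import os
import shutil
import tempfile

def make_staging_dir(prefix="spark-upload-"):
    """Create a staging directory (named like the spark-upload-[randomUUID]
    directories from the report) and register a cleanup hook so it is
    removed on normal process exit -- the step the report says is missing
    for the HDFS upload path."""
    path = tempfile.mkdtemp(prefix=prefix)
    atexit.register(shutil.rmtree, path, ignore_errors=True)
    return path

staging = make_staging_dir()
print(os.path.isdir(staging))  # True until the process exits
```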
[jira] [Assigned] (SPARK-42466) spark.kubernetes.file.upload.path not deleting files under HDFS after job completes
[ https://issues.apache.org/jira/browse/SPARK-42466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42466: Assignee: (was: Apache Spark) > spark.kubernetes.file.upload.path not deleting files under HDFS after job > completes > --- > > Key: SPARK-42466 > URL: https://issues.apache.org/jira/browse/SPARK-42466 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: Jagadeeswara Rao >Priority: Major > > In cluster mode after uploading files to HDFS location using > spark.kubernetes.file.upload.path property files are not getting cleared . > File is successfully uploaded to hdfs location in this format > spark-upload-[randomUUID] using {{KubernetesUtils}} is requested to > uploadFileUri . > [https://github.com/apache/spark/blob/76a134ade60a9f354aca01eaca0b2e2477c6bd43/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala#L310] > following is driver log , driver is completed successfully and shutdownhook > is not cleared the hdfs files. > {code:java} > 23/02/16 18:06:56 INFO KubernetesClusterSchedulerBackend: Shutting down all > executors > 23/02/16 18:06:56 INFO > KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each > executor to shut down > 23/02/16 18:06:56 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has > been closed. > 23/02/16 18:06:57 INFO MapOutputTrackerMasterEndpoint: > MapOutputTrackerMasterEndpoint stopped! > 23/02/16 18:06:57 INFO MemoryStore: MemoryStore cleared > 23/02/16 18:06:57 INFO BlockManager: BlockManager stopped > 23/02/16 18:06:57 INFO BlockManagerMaster: BlockManagerMaster stopped > 23/02/16 18:06:57 INFO > OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: > OutputCommitCoordinator stopped! 
> 23/02/16 18:06:57 INFO SparkContext: Successfully stopped SparkContext > 23/02/16 18:06:57 INFO ShutdownHookManager: Shutdown hook called > 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory > /tmp/spark-efb8f725-4ead-4729-a8e0-f478280121b7 > 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory > /spark-local2/spark-66dbf7e6-fe7e-4655-8724-69d76d93fc1f > 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory > /spark-local1/spark-53aefaee-58a5-4fce-b5b0-5e29f42e337f{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42466) spark.kubernetes.file.upload.path not deleting files under HDFS after job completes
[ https://issues.apache.org/jira/browse/SPARK-42466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42466: Assignee: Apache Spark > spark.kubernetes.file.upload.path not deleting files under HDFS after job > completes > --- > > Key: SPARK-42466 > URL: https://issues.apache.org/jira/browse/SPARK-42466 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 3.2.0 >Reporter: Jagadeeswara Rao >Assignee: Apache Spark >Priority: Major > > In cluster mode after uploading files to HDFS location using > spark.kubernetes.file.upload.path property files are not getting cleared . > File is successfully uploaded to hdfs location in this format > spark-upload-[randomUUID] using {{KubernetesUtils}} is requested to > uploadFileUri . > [https://github.com/apache/spark/blob/76a134ade60a9f354aca01eaca0b2e2477c6bd43/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesUtils.scala#L310] > following is driver log , driver is completed successfully and shutdownhook > is not cleared the hdfs files. > {code:java} > 23/02/16 18:06:56 INFO KubernetesClusterSchedulerBackend: Shutting down all > executors > 23/02/16 18:06:56 INFO > KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each > executor to shut down > 23/02/16 18:06:56 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has > been closed. > 23/02/16 18:06:57 INFO MapOutputTrackerMasterEndpoint: > MapOutputTrackerMasterEndpoint stopped! > 23/02/16 18:06:57 INFO MemoryStore: MemoryStore cleared > 23/02/16 18:06:57 INFO BlockManager: BlockManager stopped > 23/02/16 18:06:57 INFO BlockManagerMaster: BlockManagerMaster stopped > 23/02/16 18:06:57 INFO > OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: > OutputCommitCoordinator stopped! 
> 23/02/16 18:06:57 INFO SparkContext: Successfully stopped SparkContext > 23/02/16 18:06:57 INFO ShutdownHookManager: Shutdown hook called > 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory > /tmp/spark-efb8f725-4ead-4729-a8e0-f478280121b7 > 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory > /spark-local2/spark-66dbf7e6-fe7e-4655-8724-69d76d93fc1f > 23/02/16 18:06:57 INFO ShutdownHookManager: Deleting directory > /spark-local1/spark-53aefaee-58a5-4fce-b5b0-5e29f42e337f{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42530) Remove Hadoop 2 from PySpark installation guide
[ https://issues.apache.org/jira/browse/SPARK-42530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-42530: - Assignee: Dongjoon Hyun > Remove Hadoop 2 from PySpark installation guide > --- > > Key: SPARK-42530 > URL: https://issues.apache.org/jira/browse/SPARK-42530 > Project: Spark > Issue Type: Documentation > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42530) Remove Hadoop 2 from PySpark installation guide
[ https://issues.apache.org/jira/browse/SPARK-42530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692333#comment-17692333 ] Apache Spark commented on SPARK-42530: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/40127 > Remove Hadoop 2 from PySpark installation guide > --- > > Key: SPARK-42530 > URL: https://issues.apache.org/jira/browse/SPARK-42530 > Project: Spark > Issue Type: Documentation > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42530) Remove Hadoop 2 from PySpark installation guide
[ https://issues.apache.org/jira/browse/SPARK-42530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42530: Assignee: (was: Apache Spark) > Remove Hadoop 2 from PySpark installation guide > --- > > Key: SPARK-42530 > URL: https://issues.apache.org/jira/browse/SPARK-42530 > Project: Spark > Issue Type: Documentation > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-42530) Remove Hadoop 2 from PySpark installation guide
[ https://issues.apache.org/jira/browse/SPARK-42530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-42530: Assignee: Apache Spark > Remove Hadoop 2 from PySpark installation guide > --- > > Key: SPARK-42530 > URL: https://issues.apache.org/jira/browse/SPARK-42530 > Project: Spark > Issue Type: Documentation > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42530) Remove Hadoop 2 from PySpark installation guide
[ https://issues.apache.org/jira/browse/SPARK-42530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42530: -- Summary: Remove Hadoop 2 from PySpark installation guide (was: Update PySpark installation guide by hiding Hadoop 2) > Remove Hadoop 2 from PySpark installation guide > --- > > Key: SPARK-42530 > URL: https://issues.apache.org/jira/browse/SPARK-42530 > Project: Spark > Issue Type: Documentation > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42530) Update PySpark installation guide by hiding Hadoop 2
[ https://issues.apache.org/jira/browse/SPARK-42530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-42530: -- Summary: Update PySpark installation guide by hiding Hadoop 2 (was: Update PySpark installation by hiding Hadoop 2) > Update PySpark installation guide by hiding Hadoop 2 > > > Key: SPARK-42530 > URL: https://issues.apache.org/jira/browse/SPARK-42530 > Project: Spark > Issue Type: Documentation > Components: Documentation, PySpark >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42530) Update PySpark installation by hiding Hadoop 2
Dongjoon Hyun created SPARK-42530: - Summary: Update PySpark installation by hiding Hadoop 2 Key: SPARK-42530 URL: https://issues.apache.org/jira/browse/SPARK-42530 Project: Spark Issue Type: Documentation Components: Documentation, PySpark Affects Versions: 3.4.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40822) Use stable derived-column-alias algorithm, suitable for CREATE VIEW
[ https://issues.apache.org/jira/browse/SPARK-40822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692308#comment-17692308 ] Apache Spark commented on SPARK-40822: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/40126 > Use stable derived-column-alias algorithm, suitable for CREATE VIEW > > > Key: SPARK-40822 > URL: https://issues.apache.org/jira/browse/SPARK-40822 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Serge Rielau >Priority: Major > > Spark has the ability to derive column aliases for expressions if no alias was > provided by the user. > E.g. > CREATE TABLE T(c1 INT, c2 INT); > SELECT c1, `(c1 + 1)`, c3 FROM (SELECT c1, c1 + 1, c1 * c2 AS c3 FROM T); > This is a valuable feature. However, the current implementation works by > pretty printing the expression from the logical plan. This has multiple > downsides: > * The derived names can be unintuitive. For example the brackets in `(c1 + > 1)`, or outright ugly names, such as: > SELECT `substr(hello, 1, 2147483647)` FROM (SELECT substr('hello', 1)) AS T; > * We cannot guarantee stability across versions, since the logical plan of an > expression may change. > The latter is a major reason why we cannot allow CREATE VIEW without a column > list except in "trivial" cases. > CREATE VIEW v AS SELECT c1, c1 + 1, c1 * c2 AS c3 FROM T; > Not allowed to create a permanent view `spark_catalog`.`default`.`v` without > explicitly assigning an alias for expression (c1 + 1). > There are two ways we can go about fixing this: > # Stop deriving column aliases from the expression. Instead, generate unique > names such as `_col_1` based on their position in the select list. This is > ugly and takes away the "nice" headers on result sets. > # Move the derivation of the name upstream. That is, instead of pretty > printing the logical plan, we pretty print the lexer output, or a sanitized > version of the expression as typed. > The statement as typed is stable by definition. The lexer is stable because it > has no reason to change. And if it ever did, we would have a better chance to > manage the change. > In this feature we propose the following semantics: > # If the column alias can be trivially derived (some of these rules can stack), do > so: > ** a (qualified) column reference => the unqualified column identifier > cat.sch.tab.col => col > ** A field reference => the field name > struct.field1.field2 => field2 > ** A cast(column AS type) => column > cast(col1 AS INT) => col1 > ** A map lookup with a literal key => the key name > map.key => key > map['key'] => key > ** A parameterless function => the unqualified function name > current_schema() => current_schema > # Otherwise, take the lexer tokens of the expression, eliminate comments, and > concatenate them: > foo(tab1.c1 + /* this is a plus*/ > 1) => `foo(tab1.c1+1)` > > Of course we want this change under a config. > If the config is set, we can allow CREATE VIEW to exploit this and use the > derived names. > PS: The exact mechanics of formatting the name are very much debatable, > e.g. spaces between tokens, squeezing out comments, upper casing, preserving > quotes or double quotes... -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
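The trivial-derivation rules proposed in SPARK-40822 can be sketched in pure Python (a hypothetical illustration of the proposal, not Spark's lexer or implementation; `derive_alias` and its regexes are assumptions): try the trivial rules first, then fall back to the comment-stripped, whitespace-squeezed expression text.

```python
import re

def derive_alias(expr: str) -> str:
    """Sketch of the proposed alias derivation: trivial rules first,
    then the sanitized expression text as typed."""
    e = expr.strip()
    # cast(column AS type) => column (rules may stack, so keep going)
    m = re.fullmatch(r"cast\(\s*([A-Za-z_][\w.]*)\s+as\s+\w+\s*\)", e, re.I)
    if m:
        e = m.group(1)
    # map['key'] => key
    m = re.fullmatch(r"[A-Za-z_][\w.]*\['([^']+)'\]", e)
    if m:
        return m.group(1)
    # parameterless function: current_schema() => current_schema
    m = re.fullmatch(r"([A-Za-z_]\w*)\(\s*\)", e)
    if m:
        return m.group(1)
    # (qualified) column or field reference => last identifier
    if re.fullmatch(r"[A-Za-z_][\w.]*", e):
        return e.split(".")[-1]
    # fallback: drop comments, squeeze out whitespace
    e = re.sub(r"/\*.*?\*/", "", e, flags=re.S)
    return re.sub(r"\s+", "", e)

print(derive_alias("cat.sch.tab.col"))               # col
print(derive_alias("cast(col1 AS INT)"))             # col1
print(derive_alias("map['key']"))                    # key
print(derive_alias("current_schema()"))              # current_schema
print(derive_alias("foo(tab1.c1 + /* a plus */ 1)")) # foo(tab1.c1+1)
```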
[jira] [Commented] (SPARK-42468) Implement agg by (String, String)*
[ https://issues.apache.org/jira/browse/SPARK-42468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692307#comment-17692307 ] Apache Spark commented on SPARK-42468: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/40125 > Implement agg by (String, String)* > -- > > Key: SPARK-42468 > URL: https://issues.apache.org/jira/browse/SPARK-42468 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-42529) Support Cube,Rollup,Pivot
Rui Wang created SPARK-42529: Summary: Support Cube,Rollup,Pivot Key: SPARK-42529 URL: https://issues.apache.org/jira/browse/SPARK-42529 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Rui Wang Assignee: Rui Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42527) Scala Client add Window functions
[ https://issues.apache.org/jira/browse/SPARK-42527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hövell resolved SPARK-42527. --- Fix Version/s: 3.4.0 Assignee: Yang Jie Resolution: Fixed > Scala Client add Window functions > - > > Key: SPARK-42527 > URL: https://issues.apache.org/jira/browse/SPARK-42527 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42527) Scala Client add Window functions
[ https://issues.apache.org/jira/browse/SPARK-42527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692226#comment-17692226 ] Herman van Hövell commented on SPARK-42527: --- I have merged this to 3.4. It might be in 3.4.0 if RC fails, or 3.4.1 if it passes. > Scala Client add Window functions > - > > Key: SPARK-42527 > URL: https://issues.apache.org/jira/browse/SPARK-42527 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37980) Extend METADATA column to support row indices for file based data sources
[ https://issues.apache.org/jira/browse/SPARK-37980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17692170#comment-17692170 ] Apache Spark commented on SPARK-37980: -- User 'olaky' has created a pull request for this issue: https://github.com/apache/spark/pull/40124 > Extend METADATA column to support row indices for file based data sources > - > > Key: SPARK-37980 > URL: https://issues.apache.org/jira/browse/SPARK-37980 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Prakhar Jain >Assignee: Ala Luszczak >Priority: Major > Fix For: 3.4.0 > > > Spark recently added hidden metadata column support for file-based > data sources as part of SPARK-37273. > We should extend it to support ROW_INDEX/ROW_POSITION also. > > Meaning of ROW_POSITION: > ROW_INDEX/ROW_POSITION is basically an index of a row within a file. E.g. the 5th > row in the file will have ROW_INDEX 5. > > Use cases: > Row indexes can be used in a variety of ways. A (fileName, rowIndex) tuple > uniquely identifies a row in a table. This information can be used to mark rows, > e.g. by an indexer. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
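The (fileName, rowIndex) keying described in the issue can be modeled in plain Python. This is a conceptual sketch of why the tuple is a unique row key, not Spark's metadata-column implementation; the `with_row_index` helper and the 0-based numbering are assumptions for illustration.

```python
def with_row_index(files):
    """Model the proposed ROW_INDEX metadata column: each row gets its
    position within its own file, so the (file_name, row_index) pair
    uniquely identifies a row across the whole table even though
    row_index alone repeats between files."""
    keyed = []
    for file_name, rows in files.items():
        for row_index, row in enumerate(rows):  # index restarts per file
            keyed.append(((file_name, row_index), row))
    return keyed

# A toy two-file "table"
table = {
    "part-0000.parquet": ["a", "b"],
    "part-0001.parquet": ["c"],
}
rows = with_row_index(table)
keys = [key for key, _ in rows]
assert len(keys) == len(set(keys))  # (file, index) pairs are unique
```

In Spark itself this surfaces through the hidden `_metadata` struct column added by SPARK-37273, which this issue extends with a row-index field for file-based sources.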