[jira] [Commented] (SPARK-28704) Test backward compatibility on JDK9+ once we have a version that supports JDK9+

2020-11-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236582#comment-17236582
 ] 

Apache Spark commented on SPARK-28704:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/30451

> Test backward compatibility on JDK9+ once we have a version that supports JDK9+
> --
>
> Key: SPARK-28704
> URL: https://issues.apache.org/jira/browse/SPARK-28704
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.1.0
>
>
> We skip the HiveExternalCatalogVersionsSuite test when testing with JAVA_9 or
> later because our previous versions do not support JAVA_9 or later. We
> should add it back once we have a version that supports JAVA_9 or later.
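
A minimal sketch of the kind of version guard this implies, assuming ScalaTest 3.1+; the suite and test names below are illustrative, not the actual HiveExternalCatalogVersionsSuite:

{code:scala}
import org.scalatest.funsuite.AnyFunSuite

class BackwardCompatibilityGuardSketch extends AnyFunSuite {

  // "1.8" -> 8, "9" -> 9, "11" -> 11
  private def javaMajorVersion: Int = {
    val v = System.getProperty("java.specification.version")
    if (v.startsWith("1.")) v.stripPrefix("1.").toInt else v.toInt
  }

  test("read metadata written by an old Spark release") {
    // assume() cancels (rather than fails) the test when the precondition does not hold
    assume(javaMajorVersion < 9, "old Spark releases cannot run on JDK 9+, skipping")
    // ... download an old release and exercise the catalog here ...
  }
}
{code}

Once a JDK 9+-capable release exists to test against, a guard like this can be relaxed to compare against that release's minimum supported JDK instead.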



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org




[jira] [Commented] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2020-11-20 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236578#comment-17236578
 ] 

Hyukjin Kwon commented on SPARK-21187:
--

 Awesome [~bryanc]. It was a super super long task :-).

> Complete support for remaining Spark data types in Arrow Converters
> ---
>
> Key: SPARK-21187
> URL: https://issues.apache.org/jira/browse/SPARK-21187
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark, SQL
>Affects Versions: 2.3.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
> Fix For: 3.1.0
>
>
> This is to track adding the remaining type support in Arrow Converters.
> Currently, only primitive data types are supported.
> Remaining types:
>  * -*Date*-
>  * -*Timestamp*-
>  * *Complex*: -Struct-, -Array-, -Map-
>  * -*Decimal*-
>  * -*Binary*-
>  * -*Categorical*- when converting from Pandas
> Some things to do before closing this out:
>  * -Look to upgrading to Arrow 0.7 for better Decimal support (can now write 
> values as BigDecimal)-
>  * -Need to add some user docs-
>  * -Make sure Python tests are thorough-
>  * Check into complex type support mentioned in comments by [~leif]; should
> we support multi-indexing?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33505) Fix insert into `InMemoryPartitionTable`

2020-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33505:
-

Assignee: Maxim Gekk

> Fix insert into `InMemoryPartitionTable`
> 
>
> Key: SPARK-33505
> URL: https://issues.apache.org/jira/browse/SPARK-33505
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>
> Currently, INSERT INTO a partitioned table in V2 in-memory catalog doesn't 
> create partitions. The example below demonstrates the issue:
> {code:scala}
>   test("insert into partitioned table") {
>     val t = "testpart.ns1.ns2.tbl"
>     withTable(t) {
>       spark.sql(
>         s"""
>            |CREATE TABLE $t (id bigint, name string, data string)
>            |USING foo
>            |PARTITIONED BY (id, name)""".stripMargin)
>       spark.sql(s"INSERT INTO $t PARTITION(id = 1, name = 'Max') SELECT 'abc'")
>       val partTable = catalog("testpart").asTableCatalog
>         .loadTable(Identifier.of(Array("ns1", "ns2"), "tbl"))
>         .asInstanceOf[InMemoryPartitionTable]
>       assert(partTable.partitionExists(InternalRow.fromSeq(
>         Seq(1, UTF8String.fromString("Max")))))
>     }
>   }
> {code}
> The partitionExists() function returns false for the partitions that should have
> been created.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33505) Fix insert into `InMemoryPartitionTable`

2020-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33505.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30449
[https://github.com/apache/spark/pull/30449]

> Fix insert into `InMemoryPartitionTable`
> 
>
> Key: SPARK-33505
> URL: https://issues.apache.org/jira/browse/SPARK-33505
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.1.0
>
>
> Currently, INSERT INTO a partitioned table in V2 in-memory catalog doesn't 
> create partitions. The example below demonstrates the issue:
> {code:scala}
>   test("insert into partitioned table") {
>     val t = "testpart.ns1.ns2.tbl"
>     withTable(t) {
>       spark.sql(
>         s"""
>            |CREATE TABLE $t (id bigint, name string, data string)
>            |USING foo
>            |PARTITIONED BY (id, name)""".stripMargin)
>       spark.sql(s"INSERT INTO $t PARTITION(id = 1, name = 'Max') SELECT 'abc'")
>       val partTable = catalog("testpart").asTableCatalog
>         .loadTable(Identifier.of(Array("ns1", "ns2"), "tbl"))
>         .asInstanceOf[InMemoryPartitionTable]
>       assert(partTable.partitionExists(InternalRow.fromSeq(
>         Seq(1, UTF8String.fromString("Max")))))
>     }
>   }
> {code}
> The partitionExists() function returns false for the partitions that should have
> been created.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33506) t 130038ERROR ContextCleaner:91 - Error cleaning broadcast

2020-11-20 Thread Takeshi Yamamuro (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236534#comment-17236534
 ] 

Takeshi Yamamuro commented on SPARK-33506:
--

I think the description is not enough to reproduce the issue above. Please
describe it in more detail.

> t 130038ERROR ContextCleaner:91 - Error cleaning broadcast
> --
>
> Key: SPARK-33506
> URL: https://issues.apache.org/jira/browse/SPARK-33506
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.3
>Reporter: amit sharma
>Priority: Major
>
> I am using Spark 2.3.3 with 16 workers, each with 30 cores. I am facing a similar
> exception. This issue occurs once in a while, and the Spark streaming process
> does not handle any requests after it happens. We need to restart the streaming
> process.
>  
> Error cleaning broadcast 130038ERROR ContextCleaner:91 - Error cleaning 
> broadcast 130038
> org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [240 
> seconds]. This timeout is controlled by spark.network.timeout
>  at 
> org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
>  at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
>  at 
> org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
>  at 
> scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
>  at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
>  at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
>  at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76)
>  at 
> org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:148)
>  at 
> org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:321)
>  at 
> org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
>  at 
> org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:66)
>  at 
> org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:238)
>  at 
> org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$1.apply(ContextCleaner.scala:194)
>  at 
> org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$an



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33507) Improve and fix cache behavior in v1 and v2

2020-11-20 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236521#comment-17236521
 ] 

Chao Sun commented on SPARK-33507:
--

[~dongjoon] Yes, this is for Spark 3.1 _mostly_ (some JIRAs are in 2.4.x though,
such as SPARK-33290).

> Improve and fix cache behavior in v1 and v2
> ---
>
> Key: SPARK-33507
> URL: https://issues.apache.org/jira/browse/SPARK-33507
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>
> This is an umbrella JIRA to track fixes & improvements for caching behavior 
> in Spark datasource v1 and v2, which includes:
>   - fix some existing cache behavior in v1.
>   - fix inconsistent cache behaviors between v1 and v2
>   - implement new features in v2 to align with those in v1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33507) Improve and fix cache behavior in v1 and v2

2020-11-20 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236515#comment-17236515
 ] 

Dongjoon Hyun commented on SPARK-33507:
---

BTW, are you targeting this on Apache Spark 3.1?

> Improve and fix cache behavior in v1 and v2
> ---
>
> Key: SPARK-33507
> URL: https://issues.apache.org/jira/browse/SPARK-33507
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>
> This is an umbrella JIRA to track fixes & improvements for caching behavior 
> in Spark datasource v1 and v2, which includes:
>   - fix some existing cache behavior in v1.
>   - fix inconsistent cache behaviors between v1 and v2
>   - implement new features in v2 to align with those in v1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33507) Improve and fix cache behavior in v1 and v2

2020-11-20 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236512#comment-17236512
 ] 

Dongjoon Hyun commented on SPARK-33507:
---

Thank you for working on this area, [~csun]. This looks very important to me 
for Apache Iceberg use-cases, too.

> Improve and fix cache behavior in v1 and v2
> ---
>
> Key: SPARK-33507
> URL: https://issues.apache.org/jira/browse/SPARK-33507
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Chao Sun
>Priority: Major
>
> This is an umbrella JIRA to track fixes & improvements for caching behavior 
> in Spark datasource v1 and v2, which includes:
>   - fix some existing cache behavior in v1.
>   - fix inconsistent cache behaviors between v1 and v2
>   - implement new features in v2 to align with those in v1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33507) Improve and fix cache behavior in v1 and v2

2020-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33507:
-

Assignee: Chao Sun

> Improve and fix cache behavior in v1 and v2
> ---
>
> Key: SPARK-33507
> URL: https://issues.apache.org/jira/browse/SPARK-33507
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>
> This is an umbrella JIRA to track fixes & improvements for caching behavior 
> in Spark datasource v1 and v2, which includes:
>   - fix some existing cache behavior in v1.
>   - fix inconsistent cache behaviors between v1 and v2
>   - implement new features in v2 to align with those in v1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32670) Group exception messages in Catalyst Analyzer in one file

2020-11-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-32670.
--
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 29497
[https://github.com/apache/spark/pull/29497]

> Group exception messages in Catalyst Analyzer in one file
> -
>
> Key: SPARK-32670
> URL: https://issues.apache.org/jira/browse/SPARK-32670
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Xiao Li
>Assignee: Xiao Li
>Priority: Minor
> Fix For: 3.1.0
>
>
> For standardization of error messages and their maintenance, we can try to
> group the exception messages into a single file.
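
A hedged sketch of the grouping idea; the object and method names below are made up for illustration, not necessarily what the merged PR uses:

{code:scala}
// Centralize analysis-time error text so Analyzer call sites only supply the varying pieces.
object AnalysisErrorMessages {
  def unresolvedColumn(name: String, candidates: Seq[String]): String =
    s"cannot resolve '$name'; candidates are: ${candidates.mkString(", ")}"

  def ambiguousReference(name: String, numMatches: Int): String =
    s"reference '$name' is ambiguous; it matches $numMatches columns"
}
{code}

Keeping the strings in one place makes wording easy to audit and reuse, while the call sites stay one-liners.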



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33507) Improve and fix cache behavior in v1 and v2

2020-11-20 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated SPARK-33507:
-
Description: 
This is an umbrella JIRA to track fixes & improvements for caching behavior in 
Spark datasource v1 and v2, which includes:
  - fix some existing cache behavior in v1.
  - fix inconsistent cache behaviors between v1 and v2
  - implement new features in v2 to align with those in v1.

  was:This is an umbrella JIRA to track fixes & improvements for caching 
behavior in Spark datasource v1 and v2.


> Improve and fix cache behavior in v1 and v2
> ---
>
> Key: SPARK-33507
> URL: https://issues.apache.org/jira/browse/SPARK-33507
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Chao Sun
>Priority: Major
>
> This is an umbrella JIRA to track fixes & improvements for caching behavior 
> in Spark datasource v1 and v2, which includes:
>   - fix some existing cache behavior in v1.
>   - fix inconsistent cache behaviors between v1 and v2
>   - implement new features in v2 to align with those in v1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33305) DSv2: DROP TABLE command should also invalidate cache

2020-11-20 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated SPARK-33305:
-
Parent: (was: SPARK-33392)
Issue Type: Bug  (was: Sub-task)

> DSv2: DROP TABLE command should also invalidate cache
> -
>
> Key: SPARK-33305
> URL: https://issues.apache.org/jira/browse/SPARK-33305
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.1.0
>
>
> Unlike in DSv1, the {{DROP TABLE}} command in DSv2 currently only drops the
> table but doesn't invalidate all the caches referencing the table. We should
> make the behavior consistent between v1 and v2.
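
A hedged repro sketch of the inconsistency described above; the catalog name and the com.example class are assumptions for illustration, not taken from this issue:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[2]")
  .config("spark.sql.catalog.testcat", "com.example.MyInMemoryCatalog") // hypothetical v2 TableCatalog
  .getOrCreate()

spark.sql("CREATE TABLE testcat.ns.t (id BIGINT) USING foo")
spark.sql("CACHE TABLE t_cached AS SELECT * FROM testcat.ns.t")
spark.sql("DROP TABLE testcat.ns.t")
// With a v1 table the drop also invalidates dependent cache entries; with a v2 table
// (before this fix) the cached plan can survive and keep referencing the dropped table.
{code}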



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33290) REFRESH TABLE should invalidate cache even though the table itself may not be cached

2020-11-20 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated SPARK-33290:
-
Parent: SPARK-33507
Issue Type: Sub-task  (was: Bug)

> REFRESH TABLE should invalidate cache even though the table itself may not be 
> cached
> 
>
> Key: SPARK-33290
> URL: https://issues.apache.org/jira/browse/SPARK-33290
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.7, 3.0.1, 3.1.0
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: correctness
> Fix For: 2.4.8, 3.0.2, 3.1.0
>
>
> For the following example:
> {code}
> CREATE TABLE t ...;
> CREATE VIEW t1 AS SELECT * FROM t;
> REFRESH TABLE t
> {code}
> If t is cached, t1 will be invalidated. However, if t is not cached as above,
> the REFRESH command won't invalidate the view t1. This could lead to incorrect
> results if the view is used later.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33305) DSv2: DROP TABLE command should also invalidate cache

2020-11-20 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated SPARK-33305:
-
Parent: SPARK-33507
Issue Type: Sub-task  (was: Bug)

> DSv2: DROP TABLE command should also invalidate cache
> -
>
> Key: SPARK-33305
> URL: https://issues.apache.org/jira/browse/SPARK-33305
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.1.0
>
>
> Unlike in DSv1, the {{DROP TABLE}} command in DSv2 currently only drops the
> table but doesn't invalidate all the caches referencing the table. We should
> make the behavior consistent between v1 and v2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33435) DSv2: REFRESH TABLE should invalidate caches

2020-11-20 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated SPARK-33435:
-
Parent: SPARK-33507
Issue Type: Sub-task  (was: Bug)

> DSv2: REFRESH TABLE should invalidate caches
> 
>
> Key: SPARK-33435
> URL: https://issues.apache.org/jira/browse/SPARK-33435
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>  Labels: DSv2, correctness
> Fix For: 3.0.2, 3.1.0
>
>
> Currently, in DSv2 {{RefreshTableExec}}, we only invalidate the metadata cache
> but not all the caches that reference the table to be refreshed. This may
> cause correctness issues if these caches go stale and get queried later.
> Note that since we don't support caching a v2 table yet, we can't recache the 
> table itself at the moment.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33492) DSv2: Append/Overwrite/ReplaceTable should invalidate cache

2020-11-20 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated SPARK-33492:
-
Parent: SPARK-33507
Issue Type: Sub-task  (was: Bug)

> DSv2: Append/Overwrite/ReplaceTable should invalidate cache
> ---
>
> Key: SPARK-33492
> URL: https://issues.apache.org/jira/browse/SPARK-33492
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.1.0
>
>
> Unlike in DSv1, currently in DSv2 we don't invalidate table caches for 
> operations such as append, overwrite table by expr/partition, replace table, 
> etc. We should fix these so that the behavior is consistent between v1 and v2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33507) Improve and fix cache behavior in v1 and v2

2020-11-20 Thread Chao Sun (Jira)
Chao Sun created SPARK-33507:


 Summary: Improve and fix cache behavior in v1 and v2
 Key: SPARK-33507
 URL: https://issues.apache.org/jira/browse/SPARK-33507
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: Chao Sun


This is an umbrella JIRA to track fixes & improvements for caching behavior in 
Spark datasource v1 and v2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33492) DSv2: Append/Overwrite/ReplaceTable should invalidate cache

2020-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33492:
--
Fix Version/s: (was: 3.2.0)
   3.1.0

> DSv2: Append/Overwrite/ReplaceTable should invalidate cache
> ---
>
> Key: SPARK-33492
> URL: https://issues.apache.org/jira/browse/SPARK-33492
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.1.0
>
>
> Unlike in DSv1, currently in DSv2 we don't invalidate table caches for 
> operations such as append, overwrite table by expr/partition, replace table, 
> etc. We should fix these so that the behavior is consistent between v1 and v2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33492) DSv2: Append/Overwrite/ReplaceTable should invalidate cache

2020-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33492.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30429
[https://github.com/apache/spark/pull/30429]

> DSv2: Append/Overwrite/ReplaceTable should invalidate cache
> ---
>
> Key: SPARK-33492
> URL: https://issues.apache.org/jira/browse/SPARK-33492
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.2.0
>
>
> Unlike in DSv1, currently in DSv2 we don't invalidate table caches for 
> operations such as append, overwrite table by expr/partition, replace table, 
> etc. We should fix these so that the behavior is consistent between v1 and v2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33492) DSv2: Append/Overwrite/ReplaceTable should invalidate cache

2020-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33492:
-

Assignee: Chao Sun

> DSv2: Append/Overwrite/ReplaceTable should invalidate cache
> ---
>
> Key: SPARK-33492
> URL: https://issues.apache.org/jira/browse/SPARK-33492
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Chao Sun
>Assignee: Chao Sun
>Priority: Major
>
> Unlike in DSv1, currently in DSv2 we don't invalidate table caches for 
> operations such as append, overwrite table by expr/partition, replace table, 
> etc. We should fix these so that the behavior is consistent between v1 and v2.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33506) t 130038ERROR ContextCleaner:91 - Error cleaning broadcast

2020-11-20 Thread amit sharma (Jira)
amit sharma created SPARK-33506:
---

 Summary: t 130038ERROR ContextCleaner:91 - Error cleaning broadcast
 Key: SPARK-33506
 URL: https://issues.apache.org/jira/browse/SPARK-33506
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.3.3
Reporter: amit sharma


I am using Spark 2.3.3 with 16 workers, each with 30 cores. I am facing a similar
exception. This issue occurs once in a while, and the Spark streaming process does
not handle any requests after it happens. We need to restart the streaming process.

 

Error cleaning broadcast 130038ERROR ContextCleaner:91 - Error cleaning 
broadcast 130038
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [240 
seconds]. This timeout is controlled by spark.network.timeout
 at 
org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
 at 
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
 at 
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
 at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
 at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
 at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
 at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76)
 at 
org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:148)
 at 
org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:321)
 at 
org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
 at 
org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:66)
 at org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:238)
 at 
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$1.apply(ContextCleaner.scala:194)
 at 
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$an
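
The timeout in the trace above is the one controlled by spark.network.timeout. A minimal mitigation sketch, assuming the cleanup RPC is merely slow rather than lost; the values are illustrative, not recommendations from this issue:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("broadcast-cleanup-timeout")
  .config("spark.network.timeout", "600s")                      // widen the RPC wait the error points at
  .config("spark.cleaner.referenceTracking.blocking", "false")  // don't block the ContextCleaner on each RPC
  .getOrCreate()
{code}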



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25316) Spark error - ERROR ContextCleaner: Error cleaning broadcast 22, Exception thrown in awaitResult:

2020-11-20 Thread amit sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236473#comment-17236473
 ] 

amit sharma commented on SPARK-25316:
-

I am using Spark 2.3.3 with 16 workers, each with 30 cores. I am facing a similar
exception.

 

Error cleaning broadcast 130038ERROR ContextCleaner:91 - Error cleaning 
broadcast 130038
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [240 
seconds]. This timeout is controlled by spark.network.timeout
 at 
org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
 at 
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
 at 
org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
 at 
scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
 at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
 at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
 at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76)
 at 
org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:148)
 at 
org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:321)
 at 
org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
 at 
org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:66)
 at org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:238)
 at 
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$1.apply(ContextCleaner.scala:194)
 at 
org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$an

> Spark error - ERROR ContextCleaner: Error cleaning broadcast 22,  Exception 
> thrown in awaitResult: 
> ---
>
> Key: SPARK-25316
> URL: https://issues.apache.org/jira/browse/SPARK-25316
> Project: Spark
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 2.2.2
>Reporter: Vidya
>Priority: Major
>  Labels: bulk-closed
>
> While running a Spark load on EMR with c3 instances, we see the following error:
> ERROR ContextCleaner: Error cleaning broadcast 22
> org.apache.spark.SparkException: Exception thrown in awaitResult: 
>  
> What's the cause of the error and how do we fix it?
>  
> Stage 30:=> (374 + 20) / 600]
> [Stage 30:=> (419 + 20) / 600]
> [Stage 30:==> (471 + 4) / 
> 600]18/08/02 21:06:09 ERROR TransportResponseHandler: Still have 1 requests 
> outstanding when connection from /10.154.21.145:45990 is closed
> 18/08/02 21:06:09 ERROR ContextCleaner: Error cleaning broadcast 22
> org.apache.spark.SparkException: Exception thrown in awaitResult: 
>  at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
>  at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
>  at 
> org.apache.spark.storage.BlockManagerMaster.removeBroadcast(BlockManagerMaster.scala:161)
>  at 
> org.apache.spark.broadcast.TorrentBroadcast$.unpersist(TorrentBroadcast.scala:306)
>  at 
> org.apache.spark.broadcast.TorrentBroadcastFactory.unbroadcast(TorrentBroadcastFactory.scala:45)
>  at 
> org.apache.spark.broadcast.BroadcastManager.unbroadcast(BroadcastManager.scala:60)
>  at 
> org.apache.spark.ContextCleaner.doCleanupBroadcast(ContextCleaner.scala:238)
>  at 
> org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$1.apply(ContextCleaner.scala:194)
>  at 
> org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1$$anonfun$apply$mcV$sp$1.apply(ContextCleaner.scala:185)
>  at scala.Option.foreach(Option.scala:257)
>  at 
> org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:185)
>  at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1279)
>  at 
> org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:178)
>  at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:73)
> Caused by: java.io.IOException: Connection reset by peer
>  at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>  at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>  at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>  at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>  at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
>  at 
> io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:221)
>  at 

[jira] [Reopened] (SPARK-33185) YARN: Print direct links to driver logs alongside application report in cluster mode

2020-11-20 Thread Erik Krogen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Krogen reopened SPARK-33185:
-

> YARN: Print direct links to driver logs alongside application report in 
> cluster mode
> 
>
> Key: SPARK-33185
> URL: https://issues.apache.org/jira/browse/SPARK-33185
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.0.1
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 3.1.0
>
>
> Currently when run in {{cluster}} mode on YARN, the Spark {{yarn.Client}} 
> will print out the application report into the logs, to be easily viewed by 
> users. For example:
> {code}
> INFO yarn.Client: 
>client token: Token { kind: YARN_CLIENT_TOKEN, service:  }
>diagnostics: N/A
>ApplicationMaster host: X.X.X.X
>ApplicationMaster RPC port: 0
>queue: default
>start time: 1602782566027
>final status: UNDEFINED
>tracking URL: http://hostname:/proxy/application_/
>user: xkrogen
> {code}
> Typically, the tracking URL can be used to find the logs of the 
> ApplicationMaster/driver while the application is running. Later, the Spark 
> History Server can be used to track this information down, using the 
> stdout/stderr links on the Executors page.
> However, in the situation when the driver crashed _before_ writing out a 
> history file, the SHS may not be aware of this application, and thus does not 
> contain links to the driver logs. When this situation arises, it can be 
> difficult for users to debug further, since they can't easily find their 
> driver logs.
> It is possible to reach the logs by using the {{yarn logs}} commands, but the 
> average Spark user isn't aware of this and shouldn't have to be.
> I propose adding, alongside the application report, some additional lines 
> like:
> {code}
>  Driver Logs (stdout): 
> http://hostname:8042/node/containerlogs/container_/xkrogen/stdout?start=-4096
>  Driver Logs (stderr): 
> http://hostname:8042/node/containerlogs/container_/xkrogen/stderr?start=-4096
> {code}
> With this information available, users can quickly jump to their driver logs, 
> even if it crashed before the SHS became aware of the application. This has 
> the additional benefit of providing a quick way to access driver logs, which 
> often contain useful information, in a single click (instead of navigating 
> through the Spark UI).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33185) YARN: Print direct links to driver logs alongside application report in cluster mode

2020-11-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33185:


Assignee: Apache Spark  (was: Erik Krogen)

> YARN: Print direct links to driver logs alongside application report in 
> cluster mode
> 
>
> Key: SPARK-33185
> URL: https://issues.apache.org/jira/browse/SPARK-33185
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.0.1
>Reporter: Erik Krogen
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.1.0
>
>
> Currently when run in {{cluster}} mode on YARN, the Spark {{yarn.Client}} 
> will print out the application report into the logs, to be easily viewed by 
> users. For example:
> {code}
> INFO yarn.Client: 
>client token: Token { kind: YARN_CLIENT_TOKEN, service:  }
>diagnostics: N/A
>ApplicationMaster host: X.X.X.X
>ApplicationMaster RPC port: 0
>queue: default
>start time: 1602782566027
>final status: UNDEFINED
>tracking URL: http://hostname:/proxy/application_/
>user: xkrogen
> {code}
> Typically, the tracking URL can be used to find the logs of the 
> ApplicationMaster/driver while the application is running. Later, the Spark 
> History Server can be used to track this information down, using the 
> stdout/stderr links on the Executors page.
> However, in the situation when the driver crashed _before_ writing out a 
> history file, the SHS may not be aware of this application, and thus does not 
> contain links to the driver logs. When this situation arises, it can be 
> difficult for users to debug further, since they can't easily find their 
> driver logs.
> It is possible to reach the logs by using the {{yarn logs}} commands, but the 
> average Spark user isn't aware of this and shouldn't have to be.
> I propose adding, alongside the application report, some additional lines 
> like:
> {code}
>  Driver Logs (stdout): 
> http://hostname:8042/node/containerlogs/container_/xkrogen/stdout?start=-4096
>  Driver Logs (stderr): 
> http://hostname:8042/node/containerlogs/container_/xkrogen/stderr?start=-4096
> {code}
> With this information available, users can quickly jump to their driver logs, 
> even if it crashed before the SHS became aware of the application. This has 
> the additional benefit of providing a quick way to access driver logs, which 
> often contain useful information, in a single click (instead of navigating 
> through the Spark UI).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33185) YARN: Print direct links to driver logs alongside application report in cluster mode

2020-11-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33185:


Assignee: Erik Krogen  (was: Apache Spark)

> YARN: Print direct links to driver logs alongside application report in 
> cluster mode
> 
>
> Key: SPARK-33185
> URL: https://issues.apache.org/jira/browse/SPARK-33185
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.0.1
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 3.1.0
>
>
> Currently when run in {{cluster}} mode on YARN, the Spark {{yarn.Client}} 
> will print out the application report into the logs, to be easily viewed by 
> users. For example:
> {code}
> INFO yarn.Client: 
>client token: Token { kind: YARN_CLIENT_TOKEN, service:  }
>diagnostics: N/A
>ApplicationMaster host: X.X.X.X
>ApplicationMaster RPC port: 0
>queue: default
>start time: 1602782566027
>final status: UNDEFINED
>tracking URL: http://hostname:/proxy/application_/
>user: xkrogen
> {code}
> Typically, the tracking URL can be used to find the logs of the 
> ApplicationMaster/driver while the application is running. Later, the Spark 
> History Server can be used to track this information down, using the 
> stdout/stderr links on the Executors page.
> However, in the situation when the driver crashed _before_ writing out a 
> history file, the SHS may not be aware of this application, and thus does not 
> contain links to the driver logs. When this situation arises, it can be 
> difficult for users to debug further, since they can't easily find their 
> driver logs.
> It is possible to reach the logs by using the {{yarn logs}} commands, but the 
> average Spark user isn't aware of this and shouldn't have to be.
> I propose adding, alongside the application report, some additional lines 
> like:
> {code}
>  Driver Logs (stdout): 
> http://hostname:8042/node/containerlogs/container_/xkrogen/stdout?start=-4096
>  Driver Logs (stderr): 
> http://hostname:8042/node/containerlogs/container_/xkrogen/stderr?start=-4096
> {code}
> With this information available, users can quickly jump to their driver logs, 
> even if it crashed before the SHS became aware of the application. This has 
> the additional benefit of providing a quick way to access driver logs, which 
> often contain useful information, in a single click (instead of navigating 
> through the Spark UI).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33185) YARN: Print direct links to driver logs alongside application report in cluster mode

2020-11-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236464#comment-17236464
 ] 

Apache Spark commented on SPARK-33185:
--

User 'xkrogen' has created a pull request for this issue:
https://github.com/apache/spark/pull/30450

> YARN: Print direct links to driver logs alongside application report in 
> cluster mode
> 
>
> Key: SPARK-33185
> URL: https://issues.apache.org/jira/browse/SPARK-33185
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.0.1
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 3.1.0
>
>
> Currently when run in {{cluster}} mode on YARN, the Spark {{yarn.Client}} 
> will print out the application report into the logs, to be easily viewed by 
> users. For example:
> {code}
> INFO yarn.Client: 
>client token: Token { kind: YARN_CLIENT_TOKEN, service:  }
>diagnostics: N/A
>ApplicationMaster host: X.X.X.X
>ApplicationMaster RPC port: 0
>queue: default
>start time: 1602782566027
>final status: UNDEFINED
>tracking URL: http://hostname:/proxy/application_/
>user: xkrogen
> {code}
> Typically, the tracking URL can be used to find the logs of the 
> ApplicationMaster/driver while the application is running. Later, the Spark 
> History Server can be used to track this information down, using the 
> stdout/stderr links on the Executors page.
> However, in the situation when the driver crashed _before_ writing out a 
> history file, the SHS may not be aware of this application, and thus does not 
> contain links to the driver logs. When this situation arises, it can be 
> difficult for users to debug further, since they can't easily find their 
> driver logs.
> It is possible to reach the logs by using the {{yarn logs}} commands, but the 
> average Spark user isn't aware of this and shouldn't have to be.
> I propose adding, alongside the application report, some additional lines 
> like:
> {code}
>  Driver Logs (stdout): 
> http://hostname:8042/node/containerlogs/container_/xkrogen/stdout?start=-4096
>  Driver Logs (stderr): 
> http://hostname:8042/node/containerlogs/container_/xkrogen/stderr?start=-4096
> {code}
> With this information available, users can quickly jump to their driver logs, 
> even if it crashed before the SHS became aware of the application. This has 
> the additional benefit of providing a quick way to access driver logs, which 
> often contain useful information, in a single click (instead of navigating 
> through the Spark UI).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33185) YARN: Print direct links to driver logs alongside application report in cluster mode

2020-11-20 Thread Erik Krogen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236466#comment-17236466
 ] 

Erik Krogen commented on SPARK-33185:
-

I found that the existing logic doesn't work properly in restricted-credentials 
environments where delegation tokens are used for passing credentials, instead 
of a fully-fledged Kerberos credential. I put up a new PR to address this issue.

> YARN: Print direct links to driver logs alongside application report in 
> cluster mode
> 
>
> Key: SPARK-33185
> URL: https://issues.apache.org/jira/browse/SPARK-33185
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 3.0.1
>Reporter: Erik Krogen
>Assignee: Erik Krogen
>Priority: Major
> Fix For: 3.1.0
>
>
> Currently when run in {{cluster}} mode on YARN, the Spark {{yarn.Client}} 
> will print out the application report into the logs, to be easily viewed by 
> users. For example:
> {code}
> INFO yarn.Client: 
>client token: Token { kind: YARN_CLIENT_TOKEN, service:  }
>diagnostics: N/A
>ApplicationMaster host: X.X.X.X
>ApplicationMaster RPC port: 0
>queue: default
>start time: 1602782566027
>final status: UNDEFINED
>tracking URL: http://hostname:/proxy/application_/
>user: xkrogen
> {code}
> Typically, the tracking URL can be used to find the logs of the 
> ApplicationMaster/driver while the application is running. Later, the Spark 
> History Server can be used to track this information down, using the 
> stdout/stderr links on the Executors page.
> However, in the situation when the driver crashed _before_ writing out a 
> history file, the SHS may not be aware of this application, and thus does not 
> contain links to the driver logs. When this situation arises, it can be 
> difficult for users to debug further, since they can't easily find their 
> driver logs.
> It is possible to reach the logs by using the {{yarn logs}} commands, but the 
> average Spark user isn't aware of this and shouldn't have to be.
> I propose adding, alongside the application report, some additional lines 
> like:
> {code}
>  Driver Logs (stdout): 
> http://hostname:8042/node/containerlogs/container_/xkrogen/stdout?start=-4096
>  Driver Logs (stderr): 
> http://hostname:8042/node/containerlogs/container_/xkrogen/stderr?start=-4096
> {code}
> With this information available, users can quickly jump to their driver logs, 
> even if it crashed before the SHS became aware of the application. This has 
> the additional benefit of providing a quick way to access driver logs, which 
> often contain useful information, in a single click (instead of navigating 
> through the Spark UI).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org






[jira] [Updated] (SPARK-33502) Large number of SELECT columns causes StackOverflowError

2020-11-20 Thread Arwin S Tio (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arwin S Tio updated SPARK-33502:

Description: 
On Spark 2.4.7 Standalone Mode on my laptop (Macbook Pro 2015), I ran the 
following:
{code:java}
public class TestSparkStackOverflow {
  public static void main(String[] args) {
    SparkSession spark = SparkSession
      .builder()
      .config("spark.master", "local[8]")
      .appName(TestSparkStackOverflow.class.getSimpleName())
      .getOrCreate();

    StructType inputSchema = new StructType();
    inputSchema = inputSchema.add("foo", DataTypes.StringType);

    Dataset<Row> inputDf = spark.createDataFrame(
      Arrays.asList(
        RowFactory.create("1"),
        RowFactory.create("2"),
        RowFactory.create("3")
      ),
      inputSchema
    );

    List<Column> lotsOfColumns = new ArrayList<>();
    for (int i = 0; i < 3000; i++) {
      lotsOfColumns.add(lit("").as("field" + i).cast(DataTypes.StringType));
    }
    lotsOfColumns.add(new Column("foo"));

    inputDf
      .select(JavaConverters.collectionAsScalaIterableConverter(lotsOfColumns).asScala().toSeq())
      .write()
      .format("csv")
      .mode(SaveMode.Append)
      .save("file:///tmp/testoutput");
  }
}
 {code}
 

And I get a StackOverflowError:
{code:java}
Exception in thread "main" org.apache.spark.SparkException: Job 
aborted.Exception in thread "main" org.apache.spark.SparkException: Job 
aborted. at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
 at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
 at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
 at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
 at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) 
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) 
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
 at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81) at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
 at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
 at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
 at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
 at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
 at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696) 
at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:305) 
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:291) at 
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:249) at 
udp.task.TestSparkStackOverflow.main(TestSparkStackOverflow.java:52)Caused by: 
java.lang.StackOverflowError at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1522) at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at 
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at 
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at 
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at 
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
... redacted {code}
 

The StackOverflowError goes away at around 500 columns.

 


[jira] [Commented] (SPARK-33502) Large number of SELECT columns causes StackOverflowError

2020-11-20 Thread Arwin S Tio (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236434#comment-17236434
 ] 

Arwin S Tio commented on SPARK-33502:
-

Note: running my program with "-Xss3072k" fixed it.
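
A hedged illustration of that workaround in Spark configuration terms, not a fix for the underlying deep recursion. In local/client mode (as in this report) -Xss has to be passed to the launching JVM itself, since spark.driver.extraJavaOptions only takes effect when Spark launches the driver JVM (cluster mode):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("wide-select")
  .config("spark.driver.extraJavaOptions", "-Xss4m")    // cluster mode: larger driver thread stack
  .config("spark.executor.extraJavaOptions", "-Xss4m")  // executors, if the overflow moves there
  .getOrCreate()
{code}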

> Large number of SELECT columns causes StackOverflowError
> 
>
> Key: SPARK-33502
> URL: https://issues.apache.org/jira/browse/SPARK-33502
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.7
>Reporter: Arwin S Tio
>Priority: Minor
>
> On Spark 2.4.7 Standalone Mode on my laptop (Macbook Pro 2015), I ran the 
> following:
> {code:java}
> public class TestSparkStackOverflow {
>   public static void main(String [] args) {
> SparkSession spark = SparkSession
>   .builder()
>   .config("spark.master", "local[8]")
>   .appName(TestSparkStackOverflow.class.getSimpleName())
>   .getOrCreate();StructType inputSchema = new StructType();
> inputSchema = inputSchema.add("foo", DataTypes.StringType);
> Dataset inputDf = spark.createDataFrame(
>   Arrays.asList(
> RowFactory.create("1"),
> RowFactory.create("2"),
> RowFactory.create("3")
>   ),
>   inputSchema
> );
>  
> List<Column> lotsOfColumns = new ArrayList<>();
> for (int i = 0; i < 3000; i++) {
>   lotsOfColumns.add(lit("").as("field" + i).cast(DataTypes.StringType));
> }
> lotsOfColumns.add(new Column("foo"));
> inputDf
>   
> .select(JavaConverters.collectionAsScalaIterableConverter(lotsOfColumns).asScala().toSeq())
>   .write()
>   .format("csv")
>   .mode(SaveMode.Append)
>   .save("file:///tmp/testoutput");
>   }
> }
>  {code}
>  
> And I get a StackOverflowError:
> {code:java}
> Exception in thread "main" org.apache.spark.SparkException: Job 
> aborted.Exception in thread "main" org.apache.spark.SparkException: Job 
> aborted. at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
>  at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
>  at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
>  at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
>  at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81) 
> at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
>  at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
>  at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
>  at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696) at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:305)
>  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:291) at 
> org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:249) at 
> udp.task.TestSparkStackOverflow.main(TestSparkStackOverflow.java:52)Caused 
> by: java.lang.StackOverflowError at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1522) 
> at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) 
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) 
> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) 
> at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) 
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) 
> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at 
> 

[jira] [Updated] (SPARK-33502) Large number of SELECT columns causes StackOverflowError

2020-11-20 Thread Arwin S Tio (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arwin S Tio updated SPARK-33502:

Description: 
On Spark 2.4.7 Standalone Mode on my laptop (Macbook Pro 2015), I ran the 
following:
{code:java}
public class TestSparkStackOverflow {
  public static void main(String [] args) {
SparkSession spark = SparkSession
  .builder()
  .config("spark.master", "local[8]")
  .appName(TestSparkStackOverflow.class.getSimpleName())
  .getOrCreate();

StructType inputSchema = new StructType();
inputSchema = inputSchema.add("foo", DataTypes.StringType);
Dataset<Row> inputDf = spark.createDataFrame(
  Arrays.asList(
RowFactory.create("1"),
RowFactory.create("2"),
RowFactory.create("3")
  ),
  inputSchema
);
 
List<Column> lotsOfColumns = new ArrayList<>();
for (int i = 0; i < 3000; i++) {
  lotsOfColumns.add(lit("").as("field" + i).cast(DataTypes.StringType));
}
lotsOfColumns.add(new Column("foo"));

inputDf
  
.select(JavaConverters.collectionAsScalaIterableConverter(lotsOfColumns).asScala().toSeq())
  .write()
  .format("csv")
  .mode(SaveMode.Append)
  .save("file:///tmp/testoutput");
  }
}
 {code}
 

And I get a StackOverflowError:
{code:java}
Exception in thread "main" org.apache.spark.SparkException: Job 
aborted.Exception in thread "main" org.apache.spark.SparkException: Job 
aborted. at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
 at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
 at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
 at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
 at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) 
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) 
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
 at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81) at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
 at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
 at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
 at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
 at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
 at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696) 
at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:305) 
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:291) at 
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:249) at 
udp.task.TestSparkStackOverflow.main(TestSparkStackOverflow.java:52)Caused by: 
java.lang.StackOverflowError at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1522) at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at 
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at 
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at 
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at 
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
... redacted {code}
 

The StackOverflowError goes away at around 500 columns.

 

When 

[jira] [Updated] (SPARK-33502) Large number of SELECT columns causes StackOverflowError

2020-11-20 Thread Arwin S Tio (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arwin S Tio updated SPARK-33502:

Description: 
On Spark 2.4.7 Standalone Mode on my laptop (Macbook Pro 2015), I ran the 
following:
{code:java}
public class TestSparkStackOverflow {
  public static void main(String [] args) {
SparkSession spark = SparkSession
  .builder()
  .config("spark.master", "local[8]")
  .appName(TestSparkStackOverflow.class.getSimpleName())
  .getOrCreate();

StructType inputSchema = new StructType();
inputSchema = inputSchema.add("foo", DataTypes.StringType);
Dataset<Row> inputDf = spark.createDataFrame(
  Arrays.asList(
RowFactory.create("1"),
RowFactory.create("2"),
RowFactory.create("3")
  ),
  inputSchema
);
 
List<Column> lotsOfColumns = new ArrayList<>();
for (int i = 0; i < 3000; i++) {
  lotsOfColumns.add(lit("").as("field" + i).cast(DataTypes.StringType));
}

inputDf
  
.select(JavaConverters.collectionAsScalaIterableConverter(lotsOfColumns).asScala().toSeq())
  .write()
  .format("csv")
  .mode(SaveMode.Append)
  .save("file:///tmp/testoutput");
  }
}
 {code}
 

And I get a StackOverflowError:
{code:java}
Exception in thread "main" org.apache.spark.SparkException: Job 
aborted.Exception in thread "main" org.apache.spark.SparkException: Job 
aborted. at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
 at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
 at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
 at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
 at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) 
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) 
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
 at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81) at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
 at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
 at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
 at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
 at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
 at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696) 
at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:305) 
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:291) at 
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:249) at 
udp.task.TestSparkStackOverflow.main(TestSparkStackOverflow.java:52)Caused by: 
java.lang.StackOverflowError at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1522) at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at 
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at 
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at 
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at 
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
... redacted {code}
 

The StackOverflowError goes away at around 500 columns.

 

When running it through the debugger, I found that it happens when 

[jira] [Resolved] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2020-11-20 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler resolved SPARK-21187.
--
Fix Version/s: 3.1.0
   Resolution: Fixed

With MapType now added, all basic types are supported. I changed nested 
timestamps/dates to a separate issue and I think we can resolve this now.

> Complete support for remaining Spark data types in Arrow Converters
> ---
>
> Key: SPARK-21187
> URL: https://issues.apache.org/jira/browse/SPARK-21187
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark, SQL
>Affects Versions: 2.3.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
> Fix For: 3.1.0
>
>
> This is to track adding the remaining type support in Arrow Converters. 
> Currently, only primitive data types are supported. '
> Remaining types:
>  * -*Date*-
>  * -*Timestamp*-
>  * *Complex*: -Struct-, -Array-, -Map-
>  * -*Decimal*-
>  * -*Binary*-
>  * -*Categorical*- when converting from Pandas
> Some things to do before closing this out:
>  * -Look to upgrading to Arrow 0.7 for better Decimal support (can now write 
> values as BigDecimal)-
>  * -Need to add some user docs-
>  * -Make sure Python tests are thorough-
>  * Check into complex type support mentioned in comments by [~leif], should 
> we support multi-indexing?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2020-11-20 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated SPARK-21187:
-
Description: 
This is to track adding the remaining type support in Arrow Converters. 
Currently, only primitive data types are supported. '

Remaining types:
 * -*Date*-
 * -*Timestamp*-
 * *Complex*: -Struct-, -Array-, -Map-
 * -*Decimal*-
 * -*Binary*-
 * -*Categorical*- when converting from Pandas

Some things to do before closing this out:
 * -Look to upgrading to Arrow 0.7 for better Decimal support (can now write 
values as BigDecimal)-
 * -Need to add some user docs-
 * -Make sure Python tests are thorough-
 * Check into complex type support mentioned in comments by [~leif], should we 
support multi-indexing?

  was:
This is to track adding the remaining type support in Arrow Converters. 
Currently, only primitive data types are supported. '

Remaining types:
 * -*Date*-
 * -*Timestamp*-
 * *Complex*: Struct, -Array-, Arrays of Date/Timestamps, Map
 * -*Decimal*-
 * -*Binary*-
 * -*Categorical*- when converting from Pandas

Some things to do before closing this out:
 * -Look to upgrading to Arrow 0.7 for better Decimal support (can now write 
values as BigDecimal)-
 * -Need to add some user docs-
 * -Make sure Python tests are thorough-
 * Check into complex type support mentioned in comments by [~leif], should we 
support multi-indexing?


> Complete support for remaining Spark data types in Arrow Converters
> ---
>
> Key: SPARK-21187
> URL: https://issues.apache.org/jira/browse/SPARK-21187
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark, SQL
>Affects Versions: 2.3.0
>Reporter: Bryan Cutler
>Assignee: Bryan Cutler
>Priority: Major
>
> This is to track adding the remaining type support in Arrow Converters. 
> Currently, only primitive data types are supported. '
> Remaining types:
>  * -*Date*-
>  * -*Timestamp*-
>  * *Complex*: -Struct-, -Array-, -Map-
>  * -*Decimal*-
>  * -*Binary*-
>  * -*Categorical*- when converting from Pandas
> Some things to do before closing this out:
>  * -Look to upgrading to Arrow 0.7 for better Decimal support (can now write 
> values as BigDecimal)-
>  * -Need to add some user docs-
>  * -Make sure Python tests are thorough-
>  * Check into complex type support mentioned in comments by [~leif], should 
> we support multi-indexing?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32285) Add PySpark support for nested timestamps with arrow

2020-11-20 Thread Bryan Cutler (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bryan Cutler updated SPARK-32285:
-
Parent: (was: SPARK-21187)
Issue Type: Improvement  (was: Sub-task)

> Add PySpark support for nested timestamps with arrow
> 
>
> Key: SPARK-32285
> URL: https://issues.apache.org/jira/browse/SPARK-32285
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Bryan Cutler
>Priority: Major
>
> Currently with arrow optimizations, there is post-processing done in pandas 
> for timestamp columns to localize timezone. This is not done for nested 
> columns with timestamps such as StructType or ArrayType.
> Adding support for this is needed for the Apache Arrow 1.0.0 upgrade, due to the 
> use of structs with timestamps in a grouped-by key over a window.
> As a simple first step, timestamps with one level of nesting could be handled 
> first, and this would satisfy the immediate need.
> NOTE: with Arrow 1.0.0, it might be possible to do the timezone processing 
> with pyarrow.array.cast, which could be easier done than in pandas.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33466) Imputer support mode(most_frequent) strategy

2020-11-20 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-33466.
--
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30397
[https://github.com/apache/spark/pull/30397]

> Imputer support mode(most_frequent) strategy
> 
>
> Key: SPARK-33466
> URL: https://issues.apache.org/jira/browse/SPARK-33466
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, PySpark
>Affects Versions: 3.1.0
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Minor
> Fix For: 3.2.0
>
>
> [sklearn.impute.SimpleImputer|https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html#sklearn.impute.SimpleImputer]
>  supports *most_frequent(mode)*, which replaces missing values with the most 
> frequent value along each column.
> It should be easy to implement it in MLlib.
>  
>  
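
A minimal Scala sketch of how the requested strategy would look from the user side; the strategy name "mode" is an assumption based on the linked pull request, not something stated in this issue:
{code:scala}
import org.apache.spark.ml.feature.Imputer
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("imputer-mode").getOrCreate()
import spark.implicits._

// Missing values are Double.NaN by default (configurable via setMissingValue).
val df = Seq(1.0, 1.0, 2.0, Double.NaN).toDF("x")

val imputer = new Imputer()
  .setInputCols(Array("x"))
  .setOutputCols(Array("x_imputed"))
  .setStrategy("mode")          // assumed name for the most_frequent strategy

// NaN is expected to be replaced by 1.0, the most frequent value in the column.
imputer.fit(df).transform(df).show()
{code}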



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33505) Fix insert into `InMemoryPartitionTable`

2020-11-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236389#comment-17236389
 ] 

Apache Spark commented on SPARK-33505:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30449

> Fix insert into `InMemoryPartitionTable`
> 
>
> Key: SPARK-33505
> URL: https://issues.apache.org/jira/browse/SPARK-33505
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Currently, INSERT INTO a partitioned table in V2 in-memory catalog doesn't 
> create partitions. The example below demonstrates the issue:
> {code:scala}
>   test("insert into partitioned table") {
> val t = "testpart.ns1.ns2.tbl"
> withTable(t) {
>   spark.sql(
> s"""
>|CREATE TABLE $t (id bigint, name string, data string)
>|USING foo
>|PARTITIONED BY (id, name)""".stripMargin)
>   spark.sql(s"INSERT INTO $t PARTITION(id = 1, name = 'Max') SELECT 
> 'abc'")
>   val partTable = catalog("testpart").asTableCatalog
> .loadTable(Identifier.of(Array("ns1", "ns2"), 
> "tbl")).asInstanceOf[InMemoryPartitionTable]
>   assert(partTable.partitionExists(InternalRow.fromSeq(Seq(1,
>     UTF8String.fromString("Max")))))
> }
>   }
> {code}
> The partitionExists() check returns false for the partitions that the INSERT 
> should have created.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33505) Fix insert into `InMemoryPartitionTable`

2020-11-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33505:


Assignee: (was: Apache Spark)

> Fix insert into `InMemoryPartitionTable`
> 
>
> Key: SPARK-33505
> URL: https://issues.apache.org/jira/browse/SPARK-33505
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Currently, INSERT INTO a partitioned table in V2 in-memory catalog doesn't 
> create partitions. The example below demonstrates the issue:
> {code:scala}
>   test("insert into partitioned table") {
> val t = "testpart.ns1.ns2.tbl"
> withTable(t) {
>   spark.sql(
> s"""
>|CREATE TABLE $t (id bigint, name string, data string)
>|USING foo
>|PARTITIONED BY (id, name)""".stripMargin)
>   spark.sql(s"INSERT INTO $t PARTITION(id = 1, name = 'Max') SELECT 
> 'abc'")
>   val partTable = catalog("testpart").asTableCatalog
> .loadTable(Identifier.of(Array("ns1", "ns2"), 
> "tbl")).asInstanceOf[InMemoryPartitionTable]
>   assert(partTable.partitionExists(InternalRow.fromSeq(Seq(1,
>     UTF8String.fromString("Max")))))
> }
>   }
> {code}
> The partitionExists() check returns false for the partitions that the INSERT 
> should have created.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33505) Fix insert into `InMemoryPartitionTable`

2020-11-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33505:


Assignee: Apache Spark

> Fix insert into `InMemoryPartitionTable`
> 
>
> Key: SPARK-33505
> URL: https://issues.apache.org/jira/browse/SPARK-33505
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
>
> Currently, INSERT INTO a partitioned table in V2 in-memory catalog doesn't 
> create partitions. The example below demonstrates the issue:
> {code:scala}
>   test("insert into partitioned table") {
> val t = "testpart.ns1.ns2.tbl"
> withTable(t) {
>   spark.sql(
> s"""
>|CREATE TABLE $t (id bigint, name string, data string)
>|USING foo
>|PARTITIONED BY (id, name)""".stripMargin)
>   spark.sql(s"INSERT INTO $t PARTITION(id = 1, name = 'Max') SELECT 
> 'abc'")
>   val partTable = catalog("testpart").asTableCatalog
> .loadTable(Identifier.of(Array("ns1", "ns2"), 
> "tbl")).asInstanceOf[InMemoryPartitionTable]
>   assert(partTable.partitionExists(InternalRow.fromSeq(Seq(1,
>     UTF8String.fromString("Max")))))
> }
>   }
> {code}
> The partitionExists() check returns false for the partitions that the INSERT 
> should have created.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33505) Fix insert into `InMemoryPartitionTable`

2020-11-20 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33505:
--

 Summary: Fix insert into `InMemoryPartitionTable`
 Key: SPARK-33505
 URL: https://issues.apache.org/jira/browse/SPARK-33505
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


Currently, INSERT INTO a partitioned table in V2 in-memory catalog doesn't 
create partitions. The example below demonstrates the issue:

{code:scala}
  test("insert into partitioned table") {
val t = "testpart.ns1.ns2.tbl"
withTable(t) {
  spark.sql(
s"""
   |CREATE TABLE $t (id bigint, name string, data string)
   |USING foo
   |PARTITIONED BY (id, name)""".stripMargin)
  spark.sql(s"INSERT INTO $t PARTITION(id = 1, name = 'Max') SELECT 'abc'")

  val partTable = catalog("testpart").asTableCatalog
.loadTable(Identifier.of(Array("ns1", "ns2"), 
"tbl")).asInstanceOf[InMemoryPartitionTable]
  assert(partTable.partitionExists(InternalRow.fromSeq(Seq(1,
    UTF8String.fromString("Max")))))
}
  }
{code}

The partitionExists() check returns false for the partitions that the INSERT 
should have created.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33472) IllegalArgumentException when applying RemoveRedundantSorts before EnsureRequirements

2020-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33472:
--
Fix Version/s: 2.4.8

> IllegalArgumentException when applying RemoveRedundantSorts before 
> EnsureRequirements
> -
>
> Key: SPARK-33472
> URL: https://issues.apache.org/jira/browse/SPARK-33472
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.8, 3.0.2, 3.1.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
> Fix For: 2.4.8, 3.0.2, 3.1.0
>
>
> `RemoveRedundantSorts` rule uses SparkPlan's `outputPartitioning` to check 
> whether a sort node is redundant. Currently, it is added before 
> `EnsureRequirements`. Since `PartitioningCollection` requires left and right 
> partitioning to have the same number of partitions, which is not necessarily 
> true before applying `EnsureRequirements`, the rule can fail with the 
> following exception:
> {{IllegalArgumentException: requirement failed: PartitioningCollection 
> requires all of its partitionings have the same numPartitions.}}
> We should switch the order between these two rules to satisfy the requirement 
> when instantiating `PartitioningCollection`.
>  
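
A toy Scala sketch (not Spark's actual classes) of the invariant behind the quoted error: the collection's members must agree on the number of partitions, which only holds once EnsureRequirements has inserted the necessary shuffles.
{code:scala}
// Toy model only, mirroring the requirement text in the exception above.
final case class Partitioning(numPartitions: Int)

final case class PartitioningCollection(partitionings: Seq[Partitioning]) {
  require(
    partitionings.map(_.numPartitions).distinct.size <= 1,
    "PartitioningCollection requires all of its partitionings have the same numPartitions.")
}

// Before EnsureRequirements the two sides of a join may still disagree:
//   PartitioningCollection(Seq(Partitioning(5), Partitioning(200)))   // throws
// After the shuffles are in place both sides match:
//   PartitioningCollection(Seq(Partitioning(200), Partitioning(200))) // fine
{code}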



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33472) IllegalArgumentException when applying RemoveRedundantSorts before EnsureRequirements

2020-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33472.
---
Fix Version/s: 3.1.0
   3.0.2
 Assignee: Allison Wang
   Resolution: Fixed

> IllegalArgumentException when applying RemoveRedundantSorts before 
> EnsureRequirements
> -
>
> Key: SPARK-33472
> URL: https://issues.apache.org/jira/browse/SPARK-33472
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.8, 3.0.2, 3.1.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
> Fix For: 3.0.2, 3.1.0
>
>
> `RemoveRedundantSorts` rule uses SparkPlan's `outputPartitioning` to check 
> whether a sort node is redundant. Currently, it is added before 
> `EnsureRequirements`. Since `PartitioningCollection` requires left and right 
> partitioning to have the same number of partitions, which is not necessarily 
> true before applying `EnsureRequirements`, the rule can fail with the 
> following exception:
> {{IllegalArgumentException: requirement failed: PartitioningCollection 
> requires all of its partitionings have the same numPartitions.}}
> We should switch the order between these two rules to satisfy the requirement 
> when instantiating `PartitioningCollection`.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33466) Imputer support mode(most_frequent) strategy

2020-11-20 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-33466:


Assignee: zhengruifeng

> Imputer support mode(most_frequent) strategy
> 
>
> Key: SPARK-33466
> URL: https://issues.apache.org/jira/browse/SPARK-33466
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, PySpark
>Affects Versions: 3.1.0
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Minor
>
> [sklearn.impute.SimpleImputer|https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html#sklearn.impute.SimpleImputer]
>  supports *most_frequent(mode)*, which replaces missing values with the most 
> frequent value along each column.
> It should be easy to implement it in MLlib.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32381) Expose the ability for users to use parallel file & avoid location information discovery in RDDs

2020-11-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236311#comment-17236311
 ] 

Apache Spark commented on SPARK-32381:
--

User 'sunchao' has created a pull request for this issue:
https://github.com/apache/spark/pull/30447

> Expose the ability for users to use parallel file & avoid location 
> information discovery in RDDs
> 
>
> Key: SPARK-32381
> URL: https://issues.apache.org/jira/browse/SPARK-32381
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: Holden Karau
>Assignee: Chao Sun
>Priority: Major
> Fix For: 3.1.0
>
>
> We already have this in SQL so it's mostly a matter of re-organizing the code 
> a bit and agreeing on how to best expose this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28704) Test backward compatibility on JDK9+ once we have a version supports JDK9+

2020-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-28704:
-

Assignee: angerszhu

> Test backward compatibility on JDK9+ once we have a version supports JDK9+
> --
>
> Key: SPARK-28704
> URL: https://issues.apache.org/jira/browse/SPARK-28704
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: angerszhu
>Priority: Major
>
> We skip test HiveExternalCatalogVersionsSuite when testing with JAVA_9 or 
> later because our previous version does not support JAVA_9 or later. We 
> should add it back once we have a version supports JAVA_9 or later.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28704) Test backward compatibility on JDK9+ once we have a version supports JDK9+

2020-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28704:
--
Fix Version/s: (was: 3.2.0)
   3.1.0

> Test backward compatibility on JDK9+ once we have a version supports JDK9+
> --
>
> Key: SPARK-28704
> URL: https://issues.apache.org/jira/browse/SPARK-28704
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.1.0
>
>
> We skip test HiveExternalCatalogVersionsSuite when testing with JAVA_9 or 
> later because our previous version does not support JAVA_9 or later. We 
> should add it back once we have a version supports JAVA_9 or later.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28704) Test backward compatibility on JDK9+ once we have a version supports JDK9+

2020-11-20 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-28704.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

Issue resolved by pull request 30428
[https://github.com/apache/spark/pull/30428]

> Test backward compatibility on JDK9+ once we have a version supports JDK9+
> --
>
> Key: SPARK-28704
> URL: https://issues.apache.org/jira/browse/SPARK-28704
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.2.0
>
>
> We skip test HiveExternalCatalogVersionsSuite when testing with JAVA_9 or 
> later because our previous version does not support JAVA_9 or later. We 
> should add it back once we have a version supports JAVA_9 or later.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33504) The application log in the Spark history server contains sensitive attributes such as password that should be redacted instead of plain text

2020-11-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236261#comment-17236261
 ] 

Apache Spark commented on SPARK-33504:
--

User 'akiyamaneko' has created a pull request for this issue:
https://github.com/apache/spark/pull/30446

> The application log in the Spark history server contains sensitive attributes 
> such as password that should be redacted instead of plain text
> ---
>
> Key: SPARK-33504
> URL: https://issues.apache.org/jira/browse/SPARK-33504
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1
> Environment: Spark 3.0.1
>Reporter: akiyamaneko
>Priority: Major
> Attachments: SparkListenerEnvironmentUpdate log shows ok.png, 
> SparkListenerStageSubmitted-log-wrong.png, SparkListernerJobStart-wrong.png
>
>
> We found that the secure attributes in SparkListenerJobStart and 
> SparkListenerStageSubmitted events are not redacted, so sensitive 
> attributes can be viewed directly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33504) The application log in the Spark history server contains sensitive attributes such as password that should be redacted instead of plain text

2020-11-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33504:


Assignee: Apache Spark

> The application log in the Spark history server contains sensitive attributes 
> such as password that should be redacted instead of plain text
> ---
>
> Key: SPARK-33504
> URL: https://issues.apache.org/jira/browse/SPARK-33504
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1
> Environment: Spark 3.0.1
>Reporter: akiyamaneko
>Assignee: Apache Spark
>Priority: Major
> Attachments: SparkListenerEnvironmentUpdate log shows ok.png, 
> SparkListenerStageSubmitted-log-wrong.png, SparkListernerJobStart-wrong.png
>
>
> We found that the secure attributes in SparkListenerJobStart and 
> SparkListenerStageSubmitted events are not redacted, so sensitive 
> attributes can be viewed directly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33504) The application log in the Spark history server contains sensitive attributes such as password that should be redacted instead of plain text

2020-11-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33504:


Assignee: (was: Apache Spark)

> The application log in the Spark history server contains sensitive attributes 
> such as password that should be redacted instead of plain text
> ---
>
> Key: SPARK-33504
> URL: https://issues.apache.org/jira/browse/SPARK-33504
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1
> Environment: Spark 3.0.1
>Reporter: akiyamaneko
>Priority: Major
> Attachments: SparkListenerEnvironmentUpdate log shows ok.png, 
> SparkListenerStageSubmitted-log-wrong.png, SparkListernerJobStart-wrong.png
>
>
> We found that the secure attributes in SparkListenerJobStart and 
> SparkListenerStageSubmitted events are not redacted, so sensitive 
> attributes can be viewed directly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33504) The application log in the Spark history server contains sensitive attributes such as password that should be redacted instead of plain text

2020-11-20 Thread akiyamaneko (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

akiyamaneko updated SPARK-33504:

Attachment: SparkListernerJobStart-wrong.png

> The application log in the Spark history server contains sensitive attributes 
> such as password that should be redacted instead of plain text
> ---
>
> Key: SPARK-33504
> URL: https://issues.apache.org/jira/browse/SPARK-33504
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1
> Environment: Spark 3.0.1
>Reporter: akiyamaneko
>Priority: Major
> Attachments: SparkListenerEnvironmentUpdate log shows ok.png, 
> SparkListenerStageSubmitted-log-wrong.png, SparkListernerJobStart-wrong.png
>
>
> We found that the secure attributes in SparkListenerJobStart and 
> SparkListenerStageSubmitted events are not redacted, so sensitive 
> attributes can be viewed directly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33504) The application log in the Spark history server contains sensitive attributes such as password that should be redacted instead of plain text

2020-11-20 Thread akiyamaneko (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

akiyamaneko updated SPARK-33504:

Attachment: SparkListenerStageSubmitted-log-wrong.png

> The application log in the Spark history server contains sensitive attributes 
> such as password that should be redacted instead of plain text
> ---
>
> Key: SPARK-33504
> URL: https://issues.apache.org/jira/browse/SPARK-33504
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1
> Environment: Spark 3.0.1
>Reporter: akiyamaneko
>Priority: Major
> Attachments: SparkListenerEnvironmentUpdate log shows ok.png, 
> SparkListenerStageSubmitted-log-wrong.png, SparkListernerJobStart-wrong.png
>
>
> We found that the secure attributes in SparkListenerJobStart and 
> SparkListenerStageSubmitted events are not redacted, so sensitive 
> attributes can be viewed directly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33504) The application log in the Spark history server contains sensitive attributes such as password that should be redacted instead of plain text

2020-11-20 Thread akiyamaneko (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

akiyamaneko updated SPARK-33504:

Attachment: SparkListenerEnvironmentUpdate log shows ok.png

> The application log in the Spark history server contains sensitive attributes 
> such as password that should be redacted instead of plain text
> ---
>
> Key: SPARK-33504
> URL: https://issues.apache.org/jira/browse/SPARK-33504
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1
> Environment: Spark 3.0.1
>Reporter: akiyamaneko
>Priority: Major
> Attachments: SparkListenerEnvironmentUpdate log shows ok.png, 
> SparkListenerStageSubmitted-log-wrong.png, SparkListernerJobStart-wrong.png
>
>
> We found that the secure attributes in SparkListenerJobStart and 
> SparkListenerStageSubmitted events are not redacted, so sensitive 
> attributes can be viewed directly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33504) The application log in the Spark history server contains sensitive attributes such as password that should be redacted instead of plain text

2020-11-20 Thread akiyamaneko (Jira)
akiyamaneko created SPARK-33504:
---

 Summary: The application log in the Spark history server contains 
sensitive attributes such as password that should be redacted instead of plain 
text
 Key: SPARK-33504
 URL: https://issues.apache.org/jira/browse/SPARK-33504
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.0.1
 Environment: Spark 3.0.1
Reporter: akiyamaneko


We found that the secure attributes in SparkListenerJobStart and 
SparkListenerStageSubmitted events are not redacted, so sensitive 
attributes can be viewed directly.
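
For context (not from this report): event-log redaction is normally driven by the `spark.redaction.regex` property, which is matched against configuration and property key names when events are written out; the problem described here is that some events bypass that path. A hedged sketch of how the pattern is set:
{code:scala}
import org.apache.spark.sql.SparkSession

// Sketch only: widen the redaction pattern so extra key names are masked in the
// event log. This illustrates the existing mechanism; it does not by itself fix
// the SparkListenerJobStart / SparkListenerStageSubmitted events reported above.
val spark = SparkSession.builder()
  .appName("redaction-example")
  .config("spark.eventLog.enabled", "true")
  .config("spark.redaction.regex", "(?i)secret|password|token|credential")
  .getOrCreate()
{code}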



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33468) ParseUrl should fail if input string is not a valid url

2020-11-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33468:
---

Assignee: ulysses you

> ParseUrl should fail if input string is not a valid url
> ---
>
> Key: SPARK-33468
> URL: https://issues.apache.org/jira/browse/SPARK-33468
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Assignee: ulysses you
>Priority: Minor
>
> ParseUrl should fail if input string is not a valid url.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33468) ParseUrl should fail if input string is not a valid url

2020-11-20 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33468.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30399
[https://github.com/apache/spark/pull/30399]

> ParseUrl should fail if input string is not a valid url
> ---
>
> Key: SPARK-33468
> URL: https://issues.apache.org/jira/browse/SPARK-33468
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Assignee: ulysses you
>Priority: Minor
> Fix For: 3.1.0
>
>
> ParseUrl should fail if input string is not a valid url.
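
A hedged illustration of the intended behaviour change (whether the failure is gated on ANSI mode, and the exact exception raised, are not stated in this issue):
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("parse-url").getOrCreate()

// Valid input keeps working: prints spark.apache.org
spark.sql("SELECT parse_url('http://spark.apache.org/path?query=1', 'HOST')").show(false)

// Malformed input used to come back as NULL; after this change it is expected
// to raise an error pointing at the invalid URL instead.
spark.sql("SELECT parse_url('inva lid://no-url', 'HOST')").show(false)
{code}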



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33422) Incomplete menu item display in documentation

2020-11-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-33422:


Assignee: liucht-inspur

> Incomplete menu item display in documentation
> ---
>
> Key: SPARK-33422
> URL: https://issues.apache.org/jira/browse/SPARK-33422
> Project: Spark
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.0.0, 3.0.1
>Reporter: liucht-inspur
>Assignee: liucht-inspur
>Priority: Minor
> Attachments: left-menu.jpg
>
>
> The bottom menu item cannot be displayed when the left menu tree is long
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33422) Incomplete menu item display in documentation

2020-11-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-33422.
--
Fix Version/s: 3.2.0
   3.0.2
   Resolution: Fixed

Issue resolved by pull request 30335
[https://github.com/apache/spark/pull/30335]

> Incomplete menu item display in documentation
> ---
>
> Key: SPARK-33422
> URL: https://issues.apache.org/jira/browse/SPARK-33422
> Project: Spark
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.0.0, 3.0.1
>Reporter: liucht-inspur
>Assignee: liucht-inspur
>Priority: Minor
> Fix For: 3.0.2, 3.2.0
>
> Attachments: left-menu.jpg
>
>
> The bottom menu item cannot be displayed when the left menu tree is long
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33422) Incomplete menu item display in documentation

2020-11-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-33422:
-
Fix Version/s: (was: 3.2.0)
   3.1.0

> Incomplete menu item display in documentation
> ---
>
> Key: SPARK-33422
> URL: https://issues.apache.org/jira/browse/SPARK-33422
> Project: Spark
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 3.0.0, 3.0.1
>Reporter: liucht-inspur
>Assignee: liucht-inspur
>Priority: Minor
> Fix For: 3.0.2, 3.1.0
>
> Attachments: left-menu.jpg
>
>
> The bottom menu item cannot be displayed when the left menu tree is long
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33503) Refactor SortOrder class to allow multiple children

2020-11-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236139#comment-17236139
 ] 

Apache Spark commented on SPARK-33503:
--

User 'prakharjain09' has created a pull request for this issue:
https://github.com/apache/spark/pull/30430

> Refactor SortOrder class to allow multiple children
> 
>
> Key: SPARK-33503
> URL: https://issues.apache.org/jira/browse/SPARK-33503
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Prakhar Jain
>Priority: Major
>
> Currently the SortOrder is a UnaryExpression with only one child. It contains 
> a field "sameOrderExpression" that needs some special handling as done in 
> [https://github.com/apache/spark/pull/30302] .
>  
> One of the suggestions in 
> [https://github.com/apache/spark/pull/30302#discussion_r526104333] is to make 
> the sameOrderExpression expressions children of SortOrder.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33503) Refactor SortOrder class to allow multiple children

2020-11-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33503:


Assignee: (was: Apache Spark)

> Refactor SortOrder class to allow multiple children
> 
>
> Key: SPARK-33503
> URL: https://issues.apache.org/jira/browse/SPARK-33503
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Prakhar Jain
>Priority: Major
>
> Currently the SortOrder is a UnaryExpression with only one child. It contains 
> a field "sameOrderExpression" that needs some special handling as done in 
> [https://github.com/apache/spark/pull/30302] .
>  
> One of the suggestions in 
> [https://github.com/apache/spark/pull/30302#discussion_r526104333] is to make 
> the sameOrderExpression expressions children of SortOrder.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33503) Refactor SortOrder class to allow multiple children

2020-11-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236138#comment-17236138
 ] 

Apache Spark commented on SPARK-33503:
--

User 'prakharjain09' has created a pull request for this issue:
https://github.com/apache/spark/pull/30430

> Refactor SortOrder class to allow multiple children
> 
>
> Key: SPARK-33503
> URL: https://issues.apache.org/jira/browse/SPARK-33503
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Prakhar Jain
>Priority: Major
>
> Currently the SortOrder is a UnaryExpression with only one child. It contains 
> a field "sameOrderExpression" that needs some special handling as done in 
> [https://github.com/apache/spark/pull/30302] .
>  
> One of the suggestions in 
> [https://github.com/apache/spark/pull/30302#discussion_r526104333] is to make 
> the sameOrderExpression expressions children of SortOrder.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33503) Refactor SortOrder class to allow multiple children

2020-11-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33503:


Assignee: Apache Spark

> Refactor SortOrder class to allow multiple children
> 
>
> Key: SPARK-33503
> URL: https://issues.apache.org/jira/browse/SPARK-33503
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Prakhar Jain
>Assignee: Apache Spark
>Priority: Major
>
> Currently the SortOrder is a UnaryExpression with only one child. It contains 
> a field "sameOrderExpression" that needs some special handling as done in 
> [https://github.com/apache/spark/pull/30302] .
>  
> One of the suggestions in 
> [https://github.com/apache/spark/pull/30302#discussion_r526104333] is to make 
> the sameOrderExpression expressions children of SortOrder.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33503) Refactor SortOrder class to allow multiple children

2020-11-20 Thread Prakhar Jain (Jira)
Prakhar Jain created SPARK-33503:


 Summary: Refactor SortOrder class to allow multiple children
 Key: SPARK-33503
 URL: https://issues.apache.org/jira/browse/SPARK-33503
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.1
Reporter: Prakhar Jain


Currently the SortOrder is a UnaryExpression with only one child. It contains a 
field "sameOrderExpression" that needs some special handling as done in 
[https://github.com/apache/spark/pull/30302] .

 

One of the suggestions in 
[https://github.com/apache/spark/pull/30302#discussion_r526104333] is to make 
the sameOrderExpression expressions children of SortOrder.
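
A hedged sketch of the shape such a refactor could take, using toy classes rather than Spark's real ones (the actual SortOrder also carries sort direction, null ordering and planner-specific defaults):
{code:scala}
// Toy model only: the point is sameOrderExpressions moving from an opaque field
// into children, so tree transforms visit those expressions like any other child.
trait Expression { def children: Seq[Expression] }

final case class Attr(name: String) extends Expression {
  override def children: Seq[Expression] = Nil
}

// Before: the extra expressions are invisible to the tree framework and need
// special handling wherever the tree is rewritten or canonicalized.
final case class SortOrderBefore(
    child: Expression,
    sameOrderExpressions: Seq[Expression]) extends Expression {
  override def children: Seq[Expression] = Seq(child)
}

// After (the suggestion in this ticket): treat them as ordinary children.
final case class SortOrderAfter(
    child: Expression,
    sameOrderExpressions: Seq[Expression]) extends Expression {
  override def children: Seq[Expression] = child +: sameOrderExpressions
}
{code}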



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33502) Large number of SELECT columns causes StackOverflowError

2020-11-20 Thread Arwin S Tio (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arwin S Tio updated SPARK-33502:

Description: 
On Spark 2.4.7 Standalone Mode on my laptop (Macbook Pro 2015), I ran the 
following:
{code:java}
public class TestSparkStackOverflow {
  public static void main(String [] args) {
SparkSession spark = SparkSession
  .builder()
  .config("spark.master", "local[8]")
  .appName(TestSparkStackOverflow.class.getSimpleName())
  .getOrCreate();

StructType inputSchema = new StructType();
inputSchema = inputSchema.add("foo", DataTypes.StringType);
Dataset<Row> inputDf = spark.createDataFrame(
  Arrays.asList(
RowFactory.create("1"),
RowFactory.create("2"),
RowFactory.create("3")
  ),
  inputSchema
);
 
List<Column> lotsOfColumns = new ArrayList<>();
for (int i = 0; i < 3000; i++) {
  lotsOfColumns.add(lit("").as("field" + i).cast(DataTypes.StringType));
}

lotsOfColumns.add(new Column("foo"));inputDf
  
.select(JavaConverters.collectionAsScalaIterableConverter(lotsOfColumns).asScala().toSeq())
  .write()
  .format("csv")
  .mode(SaveMode.Append)
  .save("file:///tmp/testoutput");
  }
}
 {code}
 

And I get a StackOverflowError:
{code:java}
Exception in thread "main" org.apache.spark.SparkException: Job 
aborted.Exception in thread "main" org.apache.spark.SparkException: Job 
aborted. at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
 at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
 at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
 at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
 at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) 
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) 
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
 at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81) at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
 at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
 at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
 at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
 at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
 at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696) 
at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:305) 
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:291) at 
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:249) at 
udp.task.TestSparkStackOverflow.main(TestSparkStackOverflow.java:52)Caused by: 
java.lang.StackOverflowError at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1522) at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at 
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at 
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at 
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at 
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
... redacted {code}
 

The StackOverflowError goes away at around 500 columns.

 

When 

[jira] [Created] (SPARK-33502) Large number of SELECT columns causes StackOverflowError

2020-11-20 Thread Arwin S Tio (Jira)
Arwin S Tio created SPARK-33502:
---

 Summary: Large number of SELECT columns causes StackOverflowError
 Key: SPARK-33502
 URL: https://issues.apache.org/jira/browse/SPARK-33502
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.7
Reporter: Arwin S Tio


On Spark 2.4.7 Standalone Mode on my laptop (Macbook Pro 2015), I ran the 
following:


{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import scala.collection.JavaConverters;

import static org.apache.spark.sql.functions.lit;

public class TestSparkStackOverflow {
  public static void main(String[] args) {
    SparkSession spark = SparkSession
      .builder()
      .config("spark.master", "local[8]")
      .appName(TestSparkStackOverflow.class.getSimpleName())
      .getOrCreate();

    // A one-column DataFrame with three rows.
    StructType inputSchema = new StructType();
    inputSchema = inputSchema.add("foo", DataTypes.StringType);
    Dataset<Row> inputDf = spark.createDataFrame(
      Arrays.asList(
        RowFactory.create("1"),
        RowFactory.create("2"),
        RowFactory.create("3")
      ),
      inputSchema
    );

    // 3000 literal string columns plus the original "foo" column.
    List<Column> lotsOfColumns = new ArrayList<>();
    for (int i = 0; i < 3000; i++) {
      lotsOfColumns.add(lit("").as("field" + i).cast(DataTypes.StringType));
    }
    lotsOfColumns.add(new Column("foo"));

    inputDf
      .select(JavaConverters.collectionAsScalaIterableConverter(lotsOfColumns).asScala().toSeq())
      .write()
      .format("csv")
      .mode(SaveMode.Append)
      .save("file:///tmp/testoutput");
  }
}
 {code}
 

And I get a StackOverflowError:


{code:java}
Exception in thread "main" org.apache.spark.SparkException: Job 
aborted.Exception in thread "main" org.apache.spark.SparkException: Job 
aborted. at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
 at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
 at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
 at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
 at 
org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) 
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) 
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
 at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81) at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
 at 
org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
 at 
org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
 at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
 at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
 at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696) 
at 
org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:305) 
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:291) at 
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:249) at 
udp.task.ArwinTestSparkStackOverflow.main(ArwinTestSparkStackOverflow.java:52)Caused
 by: java.lang.StackOverflowError at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1522) at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at 
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at 
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at 
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) at 
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at 
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) at 
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) at 

[jira] [Assigned] (SPARK-32919) Add support in Spark driver to coordinate the shuffle map stage in push-based shuffle by selecting external shuffle services for merging shuffle partitions

2020-11-20 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan reassigned SPARK-32919:
---

Assignee: Venkata krishnan Sowrirajan

> Add support in Spark driver to coordinate the shuffle map stage in push-based 
> shuffle by selecting external shuffle services for merging shuffle partitions
> ---
>
> Key: SPARK-32919
> URL: https://issues.apache.org/jira/browse/SPARK-32919
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Min Shen
>Assignee: Venkata krishnan Sowrirajan
>Priority: Major
> Fix For: 3.1.0
>
>
> At the beginning of a shuffle map stage, the driver needs to select external 
> shuffle services as the mergers of the shuffle partitions for the 
> corresponding shuffle.
> We currently leverage the immediately available information about current and 
> past executor locations for this selection. Ideally, this 
> would be behind a pluggable interface so that we can potentially leverage 
> information tracked outside of a Spark application for better load balancing 
> or for a disaggregated deployment environment.
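
For illustration only, a rough sketch of what such a pluggable selection hook might look like; 
the trait name, signature, and host-affinity heuristic are assumptions made here, not the 
interface introduced by the PR:

{code:scala}
import org.apache.spark.storage.BlockManagerId

// Hedged sketch of a pluggable merger-selection strategy; names and signature
// are illustrative assumptions, not Spark's actual API.
trait MergerLocationStrategy {
  /** Pick at most `numMergersDesired` shuffle services to merge partitions for a stage. */
  def selectMergers(candidates: Seq[BlockManagerId], numMergersDesired: Int): Seq[BlockManagerId]
}

// One possible default: prefer hosts where executors are (or recently were) running,
// which is roughly the information the driver already tracks.
class ExecutorAffinityStrategy(activeHosts: Set[String]) extends MergerLocationStrategy {
  override def selectMergers(
      candidates: Seq[BlockManagerId],
      numMergersDesired: Int): Seq[BlockManagerId] = {
    val (preferred, rest) = candidates.partition(id => activeHosts.contains(id.host))
    (preferred ++ rest).take(numMergersDesired)
  }
}
{code}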



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32919) Add support in Spark driver to coordinate the shuffle map stage in push-based shuffle by selecting external shuffle services for merging shuffle partitions

2020-11-20 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan resolved SPARK-32919.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30164
[https://github.com/apache/spark/pull/30164]

> Add support in Spark driver to coordinate the shuffle map stage in push-based 
> shuffle by selecting external shuffle services for merging shuffle partitions
> ---
>
> Key: SPARK-32919
> URL: https://issues.apache.org/jira/browse/SPARK-32919
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Min Shen
>Priority: Major
> Fix For: 3.1.0
>
>
> At the beginning of a shuffle map stage, the driver needs to select external 
> shuffle services as the mergers of the shuffle partitions for the 
> corresponding shuffle.
> We currently leverage the immediately available information about current and 
> past executor locations for this selection. Ideally, this 
> would be behind a pluggable interface so that we can potentially leverage 
> information tracked outside of a Spark application for better load balancing 
> or for a disaggregated deployment environment.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33501) Encoding is not working if multiLine option is true.

2020-11-20 Thread Nilesh Patil (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilesh Patil updated SPARK-33501:
-
Attachment: 1605860036183.csv

> Encoding is not working if multiLine option is true.
> 
>
> Key: SPARK-33501
> URL: https://issues.apache.org/jira/browse/SPARK-33501
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.4
>Reporter: Nilesh Patil
>Priority: Major
> Attachments: 1605860036183.csv
>
>
> If we read with multiLine true and encoding "ISO-8859-1", we get a value like 
> {color:#FF}AUTO EL*�*TRICA{color}, but if we read with multiLine false and 
> encoding "ISO-8859-1", we get the expected value {color:#FF}AUTO EL*É*TRICA{color}.
> Below is the code we are using:
> Dataset<Row> dataset1 = SparkUtil.getSparkSession().read()
>   .option("header", "true")
>   .option("inferSchema", true)
>   .option("delimiter", ";")
>   .option("quote", "\"")
>   .option("multiLine", true)
>   .option("encoding", "ISO-8859-1")
>   .csv("file path");
> dataset1.show();
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33501) Encoding is not working if multiLine option is true.

2020-11-20 Thread Nilesh Patil (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nilesh Patil updated SPARK-33501:
-
Description: 
If we read with multiLine true and encoding "ISO-8859-1", we get a value like 
{color:#ff}AUTO EL*�*TRICA{color}, but if we read with multiLine false and 
encoding "ISO-8859-1", we get the expected value {color:#ff}AUTO EL*É*TRICA{color}.

Below is the code we are using:

Dataset<Row> dataset1 = SparkUtil.getSparkSession().read()
  .option("header", "true")
  .option("inferSchema", true)
  .option("delimiter", ";")
  .option("quote", "\"")
  .option("multiLine", true)
  .option("encoding", "ISO-8859-1")
  .csv("file path");

dataset1.show();

A sample file is attached.

  was:
If we read with mulitLine true and encoding with "ISO-8859-1" then we are 
getting  value like this {color:#FF}AUTO EL*�*TRICA{color}. and if we read 
with multiLine false  and encoding with "ISO-8859-1" thne we are getting  value 
like {color:#FF}AUTO EL*É*TRICA{color}

Below is the code we are using

Dataset dataset1 = SparkUtil.getSparkSession().read().Dataset 
dataset1 = SparkUtil.getSparkSession().read(). option("header", "true"). 
option("inferSchema", true). option("delimiter", ";") .option("quote", "\"") 
.option("multiLine", true) .option("encoding", "ISO-8859-1") .csv("file path");

dataset1.show();

 


> Encoding is not working if multiLine option is true.
> 
>
> Key: SPARK-33501
> URL: https://issues.apache.org/jira/browse/SPARK-33501
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.4
>Reporter: Nilesh Patil
>Priority: Major
> Attachments: 1605860036183.csv
>
>
> If we read with multiLine true and encoding "ISO-8859-1", we get a value like 
> {color:#ff}AUTO EL*�*TRICA{color}, but if we read with multiLine false and 
> encoding "ISO-8859-1", we get the expected value {color:#ff}AUTO EL*É*TRICA{color}.
> Below is the code we are using:
> Dataset<Row> dataset1 = SparkUtil.getSparkSession().read()
>   .option("header", "true")
>   .option("inferSchema", true)
>   .option("delimiter", ";")
>   .option("quote", "\"")
>   .option("multiLine", true)
>   .option("encoding", "ISO-8859-1")
>   .csv("file path");
> dataset1.show();
> A sample file is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33501) Encoding is not working if multiLine option is true.

2020-11-20 Thread Nilesh Patil (Jira)
Nilesh Patil created SPARK-33501:


 Summary: Encoding is not working if multiLine option is true.
 Key: SPARK-33501
 URL: https://issues.apache.org/jira/browse/SPARK-33501
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.3.4
Reporter: Nilesh Patil


If we read with multiLine true and encoding "ISO-8859-1", we get a value like 
{color:#FF}AUTO EL*�*TRICA{color}, but if we read with multiLine false and 
encoding "ISO-8859-1", we get the expected value {color:#FF}AUTO EL*É*TRICA{color}.

Below is the code we are using:

Dataset<Row> dataset1 = SparkUtil.getSparkSession().read()
  .option("header", "true")
  .option("inferSchema", true)
  .option("delimiter", ";")
  .option("quote", "\"")
  .option("multiLine", true)
  .option("encoding", "ISO-8859-1")
  .csv("file path");

dataset1.show();
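
For completeness, an equivalent Scala sketch that runs the read with both multiLine settings 
side by side; the file path is a placeholder, and the observed values are only those reported above:

{code:scala}
import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder().master("local[*]").appName("EncodingRepro").getOrCreate()

// Hedged reproduction sketch; "/path/to/1605860036183.csv" is a placeholder.
def readCsv(multiLine: Boolean): DataFrame =
  spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .option("delimiter", ";")
    .option("quote", "\"")
    .option("multiLine", multiLine)
    .option("encoding", "ISO-8859-1")
    .csv("/path/to/1605860036183.csv")

readCsv(multiLine = true).show()   // reported to print the replacement character
readCsv(multiLine = false).show()  // reported to print the expected "É"
{code}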

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33500) Support field "EPOCH" in datetime function extract/date_part

2020-11-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236083#comment-17236083
 ] 

Apache Spark commented on SPARK-33500:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/30445

> Support field "EPOCH" in datetime function extract/date_part
> 
>
> Key: SPARK-33500
> URL: https://issues.apache.org/jira/browse/SPARK-33500
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Support field EPOCH in the functions `extract` and `date_part`.
> For example:
> SELECT EXTRACT(EPOCH FROM TIMESTAMP'2001-02-16 20:38:40-08');
> SELECT DATE_PART('EPOCH', TIMESTAMP'2001-02-16 20:38:40-08');
> This is useful for getting the number of seconds since `1970-01-01 
> 00:00:00-00`



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33500) Support field "EPOCH" in datetime function extract/date_part

2020-11-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33500:


Assignee: Apache Spark  (was: Gengliang Wang)

> Support field "EPOCH" in datetime function extract/date_part
> 
>
> Key: SPARK-33500
> URL: https://issues.apache.org/jira/browse/SPARK-33500
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>
> Support field EPOCH in the functions `extract` and `date_part`.
> For example:
> SELECT EXTRACT(EPOCH FROM TIMESTAMP'2001-02-16 20:38:40-08');
> SELECT DATE_PART('EPOCH', TIMESTAMP'2001-02-16 20:38:40-08');
> This is useful for getting the number of seconds since `1970-01-01 
> 00:00:00-00`



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33500) Support field "EPOCH" in datetime function extract/date_part

2020-11-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33500:


Assignee: Gengliang Wang  (was: Apache Spark)

> Support field "EPOCH" in datetime function extract/date_part
> 
>
> Key: SPARK-33500
> URL: https://issues.apache.org/jira/browse/SPARK-33500
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Support field EPOCH in the functions `extract` and `date_part`.
> For example:
> SELECT EXTRACT(EPOCH FROM TIMESTAMP'2001-02-16 20:38:40-08');
> SELECT DATE_PART('EPOCH', TIMESTAMP'2001-02-16 20:38:40-08');
> This is useful for getting the number of seconds since `1970-01-01 
> 00:00:00-00`



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33500) Support field "EPOCH" in datetime function extract/date_part

2020-11-20 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-33500:
---
Summary: Support field "EPOCH" in datetime function extract/date_part  
(was: Support field "EPOCH" in function extract/date_part)

> Support field "EPOCH" in datetime function extract/date_part
> 
>
> Key: SPARK-33500
> URL: https://issues.apache.org/jira/browse/SPARK-33500
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> Support field EPOCH in the functions `extract` and `date_part`.
> For example:
> SELECT EXTRACT(EPOCH FROM TIMESTAMP'2001-02-16 20:38:40-08');
> SELECT DATE_PART('EPOCH', TIMESTAMP'2001-02-16 20:38:40-08');
> This is useful for getting the number of seconds since `1970-01-01 
> 00:00:00-00`



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33500) Support field "EPOCH" in function extract/date_part

2020-11-20 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-33500:
--

 Summary: Support field "EPOCH" in function extract/date_part
 Key: SPARK-33500
 URL: https://issues.apache.org/jira/browse/SPARK-33500
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


Support field EPOCH in the functions `extract` and `date_part`.
For example:
SELECT EXTRACT(EPOCH FROM TIMESTAMP'2001-02-16 20:38:40-08');
SELECT DATE_PART('EPOCH', TIMESTAMP'2001-02-16 20:38:40-08');

This is useful for getting the number of seconds since `1970-01-01 00:00:00-00`
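
For comparison, roughly the same number can already be obtained without the new field by casting 
the timestamp to a double; a hedged sketch (the cast-to-seconds behavior is existing Spark SQL, 
while the EXTRACT(EPOCH ...) form above is the proposed addition):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("EpochSketch").getOrCreate()

// Hedged sketch: seconds since 1970-01-01 00:00:00 UTC via an existing cast,
// shown only to illustrate the value EXTRACT(EPOCH ...) would return.
spark.sql(
  "SELECT CAST(TIMESTAMP'2001-02-16 20:38:40-08' AS DOUBLE) AS epoch_seconds"
).show(false)
{code}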





--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32512) Add basic partition command for datasourcev2

2020-11-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236078#comment-17236078
 ] 

Apache Spark commented on SPARK-32512:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30444

> Add basic partition command for datasourcev2
> 
>
> Key: SPARK-32512
> URL: https://issues.apache.org/jira/browse/SPARK-32512
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Jackey Lee
>Assignee: Jackey Lee
>Priority: Major
> Fix For: 3.1.0
>
>
> This Jira adds basic partition command APIs, 
> `AlterTableAddPartitionExec` and `AlterTableDropPartitionExec`, to support 
> operating on DataSource V2 partitions. They will use the new partition API 
> defined in [SPARK-31694|https://issues.apache.org/jira/browse/SPARK-31694].
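
The statements those exec nodes would back look roughly like the following; the catalog, 
namespace, table, and partition column names here are placeholders:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("PartitionDdlSketch").getOrCreate()

// Hedged sketch of the DDL that AlterTableAddPartitionExec / AlterTableDropPartitionExec
// would serve for a DataSource V2 table; testcat.ns.tbl and `dt` are placeholders.
spark.sql("ALTER TABLE testcat.ns.tbl ADD IF NOT EXISTS PARTITION (dt = '2020-11-20')")
spark.sql("ALTER TABLE testcat.ns.tbl DROP IF EXISTS PARTITION (dt = '2020-11-20')")
{code}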



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33499) Enable `-Wunused:imports` in Scala 2.13 SBT

2020-11-20 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-33499:
-
Summary: Enable `-Wunused:imports` in Scala 2.13 SBT  (was: Enabled 
`-Wunused:imports` in Scala 2.13 SBT)

> Enable `-Wunused:imports` in Scala 2.13 SBT
> ---
>
> Key: SPARK-33499
> URL: https://issues.apache.org/jira/browse/SPARK-33499
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.1.0
> Environment: !image-2020-11-20-18-49-16-384.png!
> Scala 2.13 thinks this is an {{unused import}}, but Scala 2.12 fails to compile 
> without this import, so {{-Wunused:imports}} is commented out for Scala 2.13 and 
> a TODO is left in this PR.
>  
>Reporter: Yang Jie
>Priority: Minor
> Attachments: image-2020-11-20-18-50-21-567.png
>
>
> As shown in the image, Scala 2.13 thinks `scala.language.higherKinds` is an 
> {{unused import}}, but Scala 2.12 fails to compile without this import, so 
> {{-Wunused:imports}} is commented out for Scala 2.13 and a TODO is left. 
> We should enable `-Wunused:imports` when Scala 2.12 is no longer supported.
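
One way the flag could be gated on the Scala version until then, sketched as a plain build.sbt 
setting; the actual change lives in Spark's build definition, so treat this only as an illustration:

{code:scala}
// Hedged build.sbt sketch: emit the unused-import warning only on Scala 2.13,
// so Scala 2.12 sources that still need the import keep compiling.
Compile / scalacOptions ++= {
  CrossVersion.partialVersion(scalaVersion.value) match {
    case Some((2, n)) if n >= 13 => Seq("-Wunused:imports")
    case _                       => Seq.empty
  }
}
{code}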



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33499) Enabled `-Wunused:imports` in Scala 2.13 SBT

2020-11-20 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-33499:
-
Description: 
As the image, Scala 2.13 think `scala.language.higherKinds` is a {{unused 
import, }}but Scala 2.12 compile failed without this import, so comments 
{{-Wunused:imports}} in Scala 2.13 and left a TODO, 

we should enabled `-Wunused:imports` when Scala 2.12 is no longer supported.

> Enabled `-Wunused:imports` in Scala 2.13 SBT
> 
>
> Key: SPARK-33499
> URL: https://issues.apache.org/jira/browse/SPARK-33499
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.1.0
> Environment: !image-2020-11-20-18-49-16-384.png!
> Scala 2.13 thinks this is an {{unused import}}, but Scala 2.12 fails to compile 
> without this import, so {{-Wunused:imports}} is commented out for Scala 2.13 and 
> a TODO is left in this PR.
>  
>Reporter: Yang Jie
>Priority: Minor
> Attachments: image-2020-11-20-18-50-21-567.png
>
>
> As shown in the image, Scala 2.13 thinks `scala.language.higherKinds` is an 
> {{unused import}}, but Scala 2.12 fails to compile without this import, so 
> {{-Wunused:imports}} is commented out for Scala 2.13 and a TODO is left. 
> We should enable `-Wunused:imports` when Scala 2.12 is no longer supported.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33499) Enabled `-Wunused:imports` in Scala 2.13 SBT

2020-11-20 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-33499:
-
Attachment: (was: image-2020-11-20-18-50-14-328.png)

> Enabled `-Wunused:imports` in Scala 2.13 SBT
> 
>
> Key: SPARK-33499
> URL: https://issues.apache.org/jira/browse/SPARK-33499
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.1.0
> Environment: !image-2020-11-20-18-49-16-384.png!
> Scala 2.13 thinks this is an {{unused import}}, but Scala 2.12 fails to compile 
> without this import, so {{-Wunused:imports}} is commented out for Scala 2.13 and 
> a TODO is left in this PR.
>  
>Reporter: Yang Jie
>Priority: Minor
> Attachments: image-2020-11-20-18-50-21-567.png
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33499) Enabled `-Wunused:imports` in Scala 2.13 SBT

2020-11-20 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-33499:
-
Attachment: image-2020-11-20-18-50-21-567.png

> Enabled `-Wunused:imports` in Scala 2.13 SBT
> 
>
> Key: SPARK-33499
> URL: https://issues.apache.org/jira/browse/SPARK-33499
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.1.0
> Environment: !image-2020-11-20-18-49-16-384.png!
> Scala 2.13 thinks this is an {{unused import}}, but Scala 2.12 fails to compile 
> without this import, so {{-Wunused:imports}} is commented out for Scala 2.13 and 
> a TODO is left in this PR.
>  
>Reporter: Yang Jie
>Priority: Minor
> Attachments: image-2020-11-20-18-50-21-567.png
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33499) Enabled `-Wunused:imports` in Scala 2.13 SBT

2020-11-20 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-33499:
-
Attachment: image-2020-11-20-18-50-14-328.png

> Enabled `-Wunused:imports` in Scala 2.13 SBT
> 
>
> Key: SPARK-33499
> URL: https://issues.apache.org/jira/browse/SPARK-33499
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.1.0
> Environment: !image-2020-11-20-18-49-16-384.png!
> Scala 2.13 thinks this is an {{unused import}}, but Scala 2.12 fails to compile 
> without this import, so {{-Wunused:imports}} is commented out for Scala 2.13 and 
> a TODO is left in this PR.
>  
>Reporter: Yang Jie
>Priority: Minor
> Attachments: image-2020-11-20-18-50-21-567.png
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33499) Enabled `-Wunused:imports` in Scala 2.13 SBT

2020-11-20 Thread Yang Jie (Jira)
Yang Jie created SPARK-33499:


 Summary: Enabled `-Wunused:imports` in Scala 2.13 SBT
 Key: SPARK-33499
 URL: https://issues.apache.org/jira/browse/SPARK-33499
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Affects Versions: 3.1.0
 Environment: !image-2020-11-20-18-49-16-384.png!

Scala 2.13 thinks this is an {{unused import}}, but Scala 2.12 fails to compile 
without this import, so {{-Wunused:imports}} is commented out for Scala 2.13 and a 
TODO is left in this PR.

 
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33497) Override maxRows in some LogicalPlan

2020-11-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33497:


Assignee: (was: Apache Spark)

> Override maxRows in some LogicalPlan
> 
>
> Key: SPARK-33497
> URL: https://issues.apache.org/jira/browse/SPARK-33497
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Priority: Minor
>
> Override the missing maxRows method of some LogicalPlan nodes.
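
As an illustration of the pattern (with an invented node, not one of the operators actually 
touched by the PR), a plan node can report an upper bound on its output by combining its 
children's bounds:

{code:scala}
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.catalyst.plans.logical.{BinaryNode, LogicalPlan}

// Hedged sketch with a made-up operator: its output can never have more rows
// than the sum of its two children, so maxRows can propagate that bound.
case class CombineSketch(left: LogicalPlan, right: LogicalPlan) extends BinaryNode {
  override def output: Seq[Attribute] = left.output

  // Only report a bound when both children know theirs.
  override def maxRows: Option[Long] =
    for (l <- left.maxRows; r <- right.maxRows) yield l + r
}
{code}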



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33497) Override maxRows in some LogicalPlan

2020-11-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236050#comment-17236050
 ] 

Apache Spark commented on SPARK-33497:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/30443

> Override maxRows in some LogicalPlan
> 
>
> Key: SPARK-33497
> URL: https://issues.apache.org/jira/browse/SPARK-33497
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Priority: Minor
>
> Override the missing maxRows method of some LogicalPlan nodes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33497) Override maxRows in some LogicalPlan

2020-11-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33497:


Assignee: Apache Spark

> Override maxRows in some LogicalPlan
> 
>
> Key: SPARK-33497
> URL: https://issues.apache.org/jira/browse/SPARK-33497
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: ulysses you
>Assignee: Apache Spark
>Priority: Minor
>
> Override the missing maxRows method of some LogicalPlan nodes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33498) Datetime parsing should fail if the input string can't be parsed, or the pattern string is invalid

2020-11-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236030#comment-17236030
 ] 

Apache Spark commented on SPARK-33498:
--

User 'leanken' has created a pull request for this issue:
https://github.com/apache/spark/pull/30442

> Datetime parsing should fail if the input string can't be parsed, or the 
> pattern string is invalid
> --
>
> Key: SPARK-33498
> URL: https://issues.apache.org/jira/browse/SPARK-33498
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Leanken.Lin
>Priority: Major
>
> Datetime parsing should fail if the input string can't be parsed, or the 
> pattern string is invalid, when ANSI mode is enabled. This patch should update 
> GetTimeStamp, UnixTimeStamp, ToUnixTimeStamp and Cast.
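
A hedged sketch of the requested behavior against the existing ANSI flag; the exact error type 
raised after the change is an assumption:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("AnsiParseSketch").getOrCreate()

// With ANSI mode on, an unparsable input (or an invalid pattern) should raise an
// error instead of silently returning NULL; the exception type may differ.
spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql("SELECT to_timestamp('not-a-timestamp', 'yyyy-MM-dd')").show()   // expected to fail
spark.sql("SELECT unix_timestamp('2020/11/20', 'yyyy-MM-dd')").show()      // expected to fail
{code}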



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33498) Datetime parsing should fail if the input string can't be parsed, or the pattern string is invalid

2020-11-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33498:


Assignee: (was: Apache Spark)

> Datetime parsing should fail if the input string can't be parsed, or the 
> pattern string is invalid
> --
>
> Key: SPARK-33498
> URL: https://issues.apache.org/jira/browse/SPARK-33498
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Leanken.Lin
>Priority: Major
>
> Datetime parsing should fail if the input string can't be parsed, or the 
> pattern string is invalid, when ANSI mode is enabled. This patch should update 
> GetTimeStamp, UnixTimeStamp, ToUnixTimeStamp and Cast.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33498) Datetime parsing should fail if the input string can't be parsed, or the pattern string is invalid

2020-11-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236028#comment-17236028
 ] 

Apache Spark commented on SPARK-33498:
--

User 'leanken' has created a pull request for this issue:
https://github.com/apache/spark/pull/30442

> Datetime parsing should fail if the input string can't be parsed, or the 
> pattern string is invalid
> --
>
> Key: SPARK-33498
> URL: https://issues.apache.org/jira/browse/SPARK-33498
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Leanken.Lin
>Priority: Major
>
> Datetime parsing should fail if the input string can't be parsed, or the 
> pattern string is invalid, when ANSI mode is enabled. This patch should update 
> GetTimeStamp, UnixTimeStamp, ToUnixTimeStamp and Cast.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33498) Datetime parsing should fail if the input string can't be parsed, or the pattern string is invalid

2020-11-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33498:


Assignee: Apache Spark

> Datetime parsing should fail if the input string can't be parsed, or the 
> pattern string is invalid
> --
>
> Key: SPARK-33498
> URL: https://issues.apache.org/jira/browse/SPARK-33498
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Leanken.Lin
>Assignee: Apache Spark
>Priority: Major
>
> Datetime parsing should fail if the input string can't be parsed, or the 
> pattern string is invalid, when ANSI mode is enabled. This patch should update 
> GetTimeStamp, UnixTimeStamp, ToUnixTimeStamp and Cast.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33498) Datetime parsing should fail if the input string can't be parsed, or the pattern string is invalid

2020-11-20 Thread Leanken.Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leanken.Lin updated SPARK-33498:

Summary: Datetime parsing should fail if the input string can't be parsed, 
or the pattern string is invalid  (was: Datetime parsing should fail if the 
input string can't be parsed, or he pattern string is invalid)

> Datetime parsing should fail if the input string can't be parsed, or the 
> pattern string is invalid
> --
>
> Key: SPARK-33498
> URL: https://issues.apache.org/jira/browse/SPARK-33498
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Leanken.Lin
>Priority: Major
>
> Datetime parsing should fail if the input string can't be parsed, or the 
> pattern string is invalid, when ANSI mode is enabled. This patch should update 
> GetTimeStamp, UnixTimeStamp, ToUnixTimeStamp and Cast.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33498) Datetime parsing should fail if the input string can't be parsed, or the pattern string is invalid

2020-11-20 Thread Leanken.Lin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Leanken.Lin updated SPARK-33498:

Description: Datetime parsing should fail if the input string can't be 
parsed, or the pattern string is invalid, when ANSI mode is enabled. This patch 
should update GetTimeStamp, UnixTimeStamp, ToUnixTimeStamp and Cast.  (was: 
Datetime parsing should fail if the input string can't be parsed, or he pattern 
string is invalid, when ANSI mode is enable. This patch should update 
GetTimeStamp, UnixTimeStamp, ToUnixTimeStamp and Cast)

> Datetime parsing should fail if the input string can't be parsed, or the 
> pattern string is invalid
> --
>
> Key: SPARK-33498
> URL: https://issues.apache.org/jira/browse/SPARK-33498
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Leanken.Lin
>Priority: Major
>
> Datetime parsing should fail if the input string can't be parsed, or the 
> pattern string is invalid, when ANSI mode is enabled. This patch should update 
> GetTimeStamp, UnixTimeStamp, ToUnixTimeStamp and Cast.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org