[jira] [Commented] (SPARK-44884) Spark doesn't create SUCCESS file in Spark 3.3.0+ when partitionOverwriteMode is dynamic

2023-08-31 Thread Igor Dvorzhak (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17760977#comment-17760977
 ] 

Igor Dvorzhak commented on SPARK-44884:
---

Is this a duplicate of https://issues.apache.org/jira/browse/SPARK-35279?

> Spark doesn't create SUCCESS file in Spark 3.3.0+ when partitionOverwriteMode 
> is dynamic
> 
>
> Key: SPARK-44884
> URL: https://issues.apache.org/jira/browse/SPARK-44884
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Dipayan Dev
>Priority: Major
> Attachments: image-2023-08-20-18-46-53-342.png, 
> image-2023-08-25-13-01-42-137.png
>
>
> The issue does not happen in Spark 2.x (I am using 2.4.0), but only in 3.3.0 
> (also reproduced with 3.4.1).
> Code to reproduce the issue:
>  
> {code:java}
> scala> spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic") 
> scala> val DF = Seq(("test1", 123)).toDF("name", "num")
> scala> DF.write.option("path", "gs://test_bucket/table").mode("overwrite").partitionBy("num").format("orc").saveAsTable("test_schema.test_tb1")
> {code}
>  
> The above code succeeds and creates an external Hive table, but {*}no 
> SUCCESS file is generated{*}.
> The content of the bucket after table creation:
> !image-2023-08-25-13-01-42-137.png|width=500,height=130!
> The same code, when run with Spark 2.4.0 (with or without an external path), 
> generates the SUCCESS file.
> {code:java}
> scala> DF.write.mode(SaveMode.Overwrite).partitionBy("num").format("orc").saveAsTable("test_schema.test_tb1")
> {code}
> !image-2023-08-20-18-46-53-342.png|width=465,height=166!
>  
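A minimal sketch for checking whether the marker was written, assuming the default
FileOutputCommitter behavior (mapreduce.fileoutputcommitter.marksuccessfuljobs=true)
and reusing the bucket path from the reproduction above:

{code:java}
import org.apache.hadoop.fs.Path

// Table location from the reproduction above; adjust to your own bucket/path.
val tablePath = new Path("gs://test_bucket/table")
val fs = tablePath.getFileSystem(spark.sparkContext.hadoopConfiguration)

// With the default committer settings a _SUCCESS marker is expected at the
// table root once the write job commits.
val successMarker = new Path(tablePath, "_SUCCESS")
println(s"_SUCCESS present: ${fs.exists(successMarker)}")
{code}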



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Comment Edited] (SPARK-38101) MetadataFetchFailedException due to decommission block migrations

2022-03-09 Thread Igor Dvorzhak (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503775#comment-17503775
 ] 

Igor Dvorzhak edited comment on SPARK-38101 at 3/9/22, 6:42 PM:


Is there a workaround for this issue?


was (Author: medb):
Are there any workaround for this issue?

> MetadataFetchFailedException due to decommission block migrations
> -
>
> Key: SPARK-38101
> URL: https://issues.apache.org/jira/browse/SPARK-38101
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2, 3.1.3, 3.2.1, 3.3.0, 3.2.2
>Reporter: Emil Ejbyfeldt
>Priority: Major
>
> As noted in SPARK-34939, there is a race when using broadcast for map output 
> statuses. Explanation from SPARK-34939:
> > After map statuses are broadcasted and the executors obtain serialized 
> > broadcasted map statuses. If any fetch failure happens after, Spark 
> > scheduler invalidates cached map statuses and destroy broadcasted value of 
> > the map statuses. Then any executor trying to deserialize serialized 
> > broadcasted map statuses and access broadcasted value, IOException will be 
> > thrown. Currently we don't catch it in MapOutputTrackerWorker and above 
> > exception will fail the application.
> But when running with `spark.decommission.enabled=true` and 
> `spark.storage.decommission.shuffleBlocks.enabled=true` there is another way 
> to hit this race: when a node is decommissioning and its shuffle blocks are 
> migrated. After a block has been migrated, an update is sent to the driver 
> for each block and the map output caches are invalidated.
> Here is a driver log from when we hit the race condition, running Spark 3.2.0:
> {code:java}
> 2022-01-28 03:20:12,409 INFO memory.MemoryStore: Block broadcast_27 stored as 
> values in memory (estimated size 5.5 MiB, free 11.0 GiB)
> 2022-01-28 03:20:12,410 INFO spark.ShuffleStatus: Updating map output for 
> 192108 to BlockManagerId(760, ip-10-231-63-204.ec2.internal, 34707, None)
> 2022-01-28 03:20:12,410 INFO spark.ShuffleStatus: Updating map output for 
> 179529 to BlockManagerId(743, ip-10-231-34-160.ec2.internal, 44225, None)
> 2022-01-28 03:20:12,414 INFO spark.ShuffleStatus: Updating map output for 
> 187194 to BlockManagerId(761, ip-10-231-43-219.ec2.internal, 39943, None)
> 2022-01-28 03:20:12,415 INFO spark.ShuffleStatus: Updating map output for 
> 190303 to BlockManagerId(270, ip-10-231-33-206.ec2.internal, 38965, None)
> 2022-01-28 03:20:12,416 INFO spark.ShuffleStatus: Updating map output for 
> 192220 to BlockManagerId(270, ip-10-231-33-206.ec2.internal, 38965, None)
> 2022-01-28 03:20:12,416 INFO spark.ShuffleStatus: Updating map output for 
> 182306 to BlockManagerId(688, ip-10-231-43-41.ec2.internal, 35967, None)
> 2022-01-28 03:20:12,417 INFO spark.ShuffleStatus: Updating map output for 
> 190387 to BlockManagerId(772, ip-10-231-55-173.ec2.internal, 35523, None)
> 2022-01-28 03:20:12,417 INFO memory.MemoryStore: Block broadcast_27_piece0 
> stored as bytes in memory (estimated size 4.0 MiB, free 10.9 GiB)
> 2022-01-28 03:20:12,417 INFO storage.BlockManagerInfo: Added 
> broadcast_27_piece0 in memory on ip-10-231-63-1.ec2.internal:34761 (size: 4.0 
> MiB, free: 11.0 GiB)
> 2022-01-28 03:20:12,418 INFO memory.MemoryStore: Block broadcast_27_piece1 
> stored as bytes in memory (estimated size 1520.4 KiB, free 10.9 GiB)
> 2022-01-28 03:20:12,418 INFO storage.BlockManagerInfo: Added 
> broadcast_27_piece1 in memory on ip-10-231-63-1.ec2.internal:34761 (size: 
> 1520.4 KiB, free: 11.0 GiB)
> 2022-01-28 03:20:12,418 INFO spark.MapOutputTracker: Broadcast outputstatuses 
> size = 416, actual size = 5747443
> 2022-01-28 03:20:12,419 INFO spark.ShuffleStatus: Updating map output for 
> 153389 to BlockManagerId(154, ip-10-231-42-104.ec2.internal, 44717, None)
> 2022-01-28 03:20:12,419 INFO broadcast.TorrentBroadcast: Destroying 
> Broadcast(27) (from updateMapOutput at BlockManagerMasterEndpoint.scala:594)
> 2022-01-28 03:20:12,427 INFO storage.BlockManagerInfo: Added rdd_65_20310 on 
> disk on ip-10-231-32-25.ec2.internal:40657 (size: 77.6 MiB)
> 2022-01-28 03:20:12,427 INFO storage.BlockManagerInfo: Removed 
> broadcast_27_piece0 on ip-10-231-63-1.ec2.internal:34761 in memory (size: 4.0 
> MiB, free: 11.0 GiB)
> {code}
> While the broadcast is being constructed, updates keep coming in, and the 
> broadcast is destroyed almost immediately. On this particular job we hit the 
> race condition many times; it caused ~18 task failures and stage retries 
> within 20 seconds, which made us hit our stage retry limit and fail the job.
> As far as I understand, this was the expected behavior for handling this case 
> after SPARK-34939. But it seems like, when combined with decommissioning, 
> hitting the race is a bit too common.
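For reference, a minimal sketch of the configuration combination the description
above refers to; whether disabling shuffle block migration avoids this particular
path is an inference from the description, not a verified fix:

{code:java}
import org.apache.spark.SparkConf

// Combination under which the described race is hit, per the description above.
val decommissionConf = new SparkConf()
  .set("spark.decommission.enabled", "true")
  .set("spark.storage.decommission.shuffleBlocks.enabled", "true")

// Inference from the description: with shuffle block migration disabled, the
// extra map-output updates from migrated blocks do not occur, at the cost of
// losing shuffle data on decommissioned nodes.
val withoutShuffleMigration = new SparkConf()
  .set("spark.decommission.enabled", "true")
  .set("spark.storage.decommission.shuffleBlocks.enabled", "false")
{code}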

[jira] [Commented] (SPARK-38101) MetadataFetchFailedException due to decommission block migrations

2022-03-09 Thread Igor Dvorzhak (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503775#comment-17503775
 ] 

Igor Dvorzhak commented on SPARK-38101:
---

Are there any workarounds for this issue?

> MetadataFetchFailedException due to decommission block migrations
> -
>
> Key: SPARK-38101
> URL: https://issues.apache.org/jira/browse/SPARK-38101
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2, 3.1.3, 3.2.1, 3.3.0, 3.2.2
>Reporter: Emil Ejbyfeldt
>Priority: Major
>
> As noted in SPARK-34939, there is a race when using broadcast for map output 
> statuses. Explanation from SPARK-34939:
> > After map statuses are broadcasted and the executors obtain serialized 
> > broadcasted map statuses. If any fetch failure happens after, Spark 
> > scheduler invalidates cached map statuses and destroy broadcasted value of 
> > the map statuses. Then any executor trying to deserialize serialized 
> > broadcasted map statuses and access broadcasted value, IOException will be 
> > thrown. Currently we don't catch it in MapOutputTrackerWorker and above 
> > exception will fail the application.
> But when running with `spark.decommission.enabled=true` and 
> `spark.storage.decommission.shuffleBlocks.enabled=true` there is another way 
> to hit this race: when a node is decommissioning and its shuffle blocks are 
> migrated. After a block has been migrated, an update is sent to the driver 
> for each block and the map output caches are invalidated.
> Here is a driver log from when we hit the race condition, running Spark 3.2.0:
> {code:java}
> 2022-01-28 03:20:12,409 INFO memory.MemoryStore: Block broadcast_27 stored as 
> values in memory (estimated size 5.5 MiB, free 11.0 GiB)
> 2022-01-28 03:20:12,410 INFO spark.ShuffleStatus: Updating map output for 
> 192108 to BlockManagerId(760, ip-10-231-63-204.ec2.internal, 34707, None)
> 2022-01-28 03:20:12,410 INFO spark.ShuffleStatus: Updating map output for 
> 179529 to BlockManagerId(743, ip-10-231-34-160.ec2.internal, 44225, None)
> 2022-01-28 03:20:12,414 INFO spark.ShuffleStatus: Updating map output for 
> 187194 to BlockManagerId(761, ip-10-231-43-219.ec2.internal, 39943, None)
> 2022-01-28 03:20:12,415 INFO spark.ShuffleStatus: Updating map output for 
> 190303 to BlockManagerId(270, ip-10-231-33-206.ec2.internal, 38965, None)
> 2022-01-28 03:20:12,416 INFO spark.ShuffleStatus: Updating map output for 
> 192220 to BlockManagerId(270, ip-10-231-33-206.ec2.internal, 38965, None)
> 2022-01-28 03:20:12,416 INFO spark.ShuffleStatus: Updating map output for 
> 182306 to BlockManagerId(688, ip-10-231-43-41.ec2.internal, 35967, None)
> 2022-01-28 03:20:12,417 INFO spark.ShuffleStatus: Updating map output for 
> 190387 to BlockManagerId(772, ip-10-231-55-173.ec2.internal, 35523, None)
> 2022-01-28 03:20:12,417 INFO memory.MemoryStore: Block broadcast_27_piece0 
> stored as bytes in memory (estimated size 4.0 MiB, free 10.9 GiB)
> 2022-01-28 03:20:12,417 INFO storage.BlockManagerInfo: Added 
> broadcast_27_piece0 in memory on ip-10-231-63-1.ec2.internal:34761 (size: 4.0 
> MiB, free: 11.0 GiB)
> 2022-01-28 03:20:12,418 INFO memory.MemoryStore: Block broadcast_27_piece1 
> stored as bytes in memory (estimated size 1520.4 KiB, free 10.9 GiB)
> 2022-01-28 03:20:12,418 INFO storage.BlockManagerInfo: Added 
> broadcast_27_piece1 in memory on ip-10-231-63-1.ec2.internal:34761 (size: 
> 1520.4 KiB, free: 11.0 GiB)
> 2022-01-28 03:20:12,418 INFO spark.MapOutputTracker: Broadcast outputstatuses 
> size = 416, actual size = 5747443
> 2022-01-28 03:20:12,419 INFO spark.ShuffleStatus: Updating map output for 
> 153389 to BlockManagerId(154, ip-10-231-42-104.ec2.internal, 44717, None)
> 2022-01-28 03:20:12,419 INFO broadcast.TorrentBroadcast: Destroying 
> Broadcast(27) (from updateMapOutput at BlockManagerMasterEndpoint.scala:594)
> 2022-01-28 03:20:12,427 INFO storage.BlockManagerInfo: Added rdd_65_20310 on 
> disk on ip-10-231-32-25.ec2.internal:40657 (size: 77.6 MiB)
> 2022-01-28 03:20:12,427 INFO storage.BlockManagerInfo: Removed 
> broadcast_27_piece0 on ip-10-231-63-1.ec2.internal:34761 in memory (size: 4.0 
> MiB, free: 11.0 GiB)
> {code}
> While the broadcast is being constructed, updates keep coming in, and the 
> broadcast is destroyed almost immediately. On this particular job we hit the 
> race condition many times; it caused ~18 task failures and stage retries 
> within 20 seconds, which made us hit our stage retry limit and fail the job.
> As far as I understand, this was the expected behavior for handling this case 
> after SPARK-34939. But it seems like, when combined with decommissioning, 
> hitting the race is a bit too common.
> We have observed this behavior running 3.2.0 a

[jira] [Updated] (SPARK-37025) Upgrade RoaringBitmap to 0.9.22

2021-10-16 Thread Igor Dvorzhak (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Dvorzhak updated SPARK-37025:
--
Labels:   (was: correctness)

> Upgrade RoaringBitmap to 0.9.22
> ---
>
> Key: SPARK-37025
> URL: https://issues.apache.org/jira/browse/SPARK-37025
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Igor Dvorzhak
>Assignee: Lantao Jin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Updated] (SPARK-37025) Upgrade RoaringBitmap to 0.9.22

2021-10-16 Thread Igor Dvorzhak (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Dvorzhak updated SPARK-37025:
--
Fix Version/s: (was: 2.4.2)
   (was: 2.3.4)
   (was: 3.0.0)
Affects Version/s: (was: 2.3.3)
   (was: 2.4.0)
   (was: 3.0.0)
   (was: 2.2.0)
   (was: 2.1.0)
   (was: 2.0.0)
   3.2.0
  Description: (was: HighlyCompressedMapStatus uses RoaringBitmap 
to record the empty blocks. But RoaringBitmap-0.5.11 couldn't be ser/deser with 
unsafe KryoSerializer.

We can use below UT to reproduce:
{code}
  test("kryo serialization with RoaringBitmap") {
val bitmap = new RoaringBitmap
bitmap.add(1787)

val safeSer = new KryoSerializer(conf).newInstance()
val bitmap2 : RoaringBitmap = safeSer.deserialize(safeSer.serialize(bitmap))
assert(bitmap2.equals(bitmap))

conf.set("spark.kryo.unsafe", "true")
val unsafeSer = new KryoSerializer(conf).newInstance()
val bitmap3 : RoaringBitmap = 
unsafeSer.deserialize(unsafeSer.serialize(bitmap))
assert(bitmap3.equals(bitmap)) // this will fail
  }
{code}
Upgrade to latest version 0.7.45 to fix it)

> Upgrade RoaringBitmap to 0.9.22
> ---
>
> Key: SPARK-37025
> URL: https://issues.apache.org/jira/browse/SPARK-37025
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Igor Dvorzhak
>Assignee: Lantao Jin
>Priority: Major
>  Labels: correctness
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Created] (SPARK-37025) Upgrade RoaringBitmap to 0.9.22

2021-10-16 Thread Igor Dvorzhak (Jira)
Igor Dvorzhak created SPARK-37025:
-

 Summary: Upgrade RoaringBitmap to 0.9.22
 Key: SPARK-37025
 URL: https://issues.apache.org/jira/browse/SPARK-37025
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.0.0, 2.1.0, 2.2.0, 2.3.3, 2.4.0, 3.0.0
Reporter: Igor Dvorzhak
Assignee: Lantao Jin
 Fix For: 2.3.4, 2.4.2, 3.0.0


HighlyCompressedMapStatus uses RoaringBitmap to record the empty blocks, but 
RoaringBitmap 0.5.11 cannot be serialized/deserialized with the unsafe 
KryoSerializer.

The unit test below reproduces the issue:
{code}
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoSerializer
import org.roaringbitmap.RoaringBitmap

  // `conf` is the SparkConf provided by the enclosing test suite.
  test("kryo serialization with RoaringBitmap") {
    val bitmap = new RoaringBitmap
    bitmap.add(1787)

    // Round trip through the default (safe) Kryo serializer works.
    val safeSer = new KryoSerializer(conf).newInstance()
    val bitmap2: RoaringBitmap = safeSer.deserialize(safeSer.serialize(bitmap))
    assert(bitmap2.equals(bitmap))

    // Round trip through the unsafe Kryo serializer does not round-trip
    // correctly with RoaringBitmap 0.5.11.
    conf.set("spark.kryo.unsafe", "true")
    val unsafeSer = new KryoSerializer(conf).newInstance()
    val bitmap3: RoaringBitmap = unsafeSer.deserialize(unsafeSer.serialize(bitmap))
    assert(bitmap3.equals(bitmap)) // this will fail
  }
{code}
Upgrade to the newer version 0.9.22 (per the summary) to fix it.
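For anyone pinning the dependency ahead of a Spark release that includes the bump,
a hedged sbt sketch; the coordinates are the standard RoaringBitmap ones, and the
version is the one this ticket requests:

{code}
// build.sbt fragment: force the RoaringBitmap version requested by this ticket.
dependencyOverrides += "org.roaringbitmap" % "RoaringBitmap" % "0.9.22"
{code}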



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (SPARK-33772) Build and Run Spark on Java 17

2021-09-14 Thread Igor Dvorzhak (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17415126#comment-17415126
 ] 

Igor Dvorzhak commented on SPARK-33772:
---

If Spark 3.2 will not support Java 17 out of the box, is it possible to at 
least document all the Java options that need to be set to make it work on JDK 
17?
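For illustration only, the kind of options the question refers to, assembled as the
values one would pass via spark.driver.extraJavaOptions / spark.executor.extraJavaOptions;
the concrete --add-opens list below is an assumption, not the documented set:

{code:java}
// Illustrative only: the shape of the JDK 17 flags in question. The exact
// --add-opens list depends on the Spark version and is assumed, not authoritative.
val jdk17JavaOptions: String = Seq(
  "--add-opens=java.base/java.lang=ALL-UNNAMED",
  "--add-opens=java.base/java.nio=ALL-UNNAMED",
  "--add-opens=java.base/sun.nio.ch=ALL-UNNAMED"
).mkString(" ")

// e.g. spark-submit --conf spark.driver.extraJavaOptions="$jdk17JavaOptions" \
//                   --conf spark.executor.extraJavaOptions="$jdk17JavaOptions" ...
{code}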

> Build and Run Spark on Java 17
> --
>
> Key: SPARK-33772
> URL: https://issues.apache.org/jira/browse/SPARK-33772
> Project: Spark
>  Issue Type: New Feature
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> Apache Spark supports Java 8 and Java 11 (LTS). The next Java LTS version is 
> 17.
> ||Version||Release Date||
> |Java 17 (LTS)|September 2021|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Commented] (SPARK-30643) Add support for embedding Hive 3

2020-01-25 Thread Igor Dvorzhak (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023726#comment-17023726
 ] 

Igor Dvorzhak commented on SPARK-30643:
---

Hi, [~dongjoon]. Thank you for fixing up this JIRA.

I think that the majority of the reasons that went into supporting embedded Hive 2.3 
will also apply to supporting embedded Hive 3.

As a user, I would want the behavior of Spark SQL and Hive queries to be as close 
as possible in an installation where I use both Spark and Hive. But if I chose to 
run Hive 3 alongside Spark with embedded Hive 2.3, then Spark SQL and Hive query 
behavior could differ in some cases.

Personally, I'm interested in the performance and correctness improvements that 
were made to the Hive Server, Driver, and Metastore client in Hive 3.

AWS EMR 6.0 (currently in beta) uses Hive 3; I would expect other vendors to 
follow suit soon.

Will follow up in dev list too, thanks!
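For context, a hedged sketch of how the mixed setup described above is usually
expressed today: keep the embedded Hive 2.3 execution client but point Spark at an
external Hive 3 metastore via the standard spark.sql.hive.metastore.* configs. The
version string is a placeholder, and this is not a claim that Hive 3 can be embedded:

{code:java}
import org.apache.spark.sql.SparkSession

// Sketch only: the embedded execution client stays at Hive 2.3, while the
// metastore client jars and version target an external Hive 3 metastore.
val spark = SparkSession.builder()
  .enableHiveSupport()
  .config("spark.sql.hive.metastore.version", "3.1.2") // placeholder Hive 3 metastore version
  .config("spark.sql.hive.metastore.jars", "maven")    // fetch matching client jars; "builtin" stays on Hive 2.3
  .getOrCreate()
{code}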

> Add support for embedding Hive 3
> 
>
> Key: SPARK-30643
> URL: https://issues.apache.org/jira/browse/SPARK-30643
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Igor Dvorzhak
>Priority: Major
>
> Currently Spark can be compiled only against Hive 1.2.1 and Hive 2.3; 
> compilation against Hive 3 fails.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Created] (SPARK-30643) Add support for embedded Hive 3

2020-01-25 Thread Igor Dvorzhak (Jira)
Igor Dvorzhak created SPARK-30643:
-

 Summary: Add support for embedded Hive 3
 Key: SPARK-30643
 URL: https://issues.apache.org/jira/browse/SPARK-30643
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Igor Dvorzhak


Currently Spark can be compiled only against Hive 1.2.1 and Hive 2.3; 
compilation against Hive 3 fails.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[jira] [Updated] (SPARK-30643) Add support for embedding Hive 3

2020-01-25 Thread Igor Dvorzhak (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Dvorzhak updated SPARK-30643:
--
Summary: Add support for embedding Hive 3  (was: Add support for embedded 
Hive 3)

> Add support for embedding Hive 3
> 
>
> Key: SPARK-30643
> URL: https://issues.apache.org/jira/browse/SPARK-30643
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Igor Dvorzhak
>Priority: Major
>
> Currently Spark can be compiled only against Hive 1.2.1 and Hive 2.3; 
> compilation against Hive 3 fails.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
