[jira] [Updated] (SPARK-44772) Reading blocks from remote executors causes timeout issue
[ https://issues.apache.org/jira/browse/SPARK-44772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nebi mert aydin updated SPARK-44772:
Component/s: Shuffle, Spark Core
> Reading blocks from remote executors causes timeout issue
> --
> Key: SPARK-44772
> URL: https://issues.apache.org/jira/browse/SPARK-44772
> Project: Spark
> Issue Type: Bug
> Components: EC2, PySpark, Shuffle, Spark Core
> Affects Versions: 3.1.2
> Reporter: nebi mert aydin
> Priority: Major
>
> I'm using EMR 6.5 with Spark 3.1.2.
> I'm shuffling 1.5 TiB of data across 3000 executors, each with 4 cores and 23 GiB of memory.
> Speculative execution is also enabled.
> {code:java}
> df.repartition(6000)
> {code}
> I see lots of failures with
> {code:java}
> 2023-08-11 01:01:09,846 ERROR org.apache.spark.network.server.ChunkFetchRequestHandler (shuffle-server-4-95): Error sending result ChunkFetchSuccess[streamChunkId=StreamChunkId[streamId=779084003612,chunkIndex=323],buffer=FileSegmentManagedBuffer[file=/mnt3/yarn/usercache/zeppelin/appcache/application_1691438567823_0012/blockmgr-0d82ca05-9429-4ff2-9f61-e779e8e60648/07/shuffle_5_114492_0.data,offset=1836997,length=618]] to /172.31.20.110:36654; closing connection
> java.nio.channels.ClosedChannelException
> at org.sparkproject.io.netty.channel.AbstractChannel$AbstractUnsafe.newClosedChannelException(AbstractChannel.java:957)
> at org.sparkproject.io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:865)
> at org.sparkproject.io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367)
> at org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
> at org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:709)
> at org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:792)
> at org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:702)
> at org.sparkproject.io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:110)
> at org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
> at org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:709)
> at org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:792)
> at org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:702)
> at org.sparkproject.io.netty.handler.timeout.IdleStateHandler.write(IdleStateHandler.java:302)
> at org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:717)
> at org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:764)
> at org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:790)
> at org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:758)
> at org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:808)
> at org.sparkproject.io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1025)
> at org.sparkproject.io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:294)
> at org.apache.spark.network.server.ChunkFetchRequestHandler.respond(ChunkFetchRequestHandler.java:142)
> at org.apache.spark.network.server.ChunkFetchRequestHandler.processFetchRequest(ChunkFetchRequestHandler.java:116)
> at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:107)
> at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140)
> at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
> at org.sparkproject.io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
> at org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
> at org.sparkproject.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
> at
[jira] [Created] (SPARK-44772) Reading blocks from remote executors causes timeout issue
nebi mert aydin created SPARK-44772:
---
Summary: Reading blocks from remote executors causes timeout issue
Key: SPARK-44772
URL: https://issues.apache.org/jira/browse/SPARK-44772
Project: Spark
Issue Type: Bug
Components: EC2, PySpark
Affects Versions: 3.1.2
Reporter: nebi mert aydin

I'm using EMR 6.5 with Spark 3.1.2.
I'm shuffling 1.5 TiB of data across 3000 executors, each with 4 cores and 23 GiB of memory.
`df.repartition(6000)`
I see lots of failures with
```
2023-08-11 01:01:09,847 ERROR org.apache.spark.network.server.ChunkFetchRequestHandler (shuffle-server-4-95): Error sending result ChunkFetchSuccess[streamChunkId=StreamChunkId[streamId=779084003612,chunkIndex=324],buffer=FileSegmentManagedBuffer[file=/mnt1/yarn/usercache/zeppelin/appcache/application_1691438567823_0012/blockmgr-b2f9bea5-068c-45c8-b324-1f132c87de54/24/shuffle_5_115515_0.data,offset=680394,length=255]] to /172.31.20.110:36654; closing connection
```
I tried disabling TCP segmentation offload and scatter-gather at the kernel level:
```
sudo ethtool -K eth0 tso off
sudo ethtool -K eth0 sg off
```
That didn't work. I suspect the external shuffle service is failing to send data to other executors for some reason.
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
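Before kernel-level changes like the ethtool tweaks above, fetch failures under heavy shuffle load are usually attacked through Spark's network and shuffle settings. The sketch below uses standard Spark configuration keys, but the values are illustrative starting points for this kind of workload, not a verified fix for this issue:

```scala
// Illustrative shuffle/network tuning sketch — keys are standard Spark
// configuration properties; the values are assumptions to experiment with.
val conf = new org.apache.spark.SparkConf()
  .set("spark.network.timeout", "600s")               // default 120s; governs several RPC/fetch timeouts
  .set("spark.shuffle.io.maxRetries", "10")           // default 3; retries per failed block fetch
  .set("spark.shuffle.io.retryWait", "30s")           // default 5s; wait between fetch retries
  .set("spark.shuffle.io.numConnectionsPerPeer", "2") // default 1; spread fetch load across connections
  .set("spark.reducer.maxReqsInFlight", "64")         // bound concurrent fetch requests per reducer
```

Bounding `spark.reducer.maxReqsInFlight` can matter here because with 3000 executors and 6000 shuffle partitions, thousands of reducers may hammer the same serving executor at once, which is consistent with the server side closing connections.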
[jira] [Commented] (SPARK-24578) Reading remote cache block behavior changes and causes timeout issue
[ https://issues.apache.org/jira/browse/SPARK-24578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753038#comment-17753038 ] nebi mert aydin commented on SPARK-24578: - I still have this problem in Amazon EMR spark 3.1.2, any idea? > Reading remote cache block behavior changes and causes timeout issue > > > Key: SPARK-24578 > URL: https://issues.apache.org/jira/browse/SPARK-24578 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.3.0, 2.3.1 >Reporter: Wenbo Zhao >Assignee: Wenbo Zhao >Priority: Blocker > Fix For: 2.3.2, 2.4.0 > > > After Spark 2.3, we observed lots of errors like the following in some of our > production job > {code:java} > 18/06/15 20:59:42 ERROR TransportRequestHandler: Error sending result > ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=91672904003, > chunkIndex=0}, > buffer=org.apache.spark.storage.BlockManagerManagedBuffer@783a9324} to > /172.22.18.7:60865; closing connection > java.io.IOException: Broken pipe > at sun.nio.ch.FileDispatcherImpl.write0(Native Method) > at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47) > at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) > at sun.nio.ch.IOUtil.write(IOUtil.java:65) > at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:471) > at > org.apache.spark.network.protocol.MessageWithHeader.writeNioBuffer(MessageWithHeader.java:156) > at > org.apache.spark.network.protocol.MessageWithHeader.copyByteBuf(MessageWithHeader.java:142) > at > org.apache.spark.network.protocol.MessageWithHeader.transferTo(MessageWithHeader.java:123) > at > io.netty.channel.socket.nio.NioSocketChannel.doWriteFileRegion(NioSocketChannel.java:355) > at > io.netty.channel.nio.AbstractNioByteChannel.doWrite(AbstractNioByteChannel.java:224) > at > io.netty.channel.socket.nio.NioSocketChannel.doWrite(NioSocketChannel.java:382) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:934) > at > 
io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.flush0(AbstractNioChannel.java:362) > at > io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:901) > at > io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1321) > at > io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776) > at > io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768) > at > io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749) > at > io.netty.channel.ChannelOutboundHandlerAdapter.flush(ChannelOutboundHandlerAdapter.java:115) > at > io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776) > at > io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768) > at > io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749) > at io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:117) > at > io.netty.channel.AbstractChannelHandlerContext.invokeFlush0(AbstractChannelHandlerContext.java:776) > at > io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:768) > at > io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:749) > at > io.netty.channel.DefaultChannelPipeline.flush(DefaultChannelPipeline.java:983) > at io.netty.channel.AbstractChannel.flush(AbstractChannel.java:248) > at > io.netty.channel.nio.AbstractNioByteChannel$1.run(AbstractNioByteChannel.java:284) > at > io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) > at > io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403) > at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463) > at > 
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) > at > io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138) > {code} > > Here is a small reproducible example for a small cluster of 2 executors (say host-1 > and host-2), each with 8 cores. The memory of the driver and executors is > not an important factor here, as long as it is big enough, say 20G. > {code:java} > val n = 1 > val df0 = sc.parallelize(1 to n).toDF > val df = df0.withColumn("x0", rand()).withColumn("x0", rand() > ).withColumn("x1", rand() > ).withColumn("x2", rand() > ).withColumn("x3", rand() > ).withColumn("x4", rand() > ).withColumn("x5", rand() > ).withColumn("x6", rand() > ).withColumn("x7", rand() > ).withColumn("x8", rand() > ).withColumn("x9", rand())
[jira] [Commented] (SPARK-44719) NoClassDefFoundError when using Hive UDF
[ https://issues.apache.org/jira/browse/SPARK-44719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753037#comment-17753037 ] Snoot.io commented on SPARK-44719: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/42447 > NoClassDefFoundError when using Hive UDF > > > Key: SPARK-44719 > URL: https://issues.apache.org/jira/browse/SPARK-44719 > Project: Spark > Issue Type: Bug > Components: Build, SQL >Affects Versions: 3.5.0 >Reporter: Yuming Wang >Priority: Major > Attachments: HiveUDFs-1.0-SNAPSHOT.jar > > > How to reproduce: > {noformat} > spark-sql (default)> add jar > /Users/yumwang/Downloads/HiveUDFs-1.0-SNAPSHOT.jar; > Time taken: 0.413 seconds > spark-sql (default)> CREATE TEMPORARY FUNCTION long_to_ip as > 'net.petrabarus.hiveudfs.LongToIP'; > Time taken: 0.038 seconds > spark-sql (default)> SELECT long_to_ip(2130706433L) FROM range(10); > 23/08/08 20:17:58 ERROR SparkSQLDriver: Failed in [SELECT > long_to_ip(2130706433L) FROM range(10)] > java.lang.NoClassDefFoundError: org/codehaus/jackson/map/type/TypeFactory > at org.apache.hadoop.hive.ql.udf.UDFJson.(UDFJson.java:64) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:348) > ... > {noformat}
[jira] [Commented] (SPARK-44461) Enable Process Isolation for streaming python worker
[ https://issues.apache.org/jira/browse/SPARK-44461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753036#comment-17753036 ] Snoot.io commented on SPARK-44461: -- User 'WweiL' has created a pull request for this issue: https://github.com/apache/spark/pull/42443 > Enable Process Isolation for streaming python worker > > > Key: SPARK-44461 > URL: https://issues.apache.org/jira/browse/SPARK-44461 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.4.1 >Reporter: Raghu Angadi >Priority: Major > > Enable PI for Python worker used for foreachBatch() & streaming listener in > Connect.
[jira] [Updated] (SPARK-44768) Improve WSCG handling of row buffer by accounting for executor memory . Exploding nested arrays can easily lead to out of memory errors.
[ https://issues.apache.org/jira/browse/SPARK-44768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44768:
Description:
The code sample below showcases the whole-stage codegen produced when exploding nested arrays. The data sample in the dataframe is quite small, so it will not trigger the out-of-memory error; however, if the arrays are larger and the rows are wide, you will definitely end up with an OOM error.
Consider a scenario where you flatten a nested array. You can use the following steps to create the dataframe:
{code:java}
//create a partClass case class
case class partClass (PARTNAME: String, PartNumber: String, PartPrice: Double)

//create a nested array class
case class array_array_class (
  col_int: Int,
  arr_arr_string: Seq[Seq[String]],
  arr_arr_bigint: Seq[Seq[Long]],
  col_string: String,
  parts_s: Seq[Seq[partClass]]
)

//create a dataframe
var df_array_array = sc.parallelize(
  Seq(
    (1, Seq(Seq("a","b","c","d"), Seq("aa","bb","cc","dd")), Seq(Seq(1000,2), Seq(3,-1)), "ItemPart1",
      Seq(Seq(partClass("PNAME1","P1",20.75), partClass("PNAME1_1","P1_1",30.75)))),
    (2, Seq(Seq("ab","bc","cd","de"), Seq("aab","bbc","ccd","dde"), Seq("aab")), Seq(Seq(-1000,-2,-1,-2), Seq(0,3,-1)), "ItemPart2",
      Seq(Seq(partClass("PNAME2","P2",170.75), partClass("PNAME2_1","P2_1",33.75), partClass("PNAME2_2","P2_2",73.75))))
  )
).toDF("c1","c2","c3","c4","c5")

//explode a nested array
var result = df_array_array.select(col("c1"), explode(col("c2"))).select('c1, explode('col))
result.explain
{code}
The physical plan for this operator is shown below.
- Physical plan == Physical Plan == *(1) Generate explode(col#27), [c1#17|#17], false, [col#30|#30] +- *(1) Filter ((size(col#27, true) > 0) AND isnotnull(col#27)) +- *(1) Generate explode(c2#18), [c1#17|#17], false, [col#27|#27] +- *(1) Project [_1#6 AS c1#17, _2#7 AS c2#18|#6 AS c1#17, _2#7 AS c2#18] +- *(1) Filter ((size(_2#7, true) > 0) AND isnotnull(_2#7)) +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, scala.Tuple5, true]))._1 AS _1#6, mapobjects(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -1), mapobjects(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -2), staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -2), StringType, ObjectType(class java.lang.String)), true, false, true), validateexternaltype(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -1), ArrayType(StringType,true), ObjectType(interface scala.collection.Seq)), None), knownnotnull(assertnotnull(input[0, scala.Tuple5, true]))._2, None) AS _2#7, mapobjects(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -3), mapobjects(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -4), assertnotnull(validateexternaltype(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -4), IntegerType, IntegerType)), validateexternaltype(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -3), ArrayType(IntegerType,false), ObjectType(interface scala.collection.Seq)), None), knownnotnull(assertnotnull(input[0, scala.Tuple5, true]))._3, None) AS _3#8, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(assertnotnull(input[0, scala.Tuple5, true]))._4, true, false, true) AS _4#9, mapobjects(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -5), mapobjects(lambdavariable(MapObject, ObjectType(class 
java.lang.Object), true, -6), if (isnull(validateexternaltype(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -6), StructField(PARTNAME,StringType,true), StructField(PartNumber,StringType,true), StructField(PartPrice,DoubleType,false), ObjectType(class $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$partClass null else named_struct(PARTNAME, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(validateexternaltype(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -6), StructField(PARTNAME,StringType,true), StructField(PartNumber,StringType,true), StructField(PartPrice,DoubleType,false), ObjectType(class $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$partClass))).PARTNAME, true, false, true), PartNumber, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(validateexternaltype(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -6), StructField(PARTNAME,StringType,true), StructField(PartNumber,StringType,true), StructField(PartPrice,DoubleType,false), ObjectType(class
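The quoted plan shows two chained Generate (explode) operators. Spark isn't needed to see why that amplifies row counts: each explode behaves like a flatMap over the array column, so the output cardinality is the total number of nested elements, and with large arrays that per-row expansion is what fills the WSCG row buffer. A plain-Scala model of the two explodes above (a simplified sketch, not Spark's actual codegen):

```scala
// Plain-Scala model of the two chained explode() calls on column c2.
// Each explode is a flatMap, so 2 input rows become 17 output rows here
// (4+4 elements from row 1, 4+4+1 from row 2); with real-world array
// sizes this multiplicative expansion is what exhausts executor memory.
val rows: Seq[(Int, Seq[Seq[String]])] = Seq(
  (1, Seq(Seq("a", "b", "c", "d"), Seq("aa", "bb", "cc", "dd"))),
  (2, Seq(Seq("ab", "bc", "cd", "de"), Seq("aab", "bbc", "ccd", "dde"), Seq("aab")))
)

// explode(col("c2")) followed by explode('col):
val exploded: Seq[(Int, String)] = rows.flatMap { case (c1, c2) =>
  c2.flatMap(inner => inner.map(s => (c1, s)))
}
println(exploded.size) // prints 17
```

Unlike this in-memory sketch, Spark buffers the expanded rows per input row inside the generated code, which is why the issue asks for the buffer to account for executor memory.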
[jira] [Commented] (SPARK-44768) Improve WSCG handling of row buffer by accounting for executor memory . Exploding nested arrays can easily lead to out of memory errors.
[ https://issues.apache.org/jira/browse/SPARK-44768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753035#comment-17753035 ] Franck Tago commented on SPARK-44768: - !image-2023-08-10-20-32-55-684.png!
[jira] [Updated] (SPARK-44768) Improve WSCG handling of row buffer by accounting for executor memory . Exploding nested arrays can easily lead to out of memory errors.
[ https://issues.apache.org/jira/browse/SPARK-44768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44768: Attachment: image-2023-08-10-20-32-55-684.png > Improve WSCG handling of row buffer by accounting for executor memory . > Exploding nested arrays can easily lead to out of memory errors. > --- > > Key: SPARK-44768 > URL: https://issues.apache.org/jira/browse/SPARK-44768 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.3.2, 3.4.0, 3.4.1 >Reporter: Franck Tago >Priority: Major > Attachments: image-2023-08-10-20-32-55-684.png, > spark-jira_wscg_code.txt > > > consider a scenario where you flatten a nested array > // e.g you can use the following steps to create the dataframe > //create a partClass case class > case class partClass (PARTNAME: String , PartNumber: String , PartPrice : > Double ) > //create a nested array array class > case class array_array_class ( > col_int: Int, > arr_arr_string : Seq[Seq[String]], > arr_arr_bigint : Seq[Seq[Long]], > col_string : String, > parts_s : Seq[Seq[partClass]] > > ) > //create a dataframe > var df_array_array = sc.parallelize( > Seq( > (1,Seq(Seq("a","b" ,"c" ,"d") ,Seq("aa","bb" ,"cc","dd")) , > Seq(Seq(1000,2), Seq(3,-1)),"ItemPart1", > Seq(Seq(partClass("PNAME1","P1",20.75),partClass("PNAME1_1","P1_1",30.75))) > ) , > > (2,Seq(Seq("ab","bc" ,"cd" ,"de") ,Seq("aab","bbc" > ,"ccd","dde"),Seq("aab")) , Seq(Seq(-1000,-2,-1,-2), > Seq(0,3,-1)),"ItemPart2", > > Seq(Seq(partClass("PNAME2","P2",170.75),partClass("PNAME2_1","P2_1",33.75),partClass("PNAME2_2","P2_2",73.75))) > ) > > ) > ).toDF("c1" ,"c2" ,"c3" ,"c4" ,"c5") > //explode a nested array > var result = df_array_array.select( col("c1"), > explode(col("c2"))).select('c1 , explode('col)) > result.explain > > The physical for this operator is seen below. 
> - > Physical plan > == Physical Plan == > *(1) Generate explode(col#27), [c1#17|#17], false, [col#30|#30] > +- *(1) Filter ((size(col#27, true) > 0) AND isnotnull(col#27)) > +- *(1) Generate explode(c2#18), [c1#17|#17], false, [col#27|#27] > +- *(1) Project [_1#6 AS c1#17, _2#7 AS c2#18|#6 AS c1#17, _2#7 AS > c2#18] > +- *(1) Filter ((size(_2#7, true) > 0) AND isnotnull(_2#7)) > +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, > scala.Tuple5, true]))._1 AS _1#6, mapobjects(lambdavariable(MapObject, > ObjectType(class java.lang.Object), true, -1), > mapobjects(lambdavariable(MapObject, ObjectType(class java.lang.Object), > true, -2), staticinvoke(class org.apache.spark.unsafe.types.UTF8String, > StringType, fromString, validateexternaltype(lambdavariable(MapObject, > ObjectType(class java.lang.Object), true, -2), StringType, ObjectType(class > java.lang.String)), true, false, true), > validateexternaltype(lambdavariable(MapObject, ObjectType(class > java.lang.Object), true, -1), ArrayType(StringType,true), > ObjectType(interface scala.collection.Seq)), None), > knownnotnull(assertnotnull(input[0, scala.Tuple5, true]))._2, None) AS _2#7, > mapobjects(lambdavariable(MapObject, ObjectType(class java.lang.Object), > true, -3), mapobjects(lambdavariable(MapObject, ObjectType(class > java.lang.Object), true, -4), > assertnotnull(validateexternaltype(lambdavariable(MapObject, ObjectType(class > java.lang.Object), true, -4), IntegerType, IntegerType)), > validateexternaltype(lambdavariable(MapObject, ObjectType(class > java.lang.Object), true, -3), ArrayType(IntegerType,false), > ObjectType(interface scala.collection.Seq)), None), > knownnotnull(assertnotnull(input[0, scala.Tuple5, true]))._3, None) AS _3#8, > staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, > fromString, knownnotnull(assertnotnull(input[0, scala.Tuple5, true]))._4, > true, false, true) AS _4#9, mapobjects(lambdavariable(MapObject, > ObjectType(class 
java.lang.Object), true, -5), > mapobjects(lambdavariable(MapObject, ObjectType(class java.lang.Object), > true, -6), if (isnull(validateexternaltype(lambdavariable(MapObject, > ObjectType(class java.lang.Object), true, -6), > StructField(PARTNAME,StringType,true), > StructField(PartNumber,StringType,true), > StructField(PartPrice,DoubleType,false), ObjectType(class > $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$partClass null else > named_struct(PARTNAME, staticinvoke(class > org.apache.spark.unsafe.types.UTF8String, StringType, fromString, > knownnotnull(validateexternaltype(lambdavariable(MapObject, ObjectType(class > java.lang.Object), true, -6),
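The core risk SPARK-44768 describes is multiplicative: each explode over a nested array multiplies the row count by the inner array length, so the WSCG row buffer can grow far beyond what the executor memory accounts for. A minimal pure-Python sketch (illustrative only, not Spark code) of the blow-up from the reproducer above:

```python
# Pure-Python sketch (not Spark) of why exploding nested arrays blows up
# the row count: each explode() multiplies rows by the inner array length,
# so an unbounded row buffer can easily exceed executor memory.

def explode(rows, col):
    """Flatten one array column: emit one output row per array element."""
    out = []
    for row in rows:
        for element in row[col]:
            new_row = dict(row)
            new_row[col] = element
            out.append(new_row)
    return out

# Mirrors the shape of the Jira reproducer: c1 plus a nested array c2.
rows = [{"c1": 1, "c2": [["a", "b", "c", "d"], ["aa", "bb", "cc", "dd"]]}]

once = explode(rows, "c2")   # outer explode: 1 row -> 2 rows
twice = explode(once, "c2")  # inner explode: 2 rows -> 8 rows

print(len(rows), len(once), len(twice))  # 1 2 8
```

With realistic array sizes the factor is the product of the per-level lengths, which is why the issue asks WSCG to account for executor memory when buffering generated rows.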
[jira] [Commented] (SPARK-43194) PySpark 3.4.0 cannot convert timestamp-typed objects to pandas with pandas 2.0
[ https://issues.apache.org/jira/browse/SPARK-43194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753034#comment-17753034 ] Berg Lloyd-Haig commented on SPARK-43194: - This is affecting us also with parquet read from AWS Glue Catalog (Hive-compatible metastore) with TimestampType fields. > PySpark 3.4.0 cannot convert timestamp-typed objects to pandas with pandas 2.0 > -- > > Key: SPARK-43194 > URL: https://issues.apache.org/jira/browse/SPARK-43194 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 > Environment: {code} > In [4]: import pandas as pd > In [5]: pd.__version__ > Out[5]: '2.0.0' > In [6]: import pyspark as ps > In [7]: ps.__version__ > Out[7]: '3.4.0' > {code} >Reporter: Phillip Cloud >Priority: Major > > {code} > In [1]: from pyspark.sql import SparkSession > In [2]: session = SparkSession.builder.appName("test").getOrCreate() > 23/04/19 09:21:42 WARN Utils: Your hostname, albatross resolves to a loopback > address: 127.0.0.2; using 192.168.1.170 instead (on interface enp5s0) > 23/04/19 09:21:42 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to > another address > Setting default log level to "WARN". > To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use > setLogLevel(newLevel). > 23/04/19 09:21:42 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... using builtin-java classes where applicable > In [3]: session.sql("select now()").toPandas() > {code} > Results in: > {code} > ... > TypeError: Casting to unit-less dtype 'datetime64' is not supported. Pass > e.g. 'datetime64[ns]' instead. > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
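The error above arises because pandas 2.0 rejects the unit-less "datetime64" dtype string that older PySpark conversion code passes to astype. A small sketch of the shape of the fix (the helper name is illustrative, not an actual PySpark API): always attach an explicit unit before handing the dtype to pandas.

```python
# Sketch of the fix this issue needs: pandas 2.0 raises
# "Casting to unit-less dtype 'datetime64' is not supported" for a bare
# "datetime64" dtype string, so normalize it to an explicit unit
# (e.g. "datetime64[ns]") before calling Series.astype().
# with_explicit_unit is a hypothetical helper, not a PySpark function.

def with_explicit_unit(dtype: str, default_unit: str = "ns") -> str:
    if dtype.startswith("datetime64") and "[" not in dtype:
        return f"{dtype}[{default_unit}]"
    return dtype

print(with_explicit_unit("datetime64"))      # datetime64[ns]
print(with_explicit_unit("datetime64[us]"))  # datetime64[us]
print(with_explicit_unit("int64"))           # int64
```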
[jira] [Resolved] (SPARK-44771) Remove 'sudo' in 'pip install' suggestions in the dev scripts
[ https://issues.apache.org/jira/browse/SPARK-44771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-44771. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42444 [https://github.com/apache/spark/pull/42444] > Remove 'sudo' in 'pip install' suggestions in the dev scripts > - > > Key: SPARK-44771 > URL: https://issues.apache.org/jira/browse/SPARK-44771 > Project: Spark > Issue Type: Task > Components: Build >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Trivial > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-42132) DeduplicateRelations rule breaks plan when co-grouping the same DataFrame
[ https://issues.apache.org/jira/browse/SPARK-42132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jia Fan resolved SPARK-42132. - Fix Version/s: 3.5.0 Resolution: Fixed > DeduplicateRelations rule breaks plan when co-grouping the same DataFrame > - > > Key: SPARK-42132 > URL: https://issues.apache.org/jira/browse/SPARK-42132 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.3, 3.1.3, 3.3.0, 3.3.1, 3.2.3, 3.4.0, 3.5.0 >Reporter: Enrico Minack >Priority: Major > Labels: correctness > Fix For: 3.5.0 > > > Co-grouping two DataFrames that share references breaks on the > DeduplicateRelations rule: > {code:java} > val df = spark.range(3) > val left_grouped_df = df.groupBy("id").as[Long, Long] > val right_grouped_df = df.groupBy("id").as[Long, Long] > val cogroup_df = left_grouped_df.cogroup(right_grouped_df) { > case (key, left, right) => left > } > cogroup_df.explain() > {code} > {code:java} > == Physical Plan == > AdaptiveSparkPlan isFinalPlan=false > +- SerializeFromObject [input[0, bigint, false] AS value#12L] >+- CoGroup, id#0: bigint, id#0: bigint, id#0: bigint, [id#13L], [id#13L], > [id#13L], [id#13L], obj#11: bigint > :- !Sort [id#13L ASC NULLS FIRST], false, 0 > : +- !Exchange hashpartitioning(id#13L, 200), ENSURE_REQUIREMENTS, > [plan_id=16] > : +- Range (0, 3, step=1, splits=16) > +- Sort [id#13L ASC NULLS FIRST], false, 0 > +- Exchange hashpartitioning(id#13L, 200), ENSURE_REQUIREMENTS, > [plan_id=17] > +- Range (0, 3, step=1, splits=16) > {code} > The DataFrame cannot be computed: > {code:java} > cogroup_df.show() > {code} > {code:java} > java.lang.IllegalStateException: Couldn't find id#13L in [id#0L] > {code} > The rule replaces `id#0L` on the right side with `id#13L` while replacing all > occurrences in `CoGroup`. Some occurrences of `id#0L` in `CoGroup`refer to > the left side and should not be replaced. Further, `id#0L` of the right > deserializer is not replaced. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
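The DeduplicateRelations failure mode described above can be sketched in plain Python (illustrative only, not Catalyst code): when the right side of a self-cogroup gets fresh expression IDs, the rewrite must only touch attributes belonging to that side, not every textual match.

```python
# Pure-Python sketch of the DeduplicateRelations pitfall from SPARK-42132:
# the right side of a self-cogroup gets a fresh expression ID
# (id#0 -> id#13), but rewriting every occurrence in CoGroup also
# clobbers references that belong to the left side.

def rewrite_all(attrs, mapping):
    """Buggy: replaces every matching ID, left and right alike."""
    return [mapping.get(a, a) for a in attrs]

def rewrite_side(attrs, mapping, sides, target):
    """Fixed: replace IDs only for attributes tagged with the target side."""
    return [mapping.get(a, a) if s == target else a
            for a, s in zip(attrs, sides)]

cogroup_attrs = ["id#0", "id#0"]   # [left key, right key]
sides = ["left", "right"]
mapping = {"id#0": "id#13"}        # fresh ID for the deduplicated right relation

print(rewrite_all(cogroup_attrs, mapping))                    # left side broken
print(rewrite_side(cogroup_attrs, mapping, sides, "right"))   # only right remapped
```

After the buggy rewrite the plan references id#13 on a subtree that only produces id#0, which is exactly the "Couldn't find id#13L in [id#0L]" failure reported here.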
[jira] [Created] (SPARK-44771) Remove 'sudo' in 'pip install' suggestions in the dev scripts
Gengliang Wang created SPARK-44771: -- Summary: Remove 'sudo' in 'pip install' suggestions in the dev scripts Key: SPARK-44771 URL: https://issues.apache.org/jira/browse/SPARK-44771 Project: Spark Issue Type: Task Components: Build Affects Versions: 4.0.0 Reporter: Gengliang Wang Assignee: Gengliang Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44763) Fix a bug of promoting string as double in binary arithmetic with interval
[ https://issues.apache.org/jira/browse/SPARK-44763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-44763. Target Version/s: 4.0.0 Resolution: Fixed > Fix a bug of promoting string as double in binary arithmetic with interval > > > Key: SPARK-44763 > URL: https://issues.apache.org/jira/browse/SPARK-44763 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > The following query works on branch-3.5 or below, but fails on the latest > master: > ``` > select concat(DATE'2020-12-31', ' ', date_format('09:03:08', 'HH:mm:ss')) + > (INTERVAL '03' HOUR) > ``` > > The direct reason is now we mark `cast(date as string)` as resolved during > type coercion after changes [https://github.com/apache/spark/pull/42089.] As > a result, there are two transforms from CombinedTypeCoercionRule > ``` > Rule ConcatCoercion Transformed concat(2020-12-31, , > date_format(cast(09:03:08 as timestamp), HH:mm:ss, > Some(America/Los_Angeles))) to concat(cast(2020-12-31 as string), , > date_format(cast(09:03:08 as timestamp), HH:mm:ss, Some(America/Los_Angeles))) > Rule PromoteStrings Transformed (concat(cast(2020-12-31 as string), , > date_format(cast(09:03:08 as timestamp), HH:mm:ss, > Some(America/Los_Angeles))) + INTERVAL '03' HOUR) to > (cast(concat(cast(2020-12-31 as string), , date_format(cast(09:03:08 as > timestamp), HH:mm:ss, Some(America/Los_Angeles))) as double) + INTERVAL '03' > HOUR) > ``` > The second transform doesn't happen in previous releases since > cast(2020-12-31 as string) used to be unresolved after the first transform. > > The fix is simple, the analyzer should not promote string as double in binary > arithmetic with ANSI interval. The changes in > [https://github.com/apache/spark/pull/42089|https://github.com/apache/spark/pull/42089.] > are valid and we should keep it. 
[jira] [Commented] (SPARK-44763) Fix a bug of promoting string as double in binary arithmetic with interval
[ https://issues.apache.org/jira/browse/SPARK-44763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17753020#comment-17753020 ] Gengliang Wang commented on SPARK-44763: Resolved in https://github.com/apache/spark/pull/42436 > Fix a bug of promoting string as double in binary arithmetic with interval > > > Key: SPARK-44763 > URL: https://issues.apache.org/jira/browse/SPARK-44763 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > The following query works on branch-3.5 or below, but fails on the latest > master: > ``` > select concat(DATE'2020-12-31', ' ', date_format('09:03:08', 'HH:mm:ss')) + > (INTERVAL '03' HOUR) > ``` > > The direct reason is now we mark `cast(date as string)` as resolved during > type coercion after changes [https://github.com/apache/spark/pull/42089.] As > a result, there are two transforms from CombinedTypeCoercionRule > ``` > Rule ConcatCoercion Transformed concat(2020-12-31, , > date_format(cast(09:03:08 as timestamp), HH:mm:ss, > Some(America/Los_Angeles))) to concat(cast(2020-12-31 as string), , > date_format(cast(09:03:08 as timestamp), HH:mm:ss, Some(America/Los_Angeles))) > Rule PromoteStrings Transformed (concat(cast(2020-12-31 as string), , > date_format(cast(09:03:08 as timestamp), HH:mm:ss, > Some(America/Los_Angeles))) + INTERVAL '03' HOUR) to > (cast(concat(cast(2020-12-31 as string), , date_format(cast(09:03:08 as > timestamp), HH:mm:ss, Some(America/Los_Angeles))) as double) + INTERVAL '03' > HOUR) > ``` > The second transform doesn't happen in previous releases since > cast(2020-12-31 as string) used to be unresolved after the first transform. > > The fix is simple, the analyzer should not promote string as double in binary > arithmetic with ANSI interval. The changes in > [https://github.com/apache/spark/pull/42089|https://github.com/apache/spark/pull/42089.] > are valid and we should keep it. 
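The corrected PromoteStrings behavior described in this issue can be sketched as a small type-coercion rule (illustrative pseudocode in Python, not Catalyst): a string operand of binary arithmetic is normally promoted to double, but when the other operand is an ANSI interval the string must be left untouched so later rules can resolve it as a date/timestamp expression.

```python
# Sketch (illustrative, not Catalyst code) of the fixed PromoteStrings rule
# from SPARK-44763: never promote a string operand to double when the other
# operand of the binary arithmetic is an ANSI interval type.

ANSI_INTERVALS = {"interval day to second", "interval year to month"}

def promote_strings(left_type: str, right_type: str):
    def promote(t: str, other: str) -> str:
        if t == "string" and other not in ANSI_INTERVALS:
            return "double"   # usual string-in-arithmetic promotion
        return t              # leave strings alone next to intervals
    return promote(left_type, right_type), promote(right_type, left_type)

print(promote_strings("string", "int"))
# ('double', 'int')  -- normal promotion
print(promote_strings("string", "interval day to second"))
# ('string', 'interval day to second')  -- no promotion with interval
```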
[jira] [Resolved] (SPARK-43781) IllegalStateException when cogrouping two datasets derived from the same source
[ https://issues.apache.org/jira/browse/SPARK-43781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-43781. - Fix Version/s: 3.5.0 Assignee: Jia Fan Resolution: Fixed > IllegalStateException when cogrouping two datasets derived from the same > source > --- > > Key: SPARK-43781 > URL: https://issues.apache.org/jira/browse/SPARK-43781 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.1, 3.4.0 > Environment: Reproduces in a unit test, using Spark 3.3.1, the Java > API, and a {{local[2]}} SparkSession. >Reporter: Derek Murray >Assignee: Jia Fan >Priority: Major > Fix For: 3.5.0 > > > Attempting to {{cogroup}} two datasets derived from the same source dataset > yields an {{IllegalStateException}} when the query is executed. > Minimal reproducer: > {code:java} > StructType inputType = DataTypes.createStructType( > new StructField[]{ > DataTypes.createStructField("id", DataTypes.LongType, false), > DataTypes.createStructField("type", DataTypes.StringType, false) > } > ); > StructType keyType = DataTypes.createStructType( > new StructField[]{ > DataTypes.createStructField("id", DataTypes.LongType, false) > } > ); > List inputRows = new ArrayList<>(); > inputRows.add(RowFactory.create(1L, "foo")); > inputRows.add(RowFactory.create(1L, "bar")); > inputRows.add(RowFactory.create(2L, "foo")); > Dataset input = sparkSession.createDataFrame(inputRows, inputType); > KeyValueGroupedDataset fooGroups = input > .filter("type = 'foo'") > .groupBy("id") > .as(RowEncoder.apply(keyType), RowEncoder.apply(inputType)); > KeyValueGroupedDataset barGroups = input > .filter("type = 'bar'") > .groupBy("id") > .as(RowEncoder.apply(keyType), RowEncoder.apply(inputType)); > Dataset result = fooGroups.cogroup( > barGroups, > (CoGroupFunction) (row, iterator, iterator1) -> new > ArrayList().iterator(), > RowEncoder.apply(inputType)); > result.explain(); > result.show();{code} > Explain output (note mismatch in column IDs 
between Sort/Exchagne and > LocalTableScan on the first input to the CoGroup): > {code:java} > == Physical Plan == > AdaptiveSparkPlan isFinalPlan=false > +- SerializeFromObject > [validateexternaltype(getexternalrowfield(assertnotnull(input[0, > org.apache.spark.sql.Row, true]), 0, id), LongType, false) AS id#37L, > staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, > fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, > org.apache.spark.sql.Row, true]), 1, type), StringType, false), true, false, > true) AS type#38] > +- CoGroup > org.apache.spark.sql.KeyValueGroupedDataset$$Lambda$1478/1869116781@77856cc5, > createexternalrow(id#16L, StructField(id,LongType,false)), > createexternalrow(id#16L, type#17.toString, StructField(id,LongType,false), > StructField(type,StringType,false)), createexternalrow(id#16L, > type#17.toString, StructField(id,LongType,false), > StructField(type,StringType,false)), [id#39L], [id#39L], [id#39L, type#40], > [id#39L, type#40], obj#36: org.apache.spark.sql.Row > :- !Sort [id#39L ASC NULLS FIRST], false, 0 > : +- !Exchange hashpartitioning(id#39L, 2), ENSURE_REQUIREMENTS, > [plan_id=19] > : +- LocalTableScan [id#16L, type#17] > +- Sort [id#39L ASC NULLS FIRST], false, 0 > +- Exchange hashpartitioning(id#39L, 2), ENSURE_REQUIREMENTS, > [plan_id=20] > +- LocalTableScan [id#39L, type#40]{code} > Exception: > {code:java} > java.lang.IllegalStateException: Couldn't find id#39L in [id#16L,type#17] > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:80) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:73) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176) > at > 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$3(TreeNode.scala:589) > at scala.collection.immutable.ArraySeq.map(ArraySeq.scala:75) > at scala.collection.immutable.ArraySeq.map(ArraySeq.scala:35) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:698) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:589) > at >
[jira] [Reopened] (SPARK-44760) Index Out Of Bound for JIRA resolution in merge_spark_pr
[ https://issues.apache.org/jira/browse/SPARK-44760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-44760: -- Assignee: (was: Kent Yao) Reverted at https://github.com/apache/spark/commit/3164ff51dd249670b8505a5ea8d361cf53c9db94 > Index Out Of Bound for JIRA resolution in merge_spark_pr > > > Key: SPARK-44760 > URL: https://issues.apache.org/jira/browse/SPARK-44760 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > > I -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44760) Index Out Of Bound for JIRA resolution in merge_spark_pr
[ https://issues.apache.org/jira/browse/SPARK-44760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-44760: - Fix Version/s: (was: 4.0.0) > Index Out Of Bound for JIRA resolution in merge_spark_pr > > > Key: SPARK-44760 > URL: https://issues.apache.org/jira/browse/SPARK-44760 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > > I -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43872) Enable DataFramePlotMatplotlibTests for pandas 2.0.0.
[ https://issues.apache.org/jira/browse/SPARK-43872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-43872: Assignee: Haejoon Lee > Enable DataFramePlotMatplotlibTests for pandas 2.0.0. > - > > Key: SPARK-43872 > URL: https://issues.apache.org/jira/browse/SPARK-43872 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > test list: > * test_area_plot > * test_area_plot_stacked_false > * test_area_plot_y > * test_bar_plot > * test_bar_with_x_y > * test_barh_plot_with_x_y > * test_barh_plot > * test_line_plot > * test_pie_plot > * test_scatter_plot > * test_hist_plot > * test_kde_plot -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44705) Make PythonRunner single-threaded
[ https://issues.apache.org/jira/browse/SPARK-44705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-44705. -- Assignee: Utkarsh Agarwal Resolution: Fixed Fixed in https://github.com/apache/spark/pull/42385 > Make PythonRunner single-threaded > - > > Key: SPARK-44705 > URL: https://issues.apache.org/jira/browse/SPARK-44705 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Utkarsh Agarwal >Assignee: Utkarsh Agarwal >Priority: Major > > PythonRunner, a utility that executes Python UDFs in Spark, uses two threads > in a producer-consumer model today. This multi-threading model is problematic > and confusing as Spark's execution model within a task is commonly understood > to be single-threaded. > More importantly, this departure of a double-threaded execution resulted in a > series of customer issues involving [race > conditions|https://issues.apache.org/jira/browse/SPARK-33277] and > [deadlocks|https://issues.apache.org/jira/browse/SPARK-38677] between threads > as the code was hard to reason about. There have been multiple attempts to > reign in these issues, viz., [fix > 1|https://issues.apache.org/jira/browse/SPARK-22535], [fix > 2|https://github.com/apache/spark/pull/30177], [fix > 3|https://github.com/apache/spark/commit/243c321db2f02f6b4d926114bd37a6e74c2be185]. > Moreover, the fixes have made the code base somewhat abstruse by introducing > multiple daemon [monitor > threads|https://github.com/apache/spark/blob/a3a32912be04d3760cb34eb4b79d6d481bbec502/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala#L579] > to detect deadlocks. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
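The two execution models contrasted in this issue can be sketched with the standard library (illustrative only, far simpler than the real PythonRunner): the old design used a writer thread feeding a consumer through a bounded buffer, which is where the races and deadlocks lived; the fix interleaves writes and reads on the task's single thread.

```python
# Stdlib sketch contrasting the two PythonRunner execution models discussed
# in SPARK-44705. This is a toy model, not the actual Spark implementation.
import queue
import threading

def run_two_threaded(inputs):
    """Old model: producer thread + consumer, coordinated via a bounded buffer."""
    q = queue.Queue(maxsize=2)          # bounded, like a socket buffer
    def producer():
        for x in inputs:
            q.put(x)                    # blocks when the buffer is full
        q.put(None)                     # end-of-stream marker
    t = threading.Thread(target=producer)
    t.start()
    results = []
    while (item := q.get()) is not None:
        results.append(item * 2)        # stand-in for the Python UDF work
    t.join()
    return results

def run_single_threaded(inputs):
    """New model: one thread writes a batch, then immediately reads it back."""
    return [x * 2 for x in inputs]

print(run_two_threaded([1, 2, 3]))      # [2, 4, 6]
print(run_single_threaded([1, 2, 3]))   # [2, 4, 6]
```

Both produce the same results; the point of the change is that the single-threaded version has no cross-thread blocking to reason about, so the deadlock-monitor machinery the issue mentions becomes unnecessary.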
[jira] [Assigned] (SPARK-44765) Make ReleaseExecute retry in ExecutePlanResponseReattachableIterator reuse common mechanism
[ https://issues.apache.org/jira/browse/SPARK-44765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-44765: Assignee: Juliusz Sompolski > Make ReleaseExecute retry in ExecutePlanResponseReattachableIterator reuse > common mechanism > --- > > Key: SPARK-44765 > URL: https://issues.apache.org/jira/browse/SPARK-44765 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Juliusz Sompolski >Assignee: Juliusz Sompolski >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-43032) Add StreamingQueryManager API
[ https://issues.apache.org/jira/browse/SPARK-43032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-43032: - Fix Version/s: 3.5.0 4.0.0 > Add StreamingQueryManager API > - > > Key: SPARK-43032 > URL: https://issues.apache.org/jira/browse/SPARK-43032 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.5.0 >Reporter: Raghu Angadi >Assignee: Wei Liu >Priority: Major > Fix For: 3.5.0, 4.0.0 > > > Add StreamingQueryManager API. It would include API that can be directly > support. API like registering streaming listener will be handled separately. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44766) Cache the pandas converter for Python UDTFs
[ https://issues.apache.org/jira/browse/SPARK-44766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-44766. -- Fix Version/s: 3.5.0 4.0.0 Resolution: Fixed Issue resolved by pull request 42439 [https://github.com/apache/spark/pull/42439] > Cache the pandas converter for Python UDTFs > --- > > Key: SPARK-44766 > URL: https://issues.apache.org/jira/browse/SPARK-44766 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > Fix For: 3.5.0, 4.0.0 > > > We should cache the pandas converter when serializing pandas dataframe to > arrow batch. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44766) Cache the pandas converter for Python UDTFs
[ https://issues.apache.org/jira/browse/SPARK-44766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-44766: Assignee: Allison Wang > Cache the pandas converter for Python UDTFs > --- > > Key: SPARK-44766 > URL: https://issues.apache.org/jira/browse/SPARK-44766 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Allison Wang >Assignee: Allison Wang >Priority: Major > > We should cache the pandas converter when serializing pandas dataframe to > arrow batch. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44768) Improve WSCG handling of row buffer by accounting for executor memory . Exploding nested arrays can easily lead to out of memory errors.
[ https://issues.apache.org/jira/browse/SPARK-44768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44768: Description: consider a scenario where you flatten a nested array // e.g you can use the following steps to create the dataframe //create a partClass case class case class partClass (PARTNAME: String , PartNumber: String , PartPrice : Double ) //create a nested array array class case class array_array_class ( col_int: Int, arr_arr_string : Seq[Seq[String]], arr_arr_bigint : Seq[Seq[Long]], col_string : String, parts_s : Seq[Seq[partClass]] ) //create a dataframe var df_array_array = sc.parallelize( Seq( (1,Seq(Seq("a","b" ,"c" ,"d") ,Seq("aa","bb" ,"cc","dd")) , Seq(Seq(1000,2), Seq(3,-1)),"ItemPart1", Seq(Seq(partClass("PNAME1","P1",20.75),partClass("PNAME1_1","P1_1",30.75))) ) , (2,Seq(Seq("ab","bc" ,"cd" ,"de") ,Seq("aab","bbc" ,"ccd","dde"),Seq("aab")) , Seq(Seq(-1000,-2,-1,-2), Seq(0,3,-1)),"ItemPart2", Seq(Seq(partClass("PNAME2","P2",170.75),partClass("PNAME2_1","P2_1",33.75),partClass("PNAME2_2","P2_2",73.75))) ) ) ).toDF("c1" ,"c2" ,"c3" ,"c4" ,"c5") //explode a nested array var result = df_array_array.select( col("c1"), explode(col("c2"))).select('c1 , explode('col)) result.explain The physical for this operator is seen below. 
- Physical plan == Physical Plan == *(1) Generate explode(col#27), [c1#17|#17], false, [col#30|#30] +- *(1) Filter ((size(col#27, true) > 0) AND isnotnull(col#27)) +- *(1) Generate explode(c2#18), [c1#17|#17], false, [col#27|#27] +- *(1) Project [_1#6 AS c1#17, _2#7 AS c2#18|#6 AS c1#17, _2#7 AS c2#18] +- *(1) Filter ((size(_2#7, true) > 0) AND isnotnull(_2#7)) +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, scala.Tuple5, true]))._1 AS _1#6, mapobjects(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -1), mapobjects(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -2), staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, validateexternaltype(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -2), StringType, ObjectType(class java.lang.String)), true, false, true), validateexternaltype(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -1), ArrayType(StringType,true), ObjectType(interface scala.collection.Seq)), None), knownnotnull(assertnotnull(input[0, scala.Tuple5, true]))._2, None) AS _2#7, mapobjects(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -3), mapobjects(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -4), assertnotnull(validateexternaltype(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -4), IntegerType, IntegerType)), validateexternaltype(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -3), ArrayType(IntegerType,false), ObjectType(interface scala.collection.Seq)), None), knownnotnull(assertnotnull(input[0, scala.Tuple5, true]))._3, None) AS _3#8, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(assertnotnull(input[0, scala.Tuple5, true]))._4, true, false, true) AS _4#9, mapobjects(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -5), mapobjects(lambdavariable(MapObject, ObjectType(class 
java.lang.Object), true, -6), if (isnull(validateexternaltype(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -6), StructField(PARTNAME,StringType,true), StructField(PartNumber,StringType,true), StructField(PartPrice,DoubleType,false), ObjectType(class $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$partClass null else named_struct(PARTNAME, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(validateexternaltype(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -6), StructField(PARTNAME,StringType,true), StructField(PartNumber,StringType,true), StructField(PartPrice,DoubleType,false), ObjectType(class $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$partClass))).PARTNAME, true, false, true), PartNumber, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, knownnotnull(validateexternaltype(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -6), StructField(PARTNAME,StringType,true), StructField(PartNumber,StringType,true), StructField(PartPrice,DoubleType,false), ObjectType(class $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$partClass))).PartNumber, true, false, true), PartPrice, knownnotnull(validateexternaltype(lambdavariable(MapObject, ObjectType(class java.lang.Object), true, -6), StructField(PARTNAME,StringType,true), StructField(PartNumber,StringType,true),
[jira] [Updated] (SPARK-44768) Improve WSCG handling of row buffer by accounting for executor memory . Exploding nested arrays can easily lead to out of memory errors.
[ https://issues.apache.org/jira/browse/SPARK-44768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44768: Attachment: spark-jira_wscg_code.txt > Improve WSCG handling of row buffer by accounting for executor memory . > Exploding nested arrays can easily lead to out of memory errors. > --- > > Key: SPARK-44768 > URL: https://issues.apache.org/jira/browse/SPARK-44768 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.3.2, 3.4.0, 3.4.1 >Reporter: Franck Tago >Priority: Major > Attachments: spark-jira_wscg_code.txt > > > consider a scenario where you flatten a nested array > // e.g you can use the following steps to create the dataframe > /create a partClass > case class partClass (PARTNAME: String , PartNumber: String , PartPrice : > Double ) > //create a nested array array class > case class array_array_class ( > col_int: Int, > arr_arr_string : Seq[Seq[String]], > arr_arr_bigint : Seq[Seq[Long]], > col_string : String, > parts_s : Seq[Seq[partClass]] > > ) > //create a dataframe > var df_array_array = sc.parallelize( > Seq( > (1,Seq(Seq("a","b" ,"c" ,"d") ,Seq("aa","bb" ,"cc","dd")) , > Seq(Seq(1000,2), Seq(3,-1)),"ItemPart1", > Seq(Seq(partClass("PNAME1","P1",20.75),partClass("PNAME1_1","P1_1",30.75))) > ) , > > (2,Seq(Seq("ab","bc" ,"cd" ,"de") ,Seq("aab","bbc" > ,"ccd","dde"),Seq("aab")) , Seq(Seq(-1000,-2,-1,-2), > Seq(0,3,-1)),"ItemPart2", > > Seq(Seq(partClass("PNAME2","P2",170.75),partClass("PNAME2_1","P2_1",33.75),partClass("PNAME2_2","P2_2",73.75))) > ) > > ) > ).toDF("c1" ,"c2" ,"c3" ,"c4" ,"c5") > //explode a nested array > var result = df_array_array.select( col("c1"), > explode(col("c2"))).select('c1 , explode('col)) > result.explain > > The physical for this operator is seen below. 
> - > Physical plan > == Physical Plan == > *(1) Generate explode(col#27), [c1#17], false, [col#30] > +- *(1) Filter ((size(col#27, true) > 0) AND isnotnull(col#27)) > +- *(1) Generate explode(c2#18), [c1#17], false, [col#27] > +- *(1) Project [_1#6 AS c1#17, _2#7 AS c2#18] > +- *(1) Filter ((size(_2#7, true) > 0) AND isnotnull(_2#7)) > +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, > scala.Tuple5, true]))._1 AS _1#6, mapobjects(lambdavariable(MapObject, > ObjectType(class java.lang.Object), true, -1), > mapobjects(lambdavariable(MapObject, ObjectType(class java.lang.Object), > true, -2), staticinvoke(class org.apache.spark.unsafe.types.UTF8String, > StringType, fromString, validateexternaltype(lambdavariable(MapObject, > ObjectType(class java.lang.Object), true, -2), StringType, ObjectType(class > java.lang.String)), true, false, true), > validateexternaltype(lambdavariable(MapObject, ObjectType(class > java.lang.Object), true, -1), ArrayType(StringType,true), > ObjectType(interface scala.collection.Seq)), None), > knownnotnull(assertnotnull(input[0, scala.Tuple5, true]))._2, None) AS _2#7, > mapobjects(lambdavariable(MapObject, ObjectType(class java.lang.Object), > true, -3), mapobjects(lambdavariable(MapObject, ObjectType(class > java.lang.Object), true, -4), > assertnotnull(validateexternaltype(lambdavariable(MapObject, ObjectType(class > java.lang.Object), true, -4), IntegerType, IntegerType)), > validateexternaltype(lambdavariable(MapObject, ObjectType(class > java.lang.Object), true, -3), ArrayType(IntegerType,false), > ObjectType(interface scala.collection.Seq)), None), > knownnotnull(assertnotnull(input[0, scala.Tuple5, true]))._3, None) AS _3#8, > staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, > fromString, knownnotnull(assertnotnull(input[0, scala.Tuple5, true]))._4, > true, false, true) AS _4#9, mapobjects(lambdavariable(MapObject, > ObjectType(class java.lang.Object), true, -5), > 
mapobjects(lambdavariable(MapObject, ObjectType(class java.lang.Object), > true, -6), if (isnull(validateexternaltype(lambdavariable(MapObject, > ObjectType(class java.lang.Object), true, -6), > StructField(PARTNAME,StringType,true), > StructField(PartNumber,StringType,true), > StructField(PartPrice,DoubleType,false), ObjectType(class > $line14.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$partClass null else > named_struct(PARTNAME, staticinvoke(class > org.apache.spark.unsafe.types.UTF8String, StringType, fromString, > knownnotnull(validateexternaltype(lambdavariable(MapObject, ObjectType(class > java.lang.Object), true, -6), StructField(PARTNAME,StringType,true), > StructField(PartNumber,StringType,true), >
[jira] [Created] (SPARK-44770) Add a displayOrder variable to WebUITab to specify the order in which tabs appear
Jason Li created SPARK-44770: Summary: Add a displayOrder variable to WebUITab to specify the order in which tabs appear Key: SPARK-44770 URL: https://issues.apache.org/jira/browse/SPARK-44770 Project: Spark Issue Type: Improvement Components: Web UI Affects Versions: 3.5.1 Reporter: Jason Li Add a displayOrder variable to WebUITab to specify the order in which tabs appear. Currently, the tabs are ordered by when they get attached, which isn't always desired. The default is MIN_VALUE, meaning if it's not specified, it will appear in the order added before any tabs with a non-default displayOrder. For example, we would like to have the SQL Tab appear before the Connect tab; however, based on the code flow, the Connect tab will be attached first and with the current logic, that tab would also appear first. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
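The ordering rule described here can be sketched outside Spark with a stable sort: tabs keep their attach order unless they opt into an explicit position. `Tab` and `display_order` below are illustrative stand-ins, not the actual WebUITab API.

```python
from dataclasses import dataclass

MIN_VALUE = -(2**31)  # stands in for Java's Integer.MIN_VALUE default

@dataclass
class Tab:
    name: str
    display_order: int = MIN_VALUE  # unset: keep attach order

def ordered_tabs(attached):
    # sorted() is stable, so tabs sharing the default displayOrder
    # stay in the order in which they were attached.
    return sorted(attached, key=lambda t: t.display_order)

# Connect is attached first but asks for a later slot than SQL.
attached = [Tab("Connect", display_order=20), Tab("Jobs"), Tab("SQL", display_order=10)]
print([t.name for t in ordered_tabs(attached)])  # ['Jobs', 'SQL', 'Connect']
```

With this, attach order no longer dictates display order, which covers the SQL-before-Connect case in the description.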
[jira] [Created] (SPARK-44769) Add SQL statement to create an empty array with a type
Holden Karau created SPARK-44769: Summary: Add SQL statement to create an empty array with a type Key: SPARK-44769 URL: https://issues.apache.org/jira/browse/SPARK-44769 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Holden Karau
[jira] [Updated] (SPARK-44768) Improve WSCG handling of row buffer by accounting for executor memory . Exploding nested arrays can easily lead to out of memory errors.
[ https://issues.apache.org/jira/browse/SPARK-44768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44768: Summary: Improve WSCG handling of row buffer by accounting for executor memory . Exploding nested arrays can easily lead to out of memory errors. (was: Improve WSCG handling of row buffer by accounting for executor memory) > Improve WSCG handling of row buffer by accounting for executor memory . > Exploding nested arrays can easily lead to out of memory errors. > --- > > Key: SPARK-44768 > URL: https://issues.apache.org/jira/browse/SPARK-44768 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.3.2, 3.4.0, 3.4.1 >Reporter: Franck Tago >Priority: Major > > consider a scenario where you flatten a nested array > // e.g you can use the following steps to create the dataframe > /create a partClass > case class partClass (PARTNAME: String , PartNumber: String , PartPrice : > Double ) > //create a nested array array class > case class array_array_class ( > col_int: Int, > arr_arr_string : Seq[Seq[String]], > arr_arr_bigint : Seq[Seq[Long]], > col_string : String, > parts_s : Seq[Seq[partClass]] > > ) > //create a dataframe > var df_array_array = sc.parallelize( > Seq( > (1,Seq(Seq("a","b" ,"c" ,"d") ,Seq("aa","bb" ,"cc","dd")) , > Seq(Seq(1000,2), Seq(3,-1)),"ItemPart1", > Seq(Seq(partClass("PNAME1","P1",20.75),partClass("PNAME1_1","P1_1",30.75))) > ) , > > (2,Seq(Seq("ab","bc" ,"cd" ,"de") ,Seq("aab","bbc" > ,"ccd","dde"),Seq("aab")) , Seq(Seq(-1000,-2,-1,-2), > Seq(0,3,-1)),"ItemPart2", > > Seq(Seq(partClass("PNAME2","P2",170.75),partClass("PNAME2_1","P2_1",33.75),partClass("PNAME2_2","P2_2",73.75))) > ) > > ) > ).toDF("c1" ,"c2" ,"c3" ,"c4" ,"c5") > //explode a nested array > var result = df_array_array.select( col("c1"), > explode(col("c2"))).select('c1 , explode('col)) > result.explain > > The physical for this operator is seen below. 
> [physical plan omitted; identical to the plan quoted earlier in this digest for SPARK-44768]
[jira] [Created] (SPARK-44768) Improve WSCG handling of row buffer by accounting for executor memory
Franck Tago created SPARK-44768: --- Summary: Improve WSCG handling of row buffer by accounting for executor memory Key: SPARK-44768 URL: https://issues.apache.org/jira/browse/SPARK-44768 Project: Spark Issue Type: Bug Components: Optimizer Affects Versions: 3.4.1, 3.4.0, 3.3.2 Reporter: Franck Tago

Consider a scenario where you flatten a nested array. For example, you can use the following steps to create the dataframe:

// create a partClass
case class partClass(PARTNAME: String, PartNumber: String, PartPrice: Double)

// create a nested-array class
case class array_array_class(
  col_int: Int,
  arr_arr_string: Seq[Seq[String]],
  arr_arr_bigint: Seq[Seq[Long]],
  col_string: String,
  parts_s: Seq[Seq[partClass]]
)

// create a dataframe
var df_array_array = sc.parallelize(
  Seq(
    (1, Seq(Seq("a", "b", "c", "d"), Seq("aa", "bb", "cc", "dd")),
      Seq(Seq(1000, 2), Seq(3, -1)), "ItemPart1",
      Seq(Seq(partClass("PNAME1", "P1", 20.75), partClass("PNAME1_1", "P1_1", 30.75)))),
    (2, Seq(Seq("ab", "bc", "cd", "de"), Seq("aab", "bbc", "ccd", "dde"), Seq("aab")),
      Seq(Seq(-1000, -2, -1, -2), Seq(0, 3, -1)), "ItemPart2",
      Seq(Seq(partClass("PNAME2", "P2", 170.75), partClass("PNAME2_1", "P2_1", 33.75), partClass("PNAME2_2", "P2_2", 73.75))))
  )
).toDF("c1", "c2", "c3", "c4", "c5")

// explode a nested array
var result = df_array_array.select(col("c1"), explode(col("c2"))).select('c1, explode('col))
result.explain

The physical plan for this operator is shown below.
- Physical plan: [omitted; identical to the plan quoted earlier in this digest for SPARK-44768]
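The blowup this report describes can be seen with plain Python, independent of Spark: two chained explodes over a nested array amount to a nested flat-map, so the output row count is the total number of leaf elements, and a whole-stage loop that buffers all of it holds every combination at once. This sketch models only the cardinality, not Spark's generated code.

```python
# Two input rows shaped like the repro's c2 column: Seq[Seq[String]]
rows = [
    (1, [["a", "b", "c", "d"], ["aa", "bb", "cc", "dd"]]),
    (2, [["ab", "bc", "cd", "de"], ["aab", "bbc", "ccd", "dde"], ["aab"]]),
]

# explode(c2) then explode(col): a nested loop over both array levels,
# analogous to the two Generate nodes fused into one WSCG stage.
buffered = [(c1, elem) for c1, outer in rows for inner in outer for elem in inner]

print(len(rows), "->", len(buffered))  # 2 -> 17 rows held in the buffer
```

Even this toy input multiplies 2 input rows into 17 buffered output rows; with large arrays of large elements the buffer grows the same way, which is the OOM risk the issue raises.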
[jira] [Commented] (SPARK-44767) Plugin API for PySpark and SparkR workers
[ https://issues.apache.org/jira/browse/SPARK-44767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752926#comment-17752926 ] Willi Raschkowski commented on SPARK-44767: --- I put up a proposal implementation here: https://github.com/apache/spark/pull/42440 > Plugin API for PySpark and SparkR workers > - > > Key: SPARK-44767 > URL: https://issues.apache.org/jira/browse/SPARK-44767 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.4.1 >Reporter: Willi Raschkowski >Priority: Major > > An API to customize Python and R workers allows for extensibility beyond what > can be expressed via static configs and environment variables like, e.g., > {{spark.pyspark.python}}. > A use case for this is overriding {{PATH}} when using {{spark.archives}} > with, say, conda-pack (as documented > [here|https://spark.apache.org/docs/3.1.1/api/python/user_guide/python_packaging.html#using-conda]). > Some packages rely on binaries. And if we want to use those packages in > Spark, we need to include their binaries in the {{PATH}}. > But we can't set the {{PATH}} via some config because 1) the environment with > its binaries may be at a dynamic location (archives are unpacked on the > driver [into a directory with random > name|https://github.com/apache/spark/blob/5db87787d5cc1cefb51ec77e49bac7afaa46d300/core/src/main/scala/org/apache/spark/SparkFiles.scala#L33-L37]), > and 2) we may not want to override the {{PATH}} that's pre-configured on the > hosts. > Other use cases unlocked by this include overriding the executable > dynamically (e.g., to select a version) or forking/redirecting the worker's > output stream. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
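The PATH problem is easy to sketch without Spark: a hook that receives the dynamically located archive directory can prepend its bin/ to a copy of the environment, leaving the host's configuration alone. The function name and directory layout here are hypothetical; the real API is whatever the linked PR defines.

```python
import os

def worker_env(archive_dir, base_env):
    # Work on a copy so the host's preconfigured PATH is untouched.
    env = dict(base_env)
    env["PATH"] = os.path.join(archive_dir, "bin") + os.pathsep + env.get("PATH", "")
    return env

# The random unpack location is only known at launch time, which is
# why a static config cannot express this.
host = {"PATH": "/usr/local/bin:/usr/bin"}
env = worker_env("/tmp/spark-1a2b3c/conda-env", host)
print(env["PATH"])
print(host["PATH"])  # unchanged
```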
[jira] [Updated] (SPARK-44767) Plugin API for PySpark and SparkR workers
[ https://issues.apache.org/jira/browse/SPARK-44767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Willi Raschkowski updated SPARK-44767: -- Summary: Plugin API for PySpark and SparkR workers (was: Plugin API for PySpark and SparkR subprocesses) > Plugin API for PySpark and SparkR workers > - > > Key: SPARK-44767 > URL: https://issues.apache.org/jira/browse/SPARK-44767 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.4.1 >Reporter: Willi Raschkowski >Priority: Major > > An API to customize Python and R workers allows for extensibility beyond what > can be expressed via static configs and environment variables like, e.g., > {{spark.pyspark.python}}. > A use case for this is overriding {{PATH}} when using {{spark.archives}} > with, say, conda-pack (as documented > [here|https://spark.apache.org/docs/3.1.1/api/python/user_guide/python_packaging.html#using-conda]). > Some packages rely on binaries. And if we want to use those packages in > Spark, we need to include their binaries in the {{PATH}}. > But we can't set the {{PATH}} via some config because 1) the environment with > its binaries may be at a dynamic location (archives are unpacked on the > driver [into a directory with random > name|https://github.com/apache/spark/blob/5db87787d5cc1cefb51ec77e49bac7afaa46d300/core/src/main/scala/org/apache/spark/SparkFiles.scala#L33-L37]), > and 2) we may not want to override the {{PATH}} that's pre-configured on the > hosts. > Other use cases unlocked by this include overriding the executable > dynamically (e.g., to select a version) or forking/redirecting the worker's > output stream. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44767) Plugin API for PySpark and SparkR subprocesses
Willi Raschkowski created SPARK-44767: - Summary: Plugin API for PySpark and SparkR subprocesses Key: SPARK-44767 URL: https://issues.apache.org/jira/browse/SPARK-44767 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 3.4.1 Reporter: Willi Raschkowski An API to customize Python and R workers allows for extensibility beyond what can be expressed via static configs and environment variables like, e.g., {{spark.pyspark.python}}. A use case we had for this is overriding {{PATH}} when using {{spark.archives}} with, say, conda-pack (as documented [here|https://spark.apache.org/docs/3.1.1/api/python/user_guide/python_packaging.html#using-conda]). Some packages rely on binaries. And if we want to use those packages in Spark, we need to include their binaries in the {{PATH}}. But we can't set the {{PATH}} via some config because 1) the environment with its binaries may be at a dynamic location (archives are unpacked on the driver [into a directory with random name|https://github.com/apache/spark/blob/5db87787d5cc1cefb51ec77e49bac7afaa46d300/core/src/main/scala/org/apache/spark/SparkFiles.scala#L33-L37]), and 2) we may not want to override the {{PATH}} that's pre-configured on the hosts. Other use cases unlocked by this include overriding the executable dynamically (e.g., to select a version) or forking/redirecting the worker's output stream. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44767) Plugin API for PySpark and SparkR subprocesses
[ https://issues.apache.org/jira/browse/SPARK-44767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Willi Raschkowski updated SPARK-44767: -- Description: An API to customize Python and R workers allows for extensibility beyond what can be expressed via static configs and environment variables like, e.g., {{spark.pyspark.python}}. A use case for this is overriding {{PATH}} when using {{spark.archives}} with, say, conda-pack (as documented [here|https://spark.apache.org/docs/3.1.1/api/python/user_guide/python_packaging.html#using-conda]). Some packages rely on binaries. And if we want to use those packages in Spark, we need to include their binaries in the {{PATH}}. But we can't set the {{PATH}} via some config because 1) the environment with its binaries may be at a dynamic location (archives are unpacked on the driver [into a directory with random name|https://github.com/apache/spark/blob/5db87787d5cc1cefb51ec77e49bac7afaa46d300/core/src/main/scala/org/apache/spark/SparkFiles.scala#L33-L37]), and 2) we may not want to override the {{PATH}} that's pre-configured on the hosts. Other use cases unlocked by this include overriding the executable dynamically (e.g., to select a version) or forking/redirecting the worker's output stream. was: An API to customize Python and R workers allows for extensibility beyond what can be expressed via static configs and environment variables like, e.g., {{spark.pyspark.python}}. A use case we had for this is overriding {{PATH}} when using {{spark.archives}} with, say, conda-pack (as documented [here|https://spark.apache.org/docs/3.1.1/api/python/user_guide/python_packaging.html#using-conda]). Some packages rely on binaries. And if we want to use those packages in Spark, we need to include their binaries in the {{PATH}}. 
But we can't set the {{PATH}} via some config because 1) the environment with its binaries may be at a dynamic location (archives are unpacked on the driver [into a directory with random name|https://github.com/apache/spark/blob/5db87787d5cc1cefb51ec77e49bac7afaa46d300/core/src/main/scala/org/apache/spark/SparkFiles.scala#L33-L37]), and 2) we may not want to override the {{PATH}} that's pre-configured on the hosts. Other use cases unlocked by this include overriding the executable dynamically (e.g., to select a version) or forking/redirecting the worker's output stream. > Plugin API for PySpark and SparkR subprocesses > -- > > Key: SPARK-44767 > URL: https://issues.apache.org/jira/browse/SPARK-44767 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.4.1 >Reporter: Willi Raschkowski >Priority: Major > > An API to customize Python and R workers allows for extensibility beyond what > can be expressed via static configs and environment variables like, e.g., > {{spark.pyspark.python}}. > A use case for this is overriding {{PATH}} when using {{spark.archives}} > with, say, conda-pack (as documented > [here|https://spark.apache.org/docs/3.1.1/api/python/user_guide/python_packaging.html#using-conda]). > Some packages rely on binaries. And if we want to use those packages in > Spark, we need to include their binaries in the {{PATH}}. > But we can't set the {{PATH}} via some config because 1) the environment with > its binaries may be at a dynamic location (archives are unpacked on the > driver [into a directory with random > name|https://github.com/apache/spark/blob/5db87787d5cc1cefb51ec77e49bac7afaa46d300/core/src/main/scala/org/apache/spark/SparkFiles.scala#L33-L37]), > and 2) we may not want to override the {{PATH}} that's pre-configured on the > hosts. > Other use cases unlocked by this include overriding the executable > dynamically (e.g., to select a version) or forking/redirecting the worker's > output stream. 
[jira] [Created] (SPARK-44766) Cache the pandas converter for Python UDTFs
Allison Wang created SPARK-44766: Summary: Cache the pandas converter for Python UDTFs Key: SPARK-44766 URL: https://issues.apache.org/jira/browse/SPARK-44766 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.5.0 Reporter: Allison Wang We should cache the pandas converter when serializing a pandas DataFrame to an Arrow batch.
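The intent, building the converter once per type rather than once per batch, can be sketched with stdlib memoization. The converter contents below are made up; only the caching pattern is the point.

```python
from functools import lru_cache

builds = {"count": 0}

@lru_cache(maxsize=None)
def converter_for(dtype):
    # Stand-in for constructing a pandas-to-Arrow converter; in the
    # real serializer this would be derived from the UDTF's return type.
    builds["count"] += 1
    return {"int64": int, "float64": float, "string": str}[dtype]

# A thousand batches of the same schema build the converter once.
for _ in range(1000):
    conv = converter_for("int64")
print(builds["count"])  # 1
```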
[jira] [Created] (SPARK-44765) Make ReleaseExecute retry in ExecutePlanResponseReattachableIterator reuse common mechanism
Juliusz Sompolski created SPARK-44765: - Summary: Make ReleaseExecute retry in ExecutePlanResponseReattachableIterator reuse common mechanism Key: SPARK-44765 URL: https://issues.apache.org/jira/browse/SPARK-44765 Project: Spark Issue Type: Improvement Components: Connect Affects Versions: 3.5.0 Reporter: Juliusz Sompolski
[jira] [Created] (SPARK-44764) Streaming process improvement
Wei Liu created SPARK-44764: --- Summary: Streaming process improvement Key: SPARK-44764 URL: https://issues.apache.org/jira/browse/SPARK-44764 Project: Spark Issue Type: New Feature Components: Connect, Structured Streaming Affects Versions: 4.0.0 Reporter: Wei Liu
# Deduplicate or remove StreamingPythonRunner if possible; it is very similar to the existing PythonRunner.
# Use the DAEMON mode for starting a new process.
[jira] [Updated] (SPARK-44759) Do not combine multiple Generate operators in the same WholeStageCodeGen node because it can easily cause OOM failures if arrays are relatively large
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759: Description: This has been an issue since the WSCG implementation of the Generate node. Because WSCG computes rows in batches, the combination of WSCG and the explode operation consumes a lot of the dedicated executor memory. This is even more true when the WSCG node contains multiple explode nodes, which is the case when flattening a nested array. The Generate node used to flatten an array generally produces significantly more output rows than it receives as input, and the number of output rows is drastically higher still when flattening a nested array. When we combine more than one Generate node in the same WholeStageCodeGen node, we run a high risk of running out of memory, for multiple reasons. 1- As you can see from the snapshots added in the comments, the rows created in the nested loop are saved in a writer buffer. In this case, because the rows were big, the job failed with an OutOfMemory error. 2- The generated WholeStageCodeGen results in a nested loop that, for each row, explodes the parent array and then explodes the inner array. The rows are accumulated in the writer buffer without accounting for row size. Please view the attached Spark GUI and Spark DAG. In my case the WholeStageCodeGen includes two explode nodes; because the array elements are large, we end up with an OutOfMemory error. I recommend that we do not merge multiple explode nodes in the same WholeStageCodeGen node, as doing so leads to potential memory issues. In our case, the job execution failed with an OOM error because the WSCG executed a nested for loop. was: This is an issue since the WSCG implementation of the generate node. The generate node used to flatten array generally produces an amount of output rows that is significantly higher than the input rows to the node. 
the number of output rows generated is even drastically higher when flattening a nested array . When we combine more that 1 generate node in the same WholeStageCodeGen node, we run a high risk of running out of memory for multiple reasons. 1- As you can see from the attachment , the rows created in the nested loop are saved in writer buffer. In my case because the rows were big , I hit an Out Of Memory Exception error .2_ The generated WholeStageCodeGen result in a nested loop that for each row , will explode the parent array and then explode the inner array. This is prone to OutOfmerry errors Please view the attached Spark Gui and Spark Dag In my case the wholestagecodegen includes 2 explode nodes. Because the array elements are large , we end up with an Out Of Memory error. I recommend that we do not merge multiple explode nodes in the same whole stage code gen node . Doing so leads to potential memory issues. In our case , the job execution failed with an OOM error because the the WSCG executed into a nested for loop . > Do not combine multiple Generate operators in the same WholeStageCodeGen node > because it can easily cause OOM failures if arrays are relatively large > -- > > Key: SPARK-44759 > URL: https://issues.apache.org/jira/browse/SPARK-44759 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0, > 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.2, 3.4.0, 3.4.1 >Reporter: Franck Tago >Priority: Major > Attachments: image-2023-08-10-09-27-24-124.png, > image-2023-08-10-09-29-24-804.png, image-2023-08-10-09-32-46-163.png, > image-2023-08-10-09-33-47-788.png, > wholestagecodegen_wc1_debug_wholecodegen_passed > > > This is an issue since the WSCG implementation of the generate node. > Because WSCG compute rows in batches , the combination of WSCG and the > explode operation consume a lot of the dedicated executor memory. 
This is > even more true when the WSCG node contains multiple explode nodes. This is > the case when flattening a nested array. > The generate node used to flatten array generally produces an amount of > output rows that is significantly higher than the input rows. > the number of output rows generated is even drastically higher when > flattening a nested array . > When we combine more that 1 generate node in the same WholeStageCodeGen > node, we run a high risk of running out of memory for multiple reasons. > 1- As you can see from snapshots added in the comments , the rows created in > the nested loop are saved in a writer buffer. In this case
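The difference between the fused nested loop and keeping the Generate nodes separate is essentially buffering versus streaming, which can be sketched in miniature; this models only the buffer growth, not Spark's actual writer or batch sizes.

```python
def explode_twice(rows):
    # Generator form: each output row is handed to the consumer
    # immediately instead of accumulating in a buffer.
    for key, outer in rows:
        for inner in outer:
            for elem in inner:
                yield (key, elem)

# 10 input rows, each a 100 x 100 nested array.
rows = [(i, [[0] * 100 for _ in range(100)]) for i in range(10)]

# Fused nested loop with an unbounded buffer: all rows live at once.
buffered = list(explode_twice(rows))
print(len(buffered))  # 100000 rows held together

# Streaming the same explosion never materializes them together.
streamed = sum(1 for _ in explode_twice(rows))
print(streamed)  # 100000 rows produced, one live at a time
```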
[jira] [Created] (SPARK-44763) Fix a bug of promoting string as double in binary arithmetic with interval
Gengliang Wang created SPARK-44763: -- Summary: Fix a bug of promoting string as double in binary arithmetic with interval Key: SPARK-44763 URL: https://issues.apache.org/jira/browse/SPARK-44763 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 4.0.0 Reporter: Gengliang Wang Assignee: Gengliang Wang The following query works on branch-3.5 or below, but fails on the latest master: ``` select concat(DATE'2020-12-31', ' ', date_format('09:03:08', 'HH:mm:ss')) + (INTERVAL '03' HOUR) ``` The direct reason is that we now mark `cast(date as string)` as resolved during type coercion, after the changes in [https://github.com/apache/spark/pull/42089]. As a result, there are two transforms from CombinedTypeCoercionRule: ``` Rule ConcatCoercion Transformed concat(2020-12-31, , date_format(cast(09:03:08 as timestamp), HH:mm:ss, Some(America/Los_Angeles))) to concat(cast(2020-12-31 as string), , date_format(cast(09:03:08 as timestamp), HH:mm:ss, Some(America/Los_Angeles))) Rule PromoteStrings Transformed (concat(cast(2020-12-31 as string), , date_format(cast(09:03:08 as timestamp), HH:mm:ss, Some(America/Los_Angeles))) + INTERVAL '03' HOUR) to (cast(concat(cast(2020-12-31 as string), , date_format(cast(09:03:08 as timestamp), HH:mm:ss, Some(America/Los_Angeles))) as double) + INTERVAL '03' HOUR) ``` The second transform doesn't happen in previous releases, since cast(2020-12-31 as string) used to be unresolved after the first transform. The fix is simple: the analyzer should not promote string to double in binary arithmetic with an ANSI interval. The changes in [https://github.com/apache/spark/pull/42089] are valid and we should keep them. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44759) Do not combine multiple Generate operators in the same WholeStageCodeGen node because it can easily cause OOM failures if arrays are relatively large
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759: Summary: Do not combine multiple Generate operators in the same WholeStageCodeGen node because it can easily cause OOM failures if arrays are relatively large (was: Do not combine multiple Generate nodes in the same WholeStageCodeGen node because it can easily cause OOM failures if arrays are relatively large) > Do not combine multiple Generate operators in the same WholeStageCodeGen node > because it can easily cause OOM failures if arrays are relatively large > -- > > Key: SPARK-44759 > URL: https://issues.apache.org/jira/browse/SPARK-44759 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0, > 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.2, 3.4.0, 3.4.1 >Reporter: Franck Tago >Priority: Major > Attachments: image-2023-08-10-09-27-24-124.png, > image-2023-08-10-09-29-24-804.png, image-2023-08-10-09-32-46-163.png, > image-2023-08-10-09-33-47-788.png, > wholestagecodegen_wc1_debug_wholecodegen_passed > > > This is an issue since the WSCG implementation of the generate node. > The generate node used to flatten an array generally produces a number of > output rows that is significantly higher than the number of input rows to the node. > The number of output rows generated is even drastically higher when > flattening a nested array. > When we combine more than one Generate node in the same WholeStageCodeGen > node, we run a high risk of running out of memory for multiple reasons. > 1- As you can see from the attachment, the rows created in the nested loop > are saved in a writer buffer. In my case, because the rows were big, I hit an > Out Of Memory exception. 2- The generated WholeStageCodeGen results in a > nested loop that, for each row, will explode the parent array and then > explode the inner array. 
This is prone to OutOfMemory errors. > > Please view the attached Spark GUI and Spark DAG. > In my case the WholeStageCodeGen includes 2 explode nodes. > Because the array elements are large, we end up with an Out Of Memory error. > > I recommend that we do not merge multiple explode nodes in the same whole > stage code gen node. Doing so leads to potential memory issues. > In our case, the job execution failed with an OOM error because the > WSCG executed a nested for loop. > > >
[jira] [Updated] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen node because it can easily cause OOM failures if arrays are relatively large
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759: Summary: Do not combine multiple Generate nodes in the same WholeStageCodeGen node because it can easily cause OOM failures if arrays are relatively large (was: Do not combine multiple Generate nodes in the same WholeStageCodeGen nodebecause it can easily cause OOM failures if arrays are relatively large) > Do not combine multiple Generate nodes in the same WholeStageCodeGen node > because it can easily cause OOM failures if arrays are relatively large > -- > > Key: SPARK-44759 > URL: https://issues.apache.org/jira/browse/SPARK-44759 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0, > 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.2, 3.4.0, 3.4.1 >Reporter: Franck Tago >Priority: Major > Attachments: image-2023-08-10-09-27-24-124.png, > image-2023-08-10-09-29-24-804.png, image-2023-08-10-09-32-46-163.png, > image-2023-08-10-09-33-47-788.png, > wholestagecodegen_wc1_debug_wholecodegen_passed > > > This is an issue since the WSCG implementation of the generate node. > The generate node used to flatten an array generally produces a number of > output rows that is significantly higher than the number of input rows to the node. > The number of output rows generated is even drastically higher when > flattening a nested array. > When we combine more than one Generate node in the same WholeStageCodeGen > node, we run a high risk of running out of memory for multiple reasons. > 1- As you can see from the attachment, the rows created in the nested loop > are saved in a writer buffer. In my case, because the rows were big, I hit an > Out Of Memory exception. 2- The generated WholeStageCodeGen results in a > nested loop that, for each row, will explode the parent array and then > explode the inner array. 
This is prone to OutOfMemory errors. > > Please view the attached Spark GUI and Spark DAG. > In my case the WholeStageCodeGen includes 2 explode nodes. > Because the array elements are large, we end up with an Out Of Memory error. > > I recommend that we do not merge multiple explode nodes in the same whole > stage code gen node. Doing so leads to potential memory issues. > In our case, the job execution failed with an OOM error because the > WSCG executed a nested for loop. > > >
[jira] [Updated] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen nodebecause it can easily cause OOM failures if arrays are relatively large
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759: Description: This is an issue since the WSCG implementation of the generate node. The generate node used to flatten an array generally produces a number of output rows that is significantly higher than the number of input rows to the node. The number of output rows generated is even drastically higher when flattening a nested array. When we combine more than one Generate node in the same WholeStageCodeGen node, we run a high risk of running out of memory for multiple reasons. 1- As you can see from the attachment, the rows created in the nested loop are saved in a writer buffer. In my case, because the rows were big, I hit an Out Of Memory exception. 2- The generated WholeStageCodeGen results in a nested loop that, for each row, will explode the parent array and then explode the inner array. This is prone to OutOfMemory errors. Please view the attached Spark GUI and Spark DAG. In my case the WholeStageCodeGen includes 2 explode nodes. Because the array elements are large, we end up with an Out Of Memory error. I recommend that we do not merge multiple explode nodes in the same whole stage code gen node. Doing so leads to potential memory issues. In our case, the job execution failed with an OOM error because the WSCG executed a nested for loop. was: The generate node used to flatten an array generally produces a number of output rows that is significantly higher than the number of input rows to the node. The number of output rows generated is even drastically higher when flattening a nested array. When we combine more than one Generate node in the same WholeStageCodeGen node, we run a high risk of running out of memory for multiple reasons. 1- As you can see from the attachment, the rows created in the nested loop are saved in a writer buffer. 
In my case, because the rows were big, I hit an Out Of Memory exception. 2- The generated WholeStageCodeGen results in a nested loop that, for each row, will explode the parent array and then explode the inner array. This is prone to OutOfMemory errors. Please view the attached Spark GUI and Spark DAG. In my case the WholeStageCodeGen includes 2 explode nodes. Because the array elements are large, we end up with an Out Of Memory error. I recommend that we do not merge multiple explode nodes in the same whole stage code gen node. Doing so leads to potential memory issues. In our case, the job execution failed with an OOM error because the WSCG executed a nested for loop. > Do not combine multiple Generate nodes in the same WholeStageCodeGen > nodebecause it can easily cause OOM failures if arrays are relatively large > - > > Key: SPARK-44759 > URL: https://issues.apache.org/jira/browse/SPARK-44759 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0, > 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.2, 3.4.0, 3.4.1 >Reporter: Franck Tago >Priority: Major > Attachments: image-2023-08-10-09-27-24-124.png, > image-2023-08-10-09-29-24-804.png, image-2023-08-10-09-32-46-163.png, > image-2023-08-10-09-33-47-788.png, > wholestagecodegen_wc1_debug_wholecodegen_passed > > > This is an issue since the WSCG implementation of the generate node. > The generate node used to flatten an array generally produces a number of > output rows that is significantly higher than the number of input rows to the node. > The number of output rows generated is even drastically higher when > flattening a nested array. > When we combine more than one Generate node in the same WholeStageCodeGen > node, we run a high risk of running out of memory for multiple reasons. > 1- As you can see from the attachment, the rows created in the nested loop > are saved in a writer buffer. 
In my case, because the rows were big, I hit an > Out Of Memory exception. 2- The generated WholeStageCodeGen results in a > nested loop that, for each row, will explode the parent array and then > explode the inner array. This is prone to OutOfMemory errors. > > Please view the attached Spark GUI and Spark DAG. > In my case the WholeStageCodeGen includes 2 explode nodes. > Because the array elements are large, we end up with an Out Of Memory error. > > I recommend that we do not merge multiple explode nodes in the same whole > stage code gen node. Doing so leads to potential memory issues. > In our case, the job execution failed with an OOM error because the > WSCG executed
[jira] [Updated] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen nodebecause it can easily cause OOM failures if arrays are relatively large
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759: Affects Version/s: 3.4.1 3.4.0 3.3.2 3.2.4 3.2.3 3.2.2 3.2.1 3.1.3 3.2.0 3.1.2 3.1.1 3.1.0 3.0.3 3.0.2 3.0.1 3.0.0 > Do not combine multiple Generate nodes in the same WholeStageCodeGen > nodebecause it can easily cause OOM failures if arrays are relatively large > - > > Key: SPARK-44759 > URL: https://issues.apache.org/jira/browse/SPARK-44759 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0, > 3.1.3, 3.2.1, 3.3.0, 3.2.2, 3.3.1, 3.2.3, 3.2.4, 3.3.2, 3.4.0, 3.4.1 >Reporter: Franck Tago >Priority: Major > Attachments: image-2023-08-10-09-27-24-124.png, > image-2023-08-10-09-29-24-804.png, image-2023-08-10-09-32-46-163.png, > image-2023-08-10-09-33-47-788.png, > wholestagecodegen_wc1_debug_wholecodegen_passed > > > The generate node used to flatten an array generally produces a number of > output rows that is significantly higher than the number of input rows to the node. > The number of output rows generated is even drastically higher when > flattening a nested array. > When we combine more than one Generate node in the same WholeStageCodeGen > node, we run a high risk of running out of memory for multiple reasons. > 1- As you can see from the attachment, the rows created in the nested loop > are saved in a writer buffer. In my case, because the rows were big, I hit an > Out Of Memory exception. 2- The generated WholeStageCodeGen results in a > nested loop that, for each row, will explode the parent array and then > explode the inner array. This is prone to OutOfMemory errors. > > Please view the attached Spark GUI and Spark DAG. > In my case the WholeStageCodeGen includes 2 explode nodes. > Because the array elements are large, we end up with an Out Of Memory error. 
> > I recommend that we do not merge multiple explode nodes in the same whole > stage code gen node. Doing so leads to potential memory issues. > In our case, the job execution failed with an OOM error because the > WSCG executed a nested for loop. > > >
[jira] [Updated] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen nodebecause it can easily cause OOM failures if arrays are relatively large
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759: Description: The generate node used to flatten an array generally produces a number of output rows that is significantly higher than the number of input rows to the node. The number of output rows generated is even drastically higher when flattening a nested array. When we combine more than one Generate node in the same WholeStageCodeGen node, we run a high risk of running out of memory for multiple reasons. 1- As you can see from the attachment, the rows created in the nested loop are saved in a writer buffer. In my case, because the rows were big, I hit an Out Of Memory exception. 2- The generated WholeStageCodeGen results in a nested loop that, for each row, will explode the parent array and then explode the inner array. This is prone to OutOfMemory errors. Please view the attached Spark GUI and Spark DAG. In my case the WholeStageCodeGen includes 2 explode nodes. Because the array elements are large, we end up with an Out Of Memory error. I recommend that we do not merge multiple explode nodes in the same whole stage code gen node. Doing so leads to potential memory issues. In our case, the job execution failed with an OOM error because the WSCG executed a nested for loop. was: The generate node used to flatten an array generally produces a number of output rows that is significantly higher than the number of input rows to the node. The number of output rows generated is even drastically higher when flattening a nested array. When we combine more than one Generate node in the same WholeStageCodeGen node, we run a high risk of running out of memory for multiple reasons. 1- As you can see from the attachment, the rows created in the nested loop are saved in a writer buffer. 
In my case, because the rows were big, I hit an Out Of Memory exception. 2- The generated WholeStageCodeGen results in a nested loop that, for each row, will explode the parent array and then explode the inner array. This is prone to OutOfMemory errors. Please view the attached Spark GUI and Spark DAG. In my case the WholeStageCodeGen includes 2 explode nodes. Because the array elements are large, we end up with an Out Of Memory error. I recommend that we do not merge multiple explode nodes in the same whole stage code gen node. Doing so leads to potential memory issues. WSCG method for first Generate node !image-2023-08-10-09-22-29-949.png! WSCG for second Generate node As you can see, the execution of generate_doConsume_1 and generate_doConsume_0 triggers a nested loop. !image-2023-08-10-09-24-02-755.png! > Do not combine multiple Generate nodes in the same WholeStageCodeGen > nodebecause it can easily cause OOM failures if arrays are relatively large > - > > Key: SPARK-44759 > URL: https://issues.apache.org/jira/browse/SPARK-44759 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.3.0, 3.3.1 >Reporter: Franck Tago >Priority: Major > Attachments: image-2023-08-10-09-27-24-124.png, > image-2023-08-10-09-29-24-804.png, image-2023-08-10-09-32-46-163.png, > image-2023-08-10-09-33-47-788.png, > wholestagecodegen_wc1_debug_wholecodegen_passed > > > The generate node used to flatten an array generally produces a number of > output rows that is significantly higher than the number of input rows to the node. > The number of output rows generated is even drastically higher when > flattening a nested array. > When we combine more than one Generate node in the same WholeStageCodeGen > node, we run a high risk of running out of memory for multiple reasons. > 1- As you can see from the attachment, the rows created in the nested loop > are saved in a writer buffer. 
In my case, because the rows were big, I hit an > Out Of Memory exception. 2- The generated WholeStageCodeGen results in a > nested loop that, for each row, will explode the parent array and then > explode the inner array. This is prone to OutOfMemory errors. > > Please view the attached Spark GUI and Spark DAG. > In my case the WholeStageCodeGen includes 2 explode nodes. > Because the array elements are large, we end up with an Out Of Memory error. > > I recommend that we do not merge multiple explode nodes in the same whole > stage code gen node. Doing so leads to potential memory issues. > In our case, the job execution failed with an OOM error because the > WSCG executed a nested for loop. > > >
[jira] [Updated] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen nodebecause it can easily cause OOM failures if arrays are relatively large
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759: Description: The generate node used to flatten an array generally produces a number of output rows that is significantly higher than the number of input rows to the node. The number of output rows generated is even drastically higher when flattening a nested array. When we combine more than one Generate node in the same WholeStageCodeGen node, we run a high risk of running out of memory for multiple reasons. 1- As you can see from the attachment, the rows created in the nested loop are saved in a writer buffer. In my case, because the rows were big, I hit an Out Of Memory exception. 2- The generated WholeStageCodeGen results in a nested loop that, for each row, will explode the parent array and then explode the inner array. This is prone to OutOfMemory errors. Please view the attached Spark GUI and Spark DAG. In my case the WholeStageCodeGen includes 2 explode nodes. Because the array elements are large, we end up with an Out Of Memory error. I recommend that we do not merge multiple explode nodes in the same whole stage code gen node. Doing so leads to potential memory issues. WSCG method for first Generate node !image-2023-08-10-09-22-29-949.png! WSCG for second Generate node As you can see, the execution of generate_doConsume_1 and generate_doConsume_0 triggers a nested loop. !image-2023-08-10-09-24-02-755.png! was: The generate node used to flatten an array generally produces a number of output rows that is significantly higher than the number of input rows to the node. The number of output rows generated is even drastically higher when flattening a nested array. When we combine more than one Generate node in the same WholeStageCodeGen node, we run a high risk of running out of memory for multiple reasons. 1- As you can see from the attachment, the rows created in the nested loop are saved in a writer buffer. 
In my case, because the rows were big, I hit an Out Of Memory exception. 2- The generated WholeStageCodeGen results in a nested loop that, for each row, will explode the parent array and then explode the inner array. This is prone to OutOfMemory errors. Please view the attached Spark GUI and Spark DAG. In my case the WholeStageCodeGen includes 2 explode nodes. Because the array elements are large, we end up with an Out Of Memory error. I recommend that we do not merge multiple explode nodes in the same whole stage code gen node. Doing so leads to potential memory issues. !image-2023-08-10-02-18-24-758.png! !image-2023-08-10-09-19-23-973.png! !image-2023-08-10-09-21-32-371.png! WSCG method for first Generate node !image-2023-08-10-09-22-29-949.png! WSCG for second Generate node As you can see, the execution of generate_doConsume_1 and generate_doConsume_0 triggers a nested loop. !image-2023-08-10-09-24-02-755.png! > Do not combine multiple Generate nodes in the same WholeStageCodeGen > nodebecause it can easily cause OOM failures if arrays are relatively large > - > > Key: SPARK-44759 > URL: https://issues.apache.org/jira/browse/SPARK-44759 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.3.0, 3.3.1 >Reporter: Franck Tago >Priority: Major > Attachments: image-2023-08-10-09-27-24-124.png, > image-2023-08-10-09-29-24-804.png, image-2023-08-10-09-32-46-163.png, > image-2023-08-10-09-33-47-788.png, > wholestagecodegen_wc1_debug_wholecodegen_passed > > > The generate node used to flatten an array generally produces a number of > output rows that is significantly higher than the number of input rows to the node. > The number of output rows generated is even drastically higher when > flattening a nested array. > When we combine more than one Generate node in the same WholeStageCodeGen > node, we run a high risk of running out of memory for multiple reasons. 
> 1- As you can see from the attachment, the rows created in the nested loop > are saved in a writer buffer. In my case, because the rows were big, I hit an > Out Of Memory exception. 2- The generated WholeStageCodeGen results in a > nested loop that, for each row, will explode the parent array and then > explode the inner array. This is prone to OutOfMemory errors. > > Please view the attached Spark GUI and Spark DAG. > In my case the WholeStageCodeGen includes 2 explode nodes. > Because the array elements are large, we end up with an Out Of Memory error. > > I recommend that we do not merge multiple explode nodes in the same whole > stage code gen node.
[jira] [Commented] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen nodebecause it can easily cause OOM failures if arrays are relatively large
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752870#comment-17752870 ] Franck Tago commented on SPARK-44759: - Spark DAG for the use case. The failure is from the execution of WholeStageCodeGen(2) !image-2023-08-10-09-33-47-788.png! > Do not combine multiple Generate nodes in the same WholeStageCodeGen > nodebecause it can easily cause OOM failures if arrays are relatively large > - > > Key: SPARK-44759 > URL: https://issues.apache.org/jira/browse/SPARK-44759 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.3.0, 3.3.1 >Reporter: Franck Tago >Priority: Major > Attachments: image-2023-08-10-09-27-24-124.png, > image-2023-08-10-09-29-24-804.png, image-2023-08-10-09-32-46-163.png, > image-2023-08-10-09-33-47-788.png, > wholestagecodegen_wc1_debug_wholecodegen_passed > > > The generate node used to flatten an array generally produces a number of > output rows that is significantly higher than the number of input rows. > The number of output rows generated is even drastically higher when > flattening a nested array. > When we combine more than one Generate node in the same WholeStageCodeGen > node, we run a high risk of running out of memory for multiple reasons. > 1- As you can see from the attachment, the rows created in the nested loop > are saved in a writer buffer. In my case, because the rows were big, I hit an > Out Of Memory exception. 2- The generated WholeStageCodeGen results in a > nested loop that, for each row, will explode the parent array and then > explode the inner array. This is prone to OutOfMemory errors. > > Please view the attached Spark GUI and Spark DAG. > In my case the WholeStageCodeGen includes 2 explode nodes. > Because the array elements are large, we end up with an Out Of Memory error. > > I recommend that we do not merge multiple explode nodes in the same whole > stage code gen node. Doing so leads to potential memory issues. 
> > !image-2023-08-10-02-18-24-758.png! > > !image-2023-08-10-09-19-23-973.png! > > !image-2023-08-10-09-21-32-371.png! > > WSCG method for first Generate node > !image-2023-08-10-09-22-29-949.png! > > WSCG for second Generate node > As you can see, the execution of generate_doConsume_1 and generate_doConsume_0 > triggers a nested loop. > !image-2023-08-10-09-24-02-755.png! >
[jira] [Updated] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen nodebecause it can easily cause OOM failures if arrays are relatively large
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759: Attachment: image-2023-08-10-09-33-47-788.png > Do not combine multiple Generate nodes in the same WholeStageCodeGen > nodebecause it can easily cause OOM failures if arrays are relatively large > - > > Key: SPARK-44759 > URL: https://issues.apache.org/jira/browse/SPARK-44759 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.3.0, 3.3.1 >Reporter: Franck Tago >Priority: Major > Attachments: image-2023-08-10-09-27-24-124.png, > image-2023-08-10-09-29-24-804.png, image-2023-08-10-09-32-46-163.png, > image-2023-08-10-09-33-47-788.png, > wholestagecodegen_wc1_debug_wholecodegen_passed > > > The generate node used to flatten an array generally produces a number of > output rows that is significantly higher than the number of input rows to the node. > The number of output rows generated is even drastically higher when > flattening a nested array. > When we combine more than one Generate node in the same WholeStageCodeGen > node, we run a high risk of running out of memory for multiple reasons. > 1- As you can see from the attachment, the rows created in the nested loop > are saved in a writer buffer. In my case, because the rows were big, I hit an > Out Of Memory exception. 2- The generated WholeStageCodeGen results in a > nested loop that, for each row, will explode the parent array and then > explode the inner array. This is prone to OutOfMemory errors. > > Please view the attached Spark GUI and Spark DAG. > In my case the WholeStageCodeGen includes 2 explode nodes. > Because the array elements are large, we end up with an Out Of Memory error. > > I recommend that we do not merge multiple explode nodes in the same whole > stage code gen node. Doing so leads to potential memory issues. > > !image-2023-08-10-02-18-24-758.png! > > !image-2023-08-10-09-19-23-973.png! 
> > !image-2023-08-10-09-21-32-371.png! > > WSCG method for first Generate node > !image-2023-08-10-09-22-29-949.png! > > WSCG for second Generate node > As you can see, the execution of generate_doConsume_1 and generate_doConsume_0 > triggers a nested loop. > !image-2023-08-10-09-24-02-755.png! >
[jira] [Commented] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen nodebecause it can easily cause OOM failures if arrays are relatively large
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752868#comment-17752868 ] Franck Tago commented on SPARK-44759: - WSCG generated code for second Generate node !image-2023-08-10-09-32-46-163.png! > Do not combine multiple Generate nodes in the same WholeStageCodeGen > nodebecause it can easily cause OOM failures if arrays are relatively large > - > > Key: SPARK-44759 > URL: https://issues.apache.org/jira/browse/SPARK-44759 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.3.0, 3.3.1 >Reporter: Franck Tago >Priority: Major > Attachments: image-2023-08-10-09-27-24-124.png, > image-2023-08-10-09-29-24-804.png, image-2023-08-10-09-32-46-163.png, > wholestagecodegen_wc1_debug_wholecodegen_passed > > > The generate node used to flatten an array generally produces a number of > output rows that is significantly higher than the number of input rows. > The number of output rows generated is even drastically higher when > flattening a nested array. > When we combine more than one Generate node in the same WholeStageCodeGen > node, we run a high risk of running out of memory for multiple reasons. > 1- As you can see from the attachment, the rows created in the nested loop > are saved in a writer buffer. In my case, because the rows were big, I hit an > Out Of Memory exception. 2- The generated WholeStageCodeGen results in a > nested loop that, for each row, will explode the parent array and then > explode the inner array. This is prone to OutOfMemory errors. > > Please view the attached Spark GUI and Spark DAG. > In my case the WholeStageCodeGen includes 2 explode nodes. > Because the array elements are large, we end up with an Out Of Memory error. > > I recommend that we do not merge multiple explode nodes in the same whole > stage code gen node. Doing so leads to potential memory issues. > > !image-2023-08-10-02-18-24-758.png! > > !image-2023-08-10-09-19-23-973.png! 
> > !image-2023-08-10-09-21-32-371.png! > > WSCG method for first Generate node > !image-2023-08-10-09-22-29-949.png! > > WSCG for second Generate node > As you can see, the execution of generate_doConsume_1 and generate_doConsume_0 > triggers a nested loop. > !image-2023-08-10-09-24-02-755.png! >
[jira] [Updated] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen node because it can easily cause OOM failures if arrays are relatively large
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759:
Attachment: image-2023-08-10-09-32-46-163.png
[jira] [Updated] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen node because it can easily cause OOM failures if arrays are relatively large
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759:
Attachment: image-2023-08-10-09-29-24-804.png
[jira] [Commented] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen node because it can easily cause OOM failures if arrays are relatively large
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752866#comment-17752866 ] Franck Tago commented on SPARK-44759:
WSCG generated code for first Generate node
!image-2023-08-10-09-29-24-804.png!
[jira] [Comment Edited] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen node because it can easily cause OOM failures if arrays are relatively large
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752864#comment-17752864 ] Franck Tago edited comment on SPARK-44759 at 8/10/23 4:28 PM:
WSCG generated code that calls generate_doConsume_0
!image-2023-08-10-09-27-24-124.png!
was (Author: tafra...@gmail.com): !image-2023-08-10-09-27-24-124.png!
[jira] [Updated] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen node because it can easily cause OOM failures if arrays are relatively large
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759:
Attachment: image-2023-08-10-09-27-24-124.png
[jira] [Commented] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen node because it can easily cause OOM failures if arrays are relatively large
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752864#comment-17752864 ] Franck Tago commented on SPARK-44759:
!image-2023-08-10-09-27-24-124.png!
[jira] [Updated] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen node because it can easily cause OOM failures if arrays are relatively large
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759:
Summary: Do not combine multiple Generate nodes in the same WholeStageCodeGen node because it can easily cause OOM failures if arrays are relatively large (was: Do not combine multiple Generate nodes in the same WholeStageCodeGen because it can easily cause OOM failures)
[jira] [Updated] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen because it can easily cause OOM failures
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759:
Description: (expanded; the full updated text is quoted in the other notifications for this issue)
[jira] [Resolved] (SPARK-44760) Index Out Of Bound for JIRA resolution in merge_spark_pr
[ https://issues.apache.org/jira/browse/SPARK-44760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-44760.
---
Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 42429
[https://github.com/apache/spark/pull/42429]

> Index Out Of Bound for JIRA resolution in merge_spark_pr
>
> Key: SPARK-44760
> URL: https://issues.apache.org/jira/browse/SPARK-44760
> Project: Spark
> Issue Type: Bug
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Kent Yao
> Priority: Major
> Fix For: 4.0.0
>
> I
[jira] [Created] (SPARK-44762) Add more documentation and examples for using job tags for interrupt
Juliusz Sompolski created SPARK-44762:
-
Summary: Add more documentation and examples for using job tags for interrupt
Key: SPARK-44762
URL: https://issues.apache.org/jira/browse/SPARK-44762
Project: Spark
Issue Type: Improvement
Components: Connect
Affects Versions: 3.5.0
Reporter: Juliusz Sompolski

Add documentation to spark.addTag with similar examples and explanation as SparkContext.setJobGroup.
[jira] [Updated] (SPARK-44741) Support regex-based MetricFilter in StatsdSink
[ https://issues.apache.org/jira/browse/SPARK-44741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44741:
--
Affects Version/s: 4.0.0 (was: 3.4.1)

> Support regex-based MetricFilter in StatsdSink
> --
>
> Key: SPARK-44741
> URL: https://issues.apache.org/jira/browse/SPARK-44741
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: rameshkrishnan muthusamy
> Assignee: rameshkrishnan muthusamy
> Priority: Minor
> Labels: Metrics, Sink, statsd
> Fix For: 4.0.0
>
> The Spark StatsD metrics sink in its current state does not support a metrics filtering option.
> Though this option is available in the reporters, it is not exposed in the StatsD sink. An example of such filtering can be seen at
> [https://github.com/apache/spark/blob/be9ffb37585fe421705ceaa52fe49b89c50703a3/core/src/main/scala/org/apache/spark/metrics/sink/GraphiteSink.scala#L76]
>
> This is a critical option to have when teams do not want all the metrics that Spark exposes in their metrics monitoring platforms, and want to switch to detailed metrics as and when needed.
[jira] [Updated] (SPARK-44741) Support regex-based MetricFilter in StatsdSink
[ https://issues.apache.org/jira/browse/SPARK-44741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44741:
Summary: Support regex-based MetricFilter in StatsdSink (was: Spark StatsD metrics reported to support metrics filter option)
[jira] [Assigned] (SPARK-44741) Spark StatsD metrics reported to support metrics filter option
[ https://issues.apache.org/jira/browse/SPARK-44741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-44741:
Assignee: rameshkrishnan muthusamy
[jira] [Resolved] (SPARK-44741) Spark StatsD metrics reported to support metrics filter option
[ https://issues.apache.org/jira/browse/SPARK-44741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-44741.
Fix Version/s: 4.0.0
Resolution: Fixed
Issue resolved by pull request 42416
[https://github.com/apache/spark/pull/42416]
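The filtering the ticket asks for mirrors Dropwizard's `MetricFilter`: only metric names matching a user-supplied regex are forwarded to the sink. A minimal pure-Python sketch of that idea (the property name and sample metric names are hypothetical, not Spark's actual configuration keys):

```python
import re

# Regex-based metric filter: keep only metric names that match a pattern the
# user would supply in the sink's properties; drop everything else.

def make_regex_filter(pattern):
    compiled = re.compile(pattern)
    # Analogue of Dropwizard's MetricFilter.matches(name, metric).
    return lambda name: compiled.search(name) is not None

metrics = [
    "app.driver.jvm.heap.used",
    "app.driver.BlockManager.memory.memUsed_MB",
    "app.executor.jvm.heap.used",
]

matches = make_regex_filter(r"jvm\.heap")
kept = [m for m in metrics if matches(m)]
print(kept)  # ['app.driver.jvm.heap.used', 'app.executor.jvm.heap.used']
```

The GraphiteSink linked in the description already wires an equivalent filter into its reporter, which is why the ticket calls the gap in StatsdSink an exposure problem rather than a missing capability.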
[jira] [Updated] (SPARK-44700) Rule OptimizeCsvJsonExprs should not be applied to expression like from_json(regexp_replace)
[ https://issues.apache.org/jira/browse/SPARK-44700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-44700:
Fix Version/s: 3.3.0

> Rule OptimizeCsvJsonExprs should not be applied to expression like from_json(regexp_replace)
>
> Key: SPARK-44700
> URL: https://issues.apache.org/jira/browse/SPARK-44700
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.1.1
> Reporter: jiahong.li
> Priority: Minor
> Fix For: 3.3.0
>
> SQL like the below:
> select tmp.*
> from
> (select
> device_id, ads_id,
> from_json(regexp_replace(device_personas, '(?<=(\\{|,))"device_', '"user_device_'), ${device_schema}) as tmp
> from input )
> ${device_schema} includes more than 100 fields.
> If the rule OptimizeCsvJsonExprs is applied, the expression regexp_replace will be invoked many times, which costs a lot of time.
[jira] [Resolved] (SPARK-44700) Rule OptimizeCsvJsonExprs should not be applied to expression like from_json(regexp_replace)
[ https://issues.apache.org/jira/browse/SPARK-44700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-44700.
Resolution: Fixed
Please upgrade Spark to the latest version to fix this issue.
[jira] [Updated] (SPARK-44700) Rule OptimizeCsvJsonExprs should not be applied to expression like from_json(regexp_replace)
[ https://issues.apache.org/jira/browse/SPARK-44700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-44700: Affects Version/s: 3.1.1 (was: 3.4.0) (was: 3.4.1)
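The cost the reporter describes comes from the `regexp_replace` child: it rewrites JSON keys with a lookbehind regex before `from_json` parses the result, so duplicating that child per referenced field repeats the regex work. A minimal pure-Python sketch of the same key-rewriting pattern (illustrative only; Spark evaluates this inside Catalyst, and the helper name here is hypothetical):

```python
import re

# Same lookbehind pattern as the reported query: after "{" or ",", rewrite
# keys starting with "device_" to "user_device_". (?<=[{,]) is equivalent to
# the Java form (?<=(\{|,)) used in the JIRA description.
PATTERN = re.compile(r'(?<=[{,])"device_')

def rewrite_keys(device_personas: str) -> str:
    """Prefix every top-level "device_*" JSON key with "user_"."""
    return PATTERN.sub('"user_device_', device_personas)

raw = '{"device_id":"d1","device_os":"ios"}'
print(rewrite_keys(raw))  # {"user_device_id":"d1","user_device_os":"ios"}
```

With a schema of 100+ fields, re-running this substitution once per pruned field instead of once per row multiplies the regex cost accordingly, which is why the rule should skip non-trivial children like this.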
[jira] [Created] (SPARK-44761) Add DataStreamWriter.foreachBatch(org.apache.spark.api.java.function.VoidFunction2) signature
Herman van Hövell created SPARK-44761: - Summary: Add DataStreamWriter.foreachBatch(org.apache.spark.api.java.function.VoidFunction2) signature Key: SPARK-44761 URL: https://issues.apache.org/jira/browse/SPARK-44761 Project: Spark Issue Type: New Feature Components: Connect Affects Versions: 3.5.0 Reporter: Herman van Hövell Assignee: Herman van Hövell
[jira] [Commented] (SPARK-44756) Executor hangs when RetryingBlockTransferor fails to initiate retry
[ https://issues.apache.org/jira/browse/SPARK-44756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752785#comment-17752785 ] GridGain Integration commented on SPARK-44756: -- User 'hdaikoku' has created a pull request for this issue: https://github.com/apache/spark/pull/42426 > Executor hangs when RetryingBlockTransferor fails to initiate retry > --- > > Key: SPARK-44756 > URL: https://issues.apache.org/jira/browse/SPARK-44756 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 3.3.1 >Reporter: Harunobu Daikoku >Priority: Minor > > We have been observing this issue several times in our production where some > executors are being stuck at BlockTransferService#fetchBlockSync(). > After some investigation, the issue seems to be caused by an unhandled edge > case in RetryingBlockTransferor. > 1. Shuffle transfer fails for whatever reason > {noformat} > java.io.IOException: Cannot allocate memory > at sun.nio.ch.FileDispatcherImpl.write0(Native Method) > at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60) > at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) > at sun.nio.ch.IOUtil.write(IOUtil.java:51) > at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211) > at > org.apache.spark.network.shuffle.SimpleDownloadFile$SimpleDownloadWritableChannel.write(SimpleDownloadFile.java:78) > at > org.apache.spark.network.shuffle.OneForOneBlockFetcher$DownloadCallback.onData(OneForOneBlockFetcher.java:340) > at > org.apache.spark.network.client.StreamInterceptor.handle(StreamInterceptor.java:79) > at > org.apache.spark.network.util.TransportFrameDecoder.feedInterceptor(TransportFrameDecoder.java:263) > at > org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:87) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > {noformat} > 2. 
The above exception caught by > [AbstractChannelHandlerContext#invokeChannelRead()|https://github.com/netty/netty/blob/netty-4.1.74.Final/transport/src/main/java/io/netty/channel/AbstractChannelHandlerContext.java#L381], > and propagated to the exception handler > 3. Exception reaches > [RetryingBlockTransferor#initiateRetry()|https://github.com/apache/spark/blob/v3.3.1/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java#L178-L180], > and it tries to initiate retry > {noformat} > 23/08/09 16:58:37 shuffle-client-4-2 INFO RetryingBlockTransferor: Retrying > fetch (1/3) for 1 outstanding blocks after 5000 ms > {noformat} > 4. Retry initiation fails (in our case, it fails to create a new thread) > 5. Exception caught by > [AbstractChannelHandlerContext#invokeExceptionCaught()|https://github.com/netty/netty/blob/netty-4.1.74.Final/transport/src/main/java/io/netty/channel/AbstractChannelHandlerContext.java#L305-L309], > and not further processed > {noformat} > 23/08/09 16:58:53 shuffle-client-4-2 DEBUG AbstractChannelHandlerContext: An > exception java.lang.OutOfMemoryError: unable to create new native thread > at java.lang.Thread.start0(Native Method) > at java.lang.Thread.start(Thread.java:719) > at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378) > at > java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112) > at > org.apache.spark.network.shuffle.RetryingBlockTransferor.initiateRetry(RetryingBlockTransferor.java:182) > at > org.apache.spark.network.shuffle.RetryingBlockTransferor.access$500(RetryingBlockTransferor.java:43) > at > org.apache.spark.network.shuffle.RetryingBlockTransferor$RetryingBlockTransferListener.handleBlockTransferFailure(RetryingBlockTransferor.java:230) > at > 
org.apache.spark.network.shuffle.RetryingBlockTransferor$RetryingBlockTransferListener.onBlockFetchFailure(RetryingBlockTransferor.java:260) > at > org.apache.spark.network.shuffle.OneForOneBlockFetcher.failRemainingBlocks(OneForOneBlockFetcher.java:318) > at > org.apache.spark.network.shuffle.OneForOneBlockFetcher.access$300(OneForOneBlockFetcher.java:55) > at > org.apache.spark.network.shuffle.OneForOneBlockFetcher$DownloadCallback.onFailure(OneForOneBlockFetcher.java:357) > at > org.apache.spark.network.client.StreamInterceptor.exceptionCaught(StreamInterceptor.java:56) > at > org.apache.spark.network.util.TransportFrameDecoder.exceptionCaught(TransportFrameDecoder.java:231) > at >
[jira] [Commented] (SPARK-44756) Executor hangs when RetryingBlockTransferor fails to initiate retry
[ https://issues.apache.org/jira/browse/SPARK-44756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752781#comment-17752781 ] Harunobu Daikoku commented on SPARK-44756: -- PR raised: https://github.com/apache/spark/pull/42426
[jira] [Updated] (SPARK-44756) Executor hangs when RetryingBlockTransferor fails to initiate retry
[ https://issues.apache.org/jira/browse/SPARK-44756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harunobu Daikoku updated SPARK-44756: - Component/s: Spark Core > Executor hangs when RetryingBlockTransferor fails to initiate retry > --- > > Key: SPARK-44756 > URL: https://issues.apache.org/jira/browse/SPARK-44756 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 3.3.1 >Reporter: Harunobu Daikoku >Priority: Minor > > We have been observing this issue several times in our production where some > executors are being stuck at BlockTransferService#fetchBlockSync(). > After some investigation, the issue seems to be caused by an unhandled edge > case in RetryingBlockTransferor. > 1. Shuffle transfer fails for whatever reason > {noformat} > java.io.IOException: Cannot allocate memory > at sun.nio.ch.FileDispatcherImpl.write0(Native Method) > at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60) > at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) > at sun.nio.ch.IOUtil.write(IOUtil.java:51) > at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211) > at > org.apache.spark.network.shuffle.SimpleDownloadFile$SimpleDownloadWritableChannel.write(SimpleDownloadFile.java:78) > at > org.apache.spark.network.shuffle.OneForOneBlockFetcher$DownloadCallback.onData(OneForOneBlockFetcher.java:340) > at > org.apache.spark.network.client.StreamInterceptor.handle(StreamInterceptor.java:79) > at > org.apache.spark.network.util.TransportFrameDecoder.feedInterceptor(TransportFrameDecoder.java:263) > at > org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:87) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > {noformat} > 2. 
The above exception caught by > [AbstractChannelHandlerContext#invokeChannelRead()|https://github.com/netty/netty/blob/netty-4.1.74.Final/transport/src/main/java/io/netty/channel/AbstractChannelHandlerContext.java#L381], > and propagated to the exception handler > 3. Exception reaches > [RetryingBlockTransferor#initiateRetry()|https://github.com/apache/spark/blob/v3.3.1/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java#L178-L180], > and it tries to initiate retry > {noformat} > 23/08/09 16:58:37 shuffle-client-4-2 INFO RetryingBlockTransferor: Retrying > fetch (1/3) for 1 outstanding blocks after 5000 ms > {noformat} > 4. Retry initiation fails (in our case, it fails to create a new thread) > 5. Exception caught by > [AbstractChannelHandlerContext#invokeExceptionCaught()|https://github.com/netty/netty/blob/netty-4.1.74.Final/transport/src/main/java/io/netty/channel/AbstractChannelHandlerContext.java#L305-L309], > and not further processed > {noformat} > 23/08/09 16:58:53 shuffle-client-4-2 DEBUG AbstractChannelHandlerContext: An > exception java.lang.OutOfMemoryError: unable to create new native thread > at java.lang.Thread.start0(Native Method) > at java.lang.Thread.start(Thread.java:719) > at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378) > at > java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112) > at > org.apache.spark.network.shuffle.RetryingBlockTransferor.initiateRetry(RetryingBlockTransferor.java:182) > at > org.apache.spark.network.shuffle.RetryingBlockTransferor.access$500(RetryingBlockTransferor.java:43) > at > org.apache.spark.network.shuffle.RetryingBlockTransferor$RetryingBlockTransferListener.handleBlockTransferFailure(RetryingBlockTransferor.java:230) > at > 
org.apache.spark.network.shuffle.RetryingBlockTransferor$RetryingBlockTransferListener.onBlockFetchFailure(RetryingBlockTransferor.java:260) > at > org.apache.spark.network.shuffle.OneForOneBlockFetcher.failRemainingBlocks(OneForOneBlockFetcher.java:318) > at > org.apache.spark.network.shuffle.OneForOneBlockFetcher.access$300(OneForOneBlockFetcher.java:55) > at > org.apache.spark.network.shuffle.OneForOneBlockFetcher$DownloadCallback.onFailure(OneForOneBlockFetcher.java:357) > at > org.apache.spark.network.client.StreamInterceptor.exceptionCaught(StreamInterceptor.java:56) > at > org.apache.spark.network.util.TransportFrameDecoder.exceptionCaught(TransportFrameDecoder.java:231) > at > io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302) > {noformat} > 6. After all, retry never happens and the executor thread ends up being
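The hang reduces to: step 3's initiateRetry() submits the retry task to an executor service, and when that submission itself throws (step 4's OutOfMemoryError), no failure callback fires, so the caller of fetchBlockSync() waits forever. A toy Python model of the missing guard (class and method names are mine, not Spark's real API):

```python
class RetryingFetcher:
    """Toy model of RetryingBlockTransferor's retry path; not Spark's code."""

    def __init__(self, executor, on_failure):
        self.executor = executor
        self.on_failure = on_failure  # must always fire, or callers hang

    def initiate_retry(self, fetch_fn):
        try:
            # In Spark this submits the re-fetch of outstanding blocks.
            return self.executor.submit(fetch_fn)
        except BaseException as exc:
            # Step 4 of the report: submit() itself can fail (e.g. cannot
            # create a new thread). Without this guard the error is swallowed
            # by Netty's exceptionCaught and the fetch never completes.
            self.on_failure(exc)
            return None


class BrokenExecutor:
    """Stub whose submit always fails, simulating thread-creation failure."""
    def submit(self, fn):
        raise RuntimeError("unable to create new native thread")


failures = []
fetcher = RetryingFetcher(BrokenExecutor(), failures.append)
fetcher.initiate_retry(lambda: None)
print(len(failures))  # 1: the caller is unblocked instead of hanging
```

This only sketches the shape of the fix; the actual change is in the linked pull request.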
[jira] [Created] (SPARK-44760) Index Out Of Bound for JIRA resolution in merge_spark_pr
Kent Yao created SPARK-44760: Summary: Index Out Of Bound for JIRA resolution in merge_spark_pr Key: SPARK-44760 URL: https://issues.apache.org/jira/browse/SPARK-44760 Project: Spark Issue Type: Bug Components: Project Infra Affects Versions: 4.0.0 Reporter: Kent Yao I
[jira] [Updated] (SPARK-44758) Support memory limit configurable
[ https://issues.apache.org/jira/browse/SPARK-44758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuml updated SPARK-44758: -- Description: Currently the memory request and limit are set by summing the values of spark.\{driver,executor}.memory and spark.\{driver,executor}.memoryOverhead. Supporting memory limits configurable can bring some benefits. For example, use unfixed memory to use page cache, reduce disk IO of shuffle read to improve performance. (was: Currently the memory request and limit are set by summing the values of spark.\{driver,executor}.memory and spark.\{driver,executor}.memoryOverhead. Supporting configurable memory limits can bring some benefits. For example, use unfixed memory to use page cache, reduce disk IO of shuffle read to improve performance.) > Support memory limit configurable > - > > Key: SPARK-44758 > URL: https://issues.apache.org/jira/browse/SPARK-44758 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.4.1 >Reporter: zhuml >Priority: Minor > > Currently the memory request and limit are set by summing the values of > spark.\{driver,executor}.memory and spark.\{driver,executor}.memoryOverhead. > Supporting memory limits configurable can bring some benefits. For example, > use unfixed memory to use page cache, reduce disk IO of shuffle read to > improve performance. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44758) Support memory limit configurable
[ https://issues.apache.org/jira/browse/SPARK-44758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhuml updated SPARK-44758: -- Summary: Support memory limit configurable (was: Support memory limit)
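For context, on Kubernetes the pod's memory request and limit are both set to the sum of spark.{driver,executor}.memory and the corresponding memoryOverhead. The proposal is to allow the limit to exceed the request so the container can opportunistically use the slack for page cache. A sketch of the arithmetic (the limit_factor knob is hypothetical, not an existing Spark config):

```python
def pod_memory(memory_mib: int, overhead_mib: int, limit_factor: float = 1.0):
    """Sketch of the proposed behavior. Today Spark on K8s effectively uses
    limit_factor == 1.0, i.e. request == limit == memory + memoryOverhead;
    a configurable factor > 1.0 would raise only the limit."""
    request = memory_mib + overhead_mib
    limit = int(request * limit_factor)
    return request, limit

print(pod_memory(4096, 512))                    # (4608, 4608)  current behavior
print(pod_memory(4096, 512, limit_factor=1.5))  # (4608, 6912)  proposed slack
```

The scheduler still bin-packs on the request, while the higher limit lets shuffle reads benefit from page cache before the kernel reclaims it.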
[jira] [Updated] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen because it can easily cause OOM failures
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759: Attachment: (was: spark-verbosewithcodegenenabled) > Do not combine multiple Generate nodes in the same WholeStageCodeGen because > it can easily cause OOM failures > -- > > Key: SPARK-44759 > URL: https://issues.apache.org/jira/browse/SPARK-44759 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.3.0, 3.3.1 >Reporter: Franck Tago >Priority: Major > Attachments: wholestagecodegen_wc1_debug_wholecodegen_passed > > > The Generate node used to flatten an array generally produces significantly > more output rows than input rows, and the number of output rows is > drastically higher when flattening a nested array. > When we combine more than one Generate node in the same WholeStageCodeGen > node, we run a high risk of running out of memory, for multiple reasons. > 1. As you can see from the attachment, the rows created in the nested loop > are saved in the writer buffer; in my case, because the rows were big, I hit > an OutOfMemory error. 2. The generated WholeStageCodeGen code results in a > nested loop that, for each row, explodes the parent array and then explodes > the inner array, which is prone to OutOfMemory errors. > > Please view the attached Spark GUI and Spark DAG. > In my case the WholeStageCodeGen includes 2 explode nodes; because the array > elements are large, we end up with an Out Of Memory error. > > I recommend that we do not merge multiple explode nodes in the same > whole-stage-codegen node, as doing so leads to potential memory issues. > > !image-2023-08-10-02-18-24-758.png! >
[jira] [Updated] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen because it can easily cause OOM failures
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759: Attachment: wholestagecodegen_wc1_debug_wholecodegen_passed
[jira] [Updated] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen because it can easily cause OOM failures
[ https://issues.apache.org/jira/browse/SPARK-44759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Franck Tago updated SPARK-44759: Attachment: spark-verbosewithcodegenenabled
[jira] [Created] (SPARK-44759) Do not combine multiple Generate nodes in the same WholeStageCodeGen because it can easily cause OOM failures
Franck Tago created SPARK-44759: --- Summary: Do not combine multiple Generate nodes in the same WholeStageCodeGen because it can easily cause OOM failures Key: SPARK-44759 URL: https://issues.apache.org/jira/browse/SPARK-44759 Project: Spark Issue Type: Bug Components: Optimizer Affects Versions: 3.3.1, 3.3.0 Reporter: Franck Tago The Generate node used to flatten an array generally produces significantly more output rows than input rows, and the number of output rows is drastically higher when flattening a nested array. When we combine more than one Generate node in the same WholeStageCodeGen node, we run a high risk of running out of memory, for multiple reasons. 1. As you can see from the attachment, the rows created in the nested loop are saved in the writer buffer; in my case, because the rows were big, I hit an OutOfMemory error. 2. The generated WholeStageCodeGen code results in a nested loop that, for each row, explodes the parent array and then explodes the inner array, which is prone to OutOfMemory errors. Please view the attached Spark GUI and Spark DAG. In my case the WholeStageCodeGen includes 2 explode nodes; because the array elements are large, we end up with an Out Of Memory error. I recommend that we do not merge multiple explode nodes in the same whole-stage-codegen node, as doing so leads to potential memory issues. !image-2023-08-10-02-18-24-758.png!
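The blow-up described above is multiplicative: chaining two Generate (explode) nodes inside one code-generated loop emits m × n output rows per input row, all funneled through the same writer buffer. A small pure-Python analogue of two chained explodes (not Spark code) shows the row-count growth:

```python
def explode(rows, key):
    """Flatten one array-valued column, like Spark's Generate/explode
    (toy dict-based version)."""
    for row in rows:
        for item in row[key]:
            yield {**row, key: item}

# One input row whose column is an array of 4 inner arrays of 3 elements:
# the first explode yields 4 rows, the second yields 4 * 3 = 12.
rows = [{"id": 1, "outer": [[1, 2, 3]] * 4}]   # 1 input row
step1 = list(explode(rows, "outer"))           # 4 rows
step2 = list(explode(step1, "outer"))          # 12 rows
print(len(step1), len(step2))                  # 4 12
```

When both loops are fused into one whole-stage-codegen stage and each output row is large, the buffered output can exceed executor memory even though the input was small, which is the failure mode the reporter hit.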
[jira] [Updated] (SPARK-44757) Vulnerabilities in Spark3.4
[ https://issues.apache.org/jira/browse/SPARK-44757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anand Balasubramaniam updated SPARK-44757: -- Priority: Major (was: Minor) > Vulnerabilities in Spark3.4 > --- > > Key: SPARK-44757 > URL: https://issues.apache.org/jira/browse/SPARK-44757 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Anand Balasubramaniam >Priority: Major > > A StackRox scan shows the below list of TPLs (third-party libraries) with > vulnerabilities bundled in the Spark 3.4 package; is there any ETA on fixing > them? Kindly apprise us of the same. > h2. Vulnerabilities in Spark3.4 > |*CVE*|*Description*|*Severity*| > |CVE-2018-21234|Jodd before 5.0.4 performs Deserialization of Untrusted JSON > Data when setClassMetadataName is set.|CVSS Score:9.8Critical| > |CVE-2022-42004|In FasterXML jackson-databind before 2.13.4, resource > exhaustion can occur because of a lack of a check in > BeanDeserializer._deserializeFromArray to prevent use of deeply nested > arrays. An application is vulnerable only with certain customized choices for > deserialization.|CVSS Score 7.5Important| > |CVE-2022-42003|In FasterXML jackson-databind before 2.14.0-rc1, resource > exhaustion can occur because of a lack of a check in primitive value > deserializers to avoid deep wrapper array nesting, when the > UNWRAP_SINGLE_VALUE_ARRAYS feature is enabled. Additional fix version in > 2.13.4.1 and 2.12.17.1|CVSS Score 7.5Important| > |CVE-2022-40152|Those using Woodstox to parse XML data may be vulnerable to > Denial of Service attacks (DOS) if DTD support is enabled. If the parser is > running on user supplied input, an attacker may supply content that causes > the parser to crash by stackoverflow. 
This effect may support a denial of > service attack.|CVSS Score 7.5Important| > |CVE-2022-3171|A parsing issue with binary data in protobuf-java core and > lite versions prior to 3.21.7, 3.20.3, 3.19.6 and 3.16.3 can lead to a denial > of service attack. Inputs containing multiple instances of non-repeated > embedded messages with repeated or unknown fields causes objects to be > converted back-n-forth between mutable and immutable forms, resulting in > potentially long garbage collection pauses. We recommend updating to the > versions mentioned above.|CVSS Score 7.5Important| > |CVE-2021-34538|Apache Hive before 3.1.3 "CREATE" and "DROP" function > operations does not check for necessary authorization of involved entities in > the query. It was found that an unauthorized user can manipulate an existing > UDF without having the privileges to do so. This allowed unauthorized or > underprivileged users to drop and recreate UDFs pointing them to new jars > that could be potentially malicious.|CVSS Score 7.5Important| > |CVE-2020-13949|In Apache Thrift 0.9.3 to 0.13.0, malicious RPC clients could > send short messages which would result in a large memory allocation, > potentially leading to denial of service.|CVSS Score 7.5Important| > |CVE-2018-10237|Unbounded memory allocation in Google Guava 11.0 through 24.x > before 24.1.1 allows remote attackers to conduct denial of service attacks > against servers that depend on this library and deserialize attacker-provided > data, because the AtomicDoubleArray class (when serialized with Java > serialization) and the CompoundOrdering class (when serialized with GWT > serialization) perform eager allocation without appropriate checks on what a > client has sent and whether the data size is reasonable.|CVSS 5.9Moderate| > |CVE-2021-22569|An issue in protobuf-java allowed the interleaving of > com.google.protobuf.UnknownFieldSet fields in such a way that would be > processed out of order. 
A small malicious payload can occupy the parser for > several minutes by creating large numbers of short-lived objects that cause > frequent, repeated pauses. We recommend upgrading libraries beyond the > vulnerable versions.|CVSS 5.9Moderate| > |CVE-2020-8908|A temp directory creation vulnerability exists in all versions > of Guava, allowing an attacker with access to the machine to potentially > access data in a temporary directory created by the Guava API > [com.google.common.io|https://urldefense.com/v3/__http:/com.google.common.io/__;!!KpaPruflFCEp!hUy3fNZoxf_mnbeTP7GUWkbaKtRLDswR2fRnQ9Gm_AoaeVUncE_plq53EqTWyd1ZfAI7tIFOgmmEBPoGRw$].Files.createTempDir(). > By default, on unix-like systems, the created directory is world-readable > (readable by an attacker with access to the system). The method in question > has been marked @Deprecated in versions 30.0 and later and should not be > used. For Android developers, we recommend choosing a temporary directory API > provided by Android, such as context.getCacheDir(). For other Java >
[jira] [Created] (SPARK-44758) Support memory limit
zhuml created SPARK-44758: - Summary: Support memory limit Key: SPARK-44758 URL: https://issues.apache.org/jira/browse/SPARK-44758 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 3.4.1 Reporter: zhuml Currently, the pod memory request and limit are both set by summing the values of spark.\{driver,executor}.memory and spark.\{driver,executor}.memoryOverhead. Supporting a separately configurable memory limit would bring some benefits: for example, memory above the request could be used for the page cache, reducing the disk I/O of shuffle reads and improving performance. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
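For context, the value that currently becomes both the pod memory request and limit can be sketched as follows. This is a simplified model, not Spark's actual Kubernetes code: it assumes the default JVM overhead factor of 0.10 with a 384 MiB floor, and ignores the separate non-JVM factor and explicit memoryOverhead settings.

```python
# Simplified model of how Spark sizes a driver/executor pod today:
# request == limit == memory + overhead, where overhead defaults to
# max(overheadFactor * memory, 384 MiB) for JVM workloads.
# The ticket proposes letting the limit exceed the request.
MIN_OVERHEAD_MIB = 384

def pod_memory_mib(memory_mib: int, overhead_factor: float = 0.10) -> int:
    """Approximate pod memory (request and limit) in MiB."""
    overhead = max(int(memory_mib * overhead_factor), MIN_OVERHEAD_MIB)
    return memory_mib + overhead

# e.g. a 23 GiB (23552 MiB) executor gets 23552 + 2355 = 25907 MiB.
```

Because request and limit are equal, none of the pod's memory headroom is available for the kernel page cache, which is the performance gap the ticket is pointing at.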
[jira] [Created] (SPARK-44757) Vulnerabilities in Spark 3.4
Anand Balasubramaniam created SPARK-44757: - Summary: Vulnerabilities in Spark 3.4 Key: SPARK-44757 URL: https://issues.apache.org/jira/browse/SPARK-44757 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.4.0 Reporter: Anand Balasubramaniam A StackRox scan shows the below list of third-party libraries (TPLs) with vulnerabilities bundled in the Spark 3.4 package; is there any ETA on fixing them? Kindly apprise us on the same. h2. Vulnerabilities in Spark 3.4 |*CVE*|*Description*|*Severity*| |CVE-2018-21234|Jodd before 5.0.4 performs Deserialization of Untrusted JSON Data when setClassMetadataName is set.|CVSS Score:9.8Critical| |CVE-2022-42004|In FasterXML jackson-databind before 2.13.4, resource exhaustion can occur because of a lack of a check in BeanDeserializer._deserializeFromArray to prevent use of deeply nested arrays. An application is vulnerable only with certain customized choices for deserialization.|CVSS Score 7.5Important| |CVE-2022-42003|In FasterXML jackson-databind before 2.14.0-rc1, resource exhaustion can occur because of a lack of a check in primitive value deserializers to avoid deep wrapper array nesting, when the UNWRAP_SINGLE_VALUE_ARRAYS feature is enabled. Additional fix version in 2.13.4.1 and 2.12.17.1|CVSS Score 7.5Important| |CVE-2022-40152|Those using Woodstox to parse XML data may be vulnerable to Denial of Service attacks (DOS) if DTD support is enabled. If the parser is running on user supplied input, an attacker may supply content that causes the parser to crash by stackoverflow. This effect may support a denial of service attack.|CVSS Score 7.5Important| |CVE-2022-3171|A parsing issue with binary data in protobuf-java core and lite versions prior to 3.21.7, 3.20.3, 3.19.6 and 3.16.3 can lead to a denial of service attack. 
Inputs containing multiple instances of non-repeated embedded messages with repeated or unknown fields causes objects to be converted back-n-forth between mutable and immutable forms, resulting in potentially long garbage collection pauses. We recommend updating to the versions mentioned above.|CVSS Score 7.5Important| |CVE-2021-34538|Apache Hive before 3.1.3 "CREATE" and "DROP" function operations does not check for necessary authorization of involved entities in the query. It was found that an unauthorized user can manipulate an existing UDF without having the privileges to do so. This allowed unauthorized or underprivileged users to drop and recreate UDFs pointing them to new jars that could be potentially malicious.|CVSS Score 7.5Important| |CVE-2020-13949|In Apache Thrift 0.9.3 to 0.13.0, malicious RPC clients could send short messages which would result in a large memory allocation, potentially leading to denial of service.|CVSS Score 7.5Important| |CVE-2018-10237|Unbounded memory allocation in Google Guava 11.0 through 24.x before 24.1.1 allows remote attackers to conduct denial of service attacks against servers that depend on this library and deserialize attacker-provided data, because the AtomicDoubleArray class (when serialized with Java serialization) and the CompoundOrdering class (when serialized with GWT serialization) perform eager allocation without appropriate checks on what a client has sent and whether the data size is reasonable.|CVSS 5.9Moderate| |CVE-2021-22569|An issue in protobuf-java allowed the interleaving of com.google.protobuf.UnknownFieldSet fields in such a way that would be processed out of order. A small malicious payload can occupy the parser for several minutes by creating large numbers of short-lived objects that cause frequent, repeated pauses. 
We recommend upgrading libraries beyond the vulnerable versions.|CVSS 5.9Moderate| |CVE-2020-8908|A temp directory creation vulnerability exists in all versions of Guava, allowing an attacker with access to the machine to potentially access data in a temporary directory created by the Guava API com.google.common.io.Files.createTempDir(). By default, on unix-like systems, the created directory is world-readable (readable by an attacker with access to the system). The method in question has been marked @Deprecated in versions 30.0 and later and should not be used. For Android developers, we recommend choosing a temporary directory API provided by Android, such as context.getCacheDir(). For other Java developers, we recommend migrating to the Java 7 API java.nio.file.Files.createTempDirectory() which explicitly configures permissions of 700, or configuring the Java runtime's
[jira] [Assigned] (SPARK-43705) Enable TimedeltaIndexTests.test_properties for pandas 2.0.0.
[ https://issues.apache.org/jira/browse/SPARK-43705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-43705: Assignee: Haejoon Lee > Enable TimedeltaIndexTests.test_properties for pandas 2.0.0. > > > Key: SPARK-43705 > URL: https://issues.apache.org/jira/browse/SPARK-43705 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > Enable TimedeltaIndexTests.test_properties for pandas 2.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43245) Fix DatetimeIndex.microsecond to return 'int32' instead of 'int64' type of Index.
[ https://issues.apache.org/jira/browse/SPARK-43245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-43245. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42271 [https://github.com/apache/spark/pull/42271] > Fix DatetimeIndex.microsecond to return 'int32' instead of 'int64' type of > Index. > - > > Key: SPARK-43245 > URL: https://issues.apache.org/jira/browse/SPARK-43245 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 4.0.0 > > > https://pandas.pydata.org/docs/dev/whatsnew/v2.0.0.html#index-can-now-hold-numpy-numeric-dtypes -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
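The dtype change behind this fix can be seen directly in pandas. A small sketch (the exact dtype observed depends on the installed pandas version, which is the point of the ticket):

```python
import pandas as pd

# DatetimeIndex.microsecond returns an Index of the microsecond component.
# Per the linked pandas 2.0 whatsnew entry, Index can now hold numpy
# numeric dtypes, so under pandas >= 2.0 this Index holds 'int32' values;
# under 1.x it was 'int64', which is why the Pandas-on-Spark return type
# had to be adjusted to match.
idx = pd.DatetimeIndex([
    "2023-08-10 12:00:00.000123",
    "2023-08-10 12:00:00.000456",
])
micro = idx.microsecond
print(micro.dtype, list(micro))
```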
[jira] [Resolved] (SPARK-43705) Enable TimedeltaIndexTests.test_properties for pandas 2.0.0.
[ https://issues.apache.org/jira/browse/SPARK-43705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-43705. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42271 [https://github.com/apache/spark/pull/42271] > Enable TimedeltaIndexTests.test_properties for pandas 2.0.0. > > > Key: SPARK-43705 > URL: https://issues.apache.org/jira/browse/SPARK-43705 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 4.0.0 > > > Enable TimedeltaIndexTests.test_properties for pandas 2.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43245) Fix DatetimeIndex.microsecond to return 'int32' instead of 'int64' type of Index.
[ https://issues.apache.org/jira/browse/SPARK-43245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-43245: Assignee: Haejoon Lee > Fix DatetimeIndex.microsecond to return 'int32' instead of 'int64' type of > Index. > - > > Key: SPARK-43245 > URL: https://issues.apache.org/jira/browse/SPARK-43245 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > https://pandas.pydata.org/docs/dev/whatsnew/v2.0.0.html#index-can-now-hold-numpy-numeric-dtypes -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44742) Add Spark version drop down to the PySpark doc site
[ https://issues.apache.org/jira/browse/SPARK-44742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17752689#comment-17752689 ] BingKun Pan commented on SPARK-44742: - I work on it. > Add Spark version drop down to the PySpark doc site > --- > > Key: SPARK-44742 > URL: https://issues.apache.org/jira/browse/SPARK-44742 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > > Currently, PySpark documentation does not have a version dropdown. While by > default we want people to land on the latest version, it will be helpful and > easier for people to find docs if we have this version dropdown. > Other libraries such as numpy have such version dropdown. > !image-2023-08-09-09-38-00-805.png|width=214,height=189! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44756) Executor hangs when RetryingBlockTransferor fails to initiate retry
[ https://issues.apache.org/jira/browse/SPARK-44756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harunobu Daikoku updated SPARK-44756: - Description: We have been observing this issue several times in our production where some executors are being stuck at BlockTransferService#fetchBlockSync(). After some investigation, the issue seems to be caused by an unhandled edge case in RetryingBlockTransferor. 1. Shuffle transfer fails for whatever reason {noformat} java.io.IOException: Cannot allocate memory at sun.nio.ch.FileDispatcherImpl.write0(Native Method) at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60) at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) at sun.nio.ch.IOUtil.write(IOUtil.java:51) at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211) at org.apache.spark.network.shuffle.SimpleDownloadFile$SimpleDownloadWritableChannel.write(SimpleDownloadFile.java:78) at org.apache.spark.network.shuffle.OneForOneBlockFetcher$DownloadCallback.onData(OneForOneBlockFetcher.java:340) at org.apache.spark.network.client.StreamInterceptor.handle(StreamInterceptor.java:79) at org.apache.spark.network.util.TransportFrameDecoder.feedInterceptor(TransportFrameDecoder.java:263) at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:87) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) {noformat} 2. The above exception caught by [AbstractChannelHandlerContext#invokeChannelRead()|https://github.com/netty/netty/blob/netty-4.1.74.Final/transport/src/main/java/io/netty/channel/AbstractChannelHandlerContext.java#L381], and propagated to the exception handler 3. 
Exception reaches [RetryingBlockTransferor#initiateRetry()|https://github.com/apache/spark/blob/v3.3.1/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java#L178-L180], and it tries to initiate retry {noformat} 23/08/09 16:58:37 shuffle-client-4-2 INFO RetryingBlockTransferor: Retrying fetch (1/3) for 1 outstanding blocks after 5000 ms {noformat} 4. Retry initiation fails (in our case, it fails to create a new thread) 5. Exception caught by [AbstractChannelHandlerContext#invokeExceptionCaught()|https://github.com/netty/netty/blob/netty-4.1.74.Final/transport/src/main/java/io/netty/channel/AbstractChannelHandlerContext.java#L305-L309], and not further processed {noformat} 23/08/09 16:58:53 shuffle-client-4-2 DEBUG AbstractChannelHandlerContext: An exception java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:719) at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378) at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112) at org.apache.spark.network.shuffle.RetryingBlockTransferor.initiateRetry(RetryingBlockTransferor.java:182) at org.apache.spark.network.shuffle.RetryingBlockTransferor.access$500(RetryingBlockTransferor.java:43) at org.apache.spark.network.shuffle.RetryingBlockTransferor$RetryingBlockTransferListener.handleBlockTransferFailure(RetryingBlockTransferor.java:230) at org.apache.spark.network.shuffle.RetryingBlockTransferor$RetryingBlockTransferListener.onBlockFetchFailure(RetryingBlockTransferor.java:260) at org.apache.spark.network.shuffle.OneForOneBlockFetcher.failRemainingBlocks(OneForOneBlockFetcher.java:318) at org.apache.spark.network.shuffle.OneForOneBlockFetcher.access$300(OneForOneBlockFetcher.java:55) at 
org.apache.spark.network.shuffle.OneForOneBlockFetcher$DownloadCallback.onFailure(OneForOneBlockFetcher.java:357) at org.apache.spark.network.client.StreamInterceptor.exceptionCaught(StreamInterceptor.java:56) at org.apache.spark.network.util.TransportFrameDecoder.exceptionCaught(TransportFrameDecoder.java:231) at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302) {noformat} 6. As a result, the retry never happens and the executor thread ends up being stuck at [BlockTransferService#fetchBlockSync()|https://github.com/apache/spark/blob/v3.3.1/core/src/main/scala/org/apache/spark/network/BlockTransferService.scala#L103], waiting for the transfer to complete/fail {noformat} sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
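The six steps above can be sketched as a toy model (a Python stand-in with hypothetical names; the real classes are Java). The essential point is that a failure to *schedule* a retry must complete the transfer exceptionally rather than be swallowed, so a synchronous caller modeled on fetchBlockSync() cannot block forever:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class RetryingFetcher:
    """Toy model of RetryingBlockTransferor (hypothetical, not Spark's code).

    Sketch of the fix: when submitting the retry task itself throws (steps
    4-5 above, e.g. an OutOfMemoryError while spawning a thread), record the
    error and signal completion instead of dropping it on the floor.
    """

    def __init__(self, fetch, max_retries=3):
        self.fetch = fetch
        self.max_retries = max_retries
        self.pool = ThreadPoolExecutor(max_workers=1)
        self.done = threading.Event()
        self.result = None
        self.error = None

    def _attempt(self, attempt):
        try:
            self.result = self.fetch()
            self.done.set()
        except Exception as cause:
            self._retry_or_fail(attempt, cause)

    def _retry_or_fail(self, attempt, cause):
        if attempt >= self.max_retries:
            self.error = cause            # retries exhausted: fail the fetch
            self.done.set()
            return
        try:
            self.pool.submit(self._attempt, attempt + 1)
        except Exception as submit_error:
            # The unhandled edge case: retry initiation failed.
            # Fail fast instead of leaving the caller blocked indefinitely.
            self.error = submit_error
            self.done.set()

    def fetch_sync(self):
        """Blocking fetch, loosely modeled on fetchBlockSync()."""
        self._retry_or_fail(0, None)      # schedule the first attempt
        self.done.wait()
        if self.error is not None:
            raise self.error
        return self.result
```

Design note: routing "retry initiation failed" through the same completion path as "retries exhausted" keeps a single exit for the waiter, which is exactly what prevents the hang described in step 6.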
[jira] [Updated] (SPARK-44756) Executor hangs when RetryingBlockTransferor fails to initiate retry
[ https://issues.apache.org/jira/browse/SPARK-44756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harunobu Daikoku updated SPARK-44756: - Description: We have been observing this issue several times in our production where some executors are being stuck at BlockTransferService.fetchBlockSync. After some investigation, the issue seems to be caused by an unhandled edge case in RetryingBlockTransferor. 1. Shuffle transfer fails for whatever reason {noformat} java.io.IOException: Cannot allocate memory at sun.nio.ch.FileDispatcherImpl.write0(Native Method) at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60) at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) at sun.nio.ch.IOUtil.write(IOUtil.java:51) at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211) at org.apache.spark.network.shuffle.SimpleDownloadFile$SimpleDownloadWritableChannel.write(SimpleDownloadFile.java:78) at org.apache.spark.network.shuffle.OneForOneBlockFetcher$DownloadCallback.onData(OneForOneBlockFetcher.java:340) at org.apache.spark.network.client.StreamInterceptor.handle(StreamInterceptor.java:79) at org.apache.spark.network.util.TransportFrameDecoder.feedInterceptor(TransportFrameDecoder.java:263) at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:87) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) {noformat} 2. The above exception caught by [AbstractChannelHandlerContext#invokeChannelRead()|https://github.com/netty/netty/blob/netty-4.1.74.Final/transport/src/main/java/io/netty/channel/AbstractChannelHandlerContext.java#L381], and propagated to the exception handler 3. 
Exception reaches [RetryingBlockTransferor#initiateRetry()|https://github.com/apache/spark/blob/v3.3.1/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java#L178-L180], and it tries to initiate retry {noformat} 23/08/09 16:58:37 shuffle-client-4-2 INFO RetryingBlockTransferor: Retrying fetch (1/3) for 1 outstanding blocks after 5000 ms {noformat} 4. Retry initiation fails (in our case, it fails to create a new thread) 5. Exception caught by [AbstractChannelHandlerContext#invokeExceptionCaught()|https://github.com/netty/netty/blob/netty-4.1.74.Final/transport/src/main/java/io/netty/channel/AbstractChannelHandlerContext.java#L305-L309], and not further processed {noformat} 23/08/09 16:58:53 shuffle-client-4-2 DEBUG AbstractChannelHandlerContext: An exception java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:719) at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378) at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112) at org.apache.spark.network.shuffle.RetryingBlockTransferor.initiateRetry(RetryingBlockTransferor.java:182) at org.apache.spark.network.shuffle.RetryingBlockTransferor.access$500(RetryingBlockTransferor.java:43) at org.apache.spark.network.shuffle.RetryingBlockTransferor$RetryingBlockTransferListener.handleBlockTransferFailure(RetryingBlockTransferor.java:230) at org.apache.spark.network.shuffle.RetryingBlockTransferor$RetryingBlockTransferListener.onBlockFetchFailure(RetryingBlockTransferor.java:260) at org.apache.spark.network.shuffle.OneForOneBlockFetcher.failRemainingBlocks(OneForOneBlockFetcher.java:318) at org.apache.spark.network.shuffle.OneForOneBlockFetcher.access$300(OneForOneBlockFetcher.java:55) at 
org.apache.spark.network.shuffle.OneForOneBlockFetcher$DownloadCallback.onFailure(OneForOneBlockFetcher.java:357) at org.apache.spark.network.client.StreamInterceptor.exceptionCaught(StreamInterceptor.java:56) at org.apache.spark.network.util.TransportFrameDecoder.exceptionCaught(TransportFrameDecoder.java:231) at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302) {noformat} 6. As a result, the retry never happens and the executor thread ends up being stuck at [BlockTransferService#fetchBlockSync()|https://github.com/apache/spark/blob/v3.3.1/core/src/main/scala/org/apache/spark/network/BlockTransferService.scala#L103], waiting for the transfer to complete/fail {noformat} sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
[jira] [Updated] (SPARK-44756) Executor hangs when RetryingBlockTransferor fails to initiate retry
[ https://issues.apache.org/jira/browse/SPARK-44756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harunobu Daikoku updated SPARK-44756: - Summary: Executor hangs when RetryingBlockTransferor fails to initiate retry (was: Executor hangs when RetryingBlockTransferor fails to submit retry request) > Executor hangs when RetryingBlockTransferor fails to initiate retry > --- > > Key: SPARK-44756 > URL: https://issues.apache.org/jira/browse/SPARK-44756 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 3.3.1 >Reporter: Harunobu Daikoku >Priority: Minor > > We have been observing this issue several times in our production where some > executors are being stuck at BlockTransferService.fetchBlockSync. > After some investigation, the issue seems to be caused by an unhandled edge > case in RetryingBlockTransferor. > 1. Shuffle transfer fails for whatever reason > {noformat} > java.io.IOException: Cannot allocate memory > at sun.nio.ch.FileDispatcherImpl.write0(Native Method) > at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60) > at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) > at sun.nio.ch.IOUtil.write(IOUtil.java:51) > at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211) > at > org.apache.spark.network.shuffle.SimpleDownloadFile$SimpleDownloadWritableChannel.write(SimpleDownloadFile.java:78) > at > org.apache.spark.network.shuffle.OneForOneBlockFetcher$DownloadCallback.onData(OneForOneBlockFetcher.java:340) > at > org.apache.spark.network.client.StreamInterceptor.handle(StreamInterceptor.java:79) > at > org.apache.spark.network.util.TransportFrameDecoder.feedInterceptor(TransportFrameDecoder.java:263) > at > org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:87) > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) > {noformat} > 2. 
The above exception caught by > [AbstractChannelHandlerContext#invokeChannelRead()|https://github.com/netty/netty/blob/netty-4.1.74.Final/transport/src/main/java/io/netty/channel/AbstractChannelHandlerContext.java#L381], > and propagated to the exception handler > 3. Exception reaches > [RetryingBlockTransferor#initiateRetry()|https://github.com/apache/spark/blob/v3.3.1/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java#L178-L180], > and it tries to initiate retry > {noformat} > 23/08/09 16:58:37 shuffle-client-4-2 INFO RetryingBlockTransferor: Retrying > fetch (1/3) for 1 outstanding blocks after 5000 ms > {noformat} > 4. Retry initiation fails (in our case, it fails to create a new thread) > 5. Exception caught by > [AbstractChannelHandlerContext#invokeExceptionCaught()|https://github.com/netty/netty/blob/netty-4.1.74.Final/transport/src/main/java/io/netty/channel/AbstractChannelHandlerContext.java#L299], > and not further processed > {noformat} > 23/08/09 16:58:53 shuffle-client-4-2 DEBUG AbstractChannelHandlerContext: An > exception java.lang.OutOfMemoryError: unable to create new native thread > at java.lang.Thread.start0(Native Method) > at java.lang.Thread.start(Thread.java:719) > at > java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378) > at > java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112) > at > org.apache.spark.network.shuffle.RetryingBlockTransferor.initiateRetry(RetryingBlockTransferor.java:182) > at > org.apache.spark.network.shuffle.RetryingBlockTransferor.access$500(RetryingBlockTransferor.java:43) > at > org.apache.spark.network.shuffle.RetryingBlockTransferor$RetryingBlockTransferListener.handleBlockTransferFailure(RetryingBlockTransferor.java:230) > at > 
org.apache.spark.network.shuffle.RetryingBlockTransferor$RetryingBlockTransferListener.onBlockFetchFailure(RetryingBlockTransferor.java:260) > at > org.apache.spark.network.shuffle.OneForOneBlockFetcher.failRemainingBlocks(OneForOneBlockFetcher.java:318) > at > org.apache.spark.network.shuffle.OneForOneBlockFetcher.access$300(OneForOneBlockFetcher.java:55) > at > org.apache.spark.network.shuffle.OneForOneBlockFetcher$DownloadCallback.onFailure(OneForOneBlockFetcher.java:357) > at > org.apache.spark.network.client.StreamInterceptor.exceptionCaught(StreamInterceptor.java:56) > at > org.apache.spark.network.util.TransportFrameDecoder.exceptionCaught(TransportFrameDecoder.java:231) > at >
[jira] [Created] (SPARK-44756) Executor hangs when RetryingBlockTransferor fails to submit retry request
Harunobu Daikoku created SPARK-44756: Summary: Executor hangs when RetryingBlockTransferor fails to submit retry request Key: SPARK-44756 URL: https://issues.apache.org/jira/browse/SPARK-44756 Project: Spark Issue Type: Bug Components: Shuffle Affects Versions: 3.3.1 Reporter: Harunobu Daikoku We have been observing this issue several times in our production where some executors are being stuck at BlockTransferService.fetchBlockSync. After some investigation, the issue seems to be caused by an unhandled edge case in RetryingBlockTransferor. 1. Shuffle transfer fails for whatever reason {noformat} java.io.IOException: Cannot allocate memory at sun.nio.ch.FileDispatcherImpl.write0(Native Method) at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60) at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) at sun.nio.ch.IOUtil.write(IOUtil.java:51) at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211) at org.apache.spark.network.shuffle.SimpleDownloadFile$SimpleDownloadWritableChannel.write(SimpleDownloadFile.java:78) at org.apache.spark.network.shuffle.OneForOneBlockFetcher$DownloadCallback.onData(OneForOneBlockFetcher.java:340) at org.apache.spark.network.client.StreamInterceptor.handle(StreamInterceptor.java:79) at org.apache.spark.network.util.TransportFrameDecoder.feedInterceptor(TransportFrameDecoder.java:263) at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:87) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) {noformat} 2. The above exception caught by [AbstractChannelHandlerContext#invokeChannelRead()|https://github.com/netty/netty/blob/netty-4.1.74.Final/transport/src/main/java/io/netty/channel/AbstractChannelHandlerContext.java#L381], and propagated to the exception handler 3. 
Exception reaches [RetryingBlockTransferor#initiateRetry()|https://github.com/apache/spark/blob/v3.3.1/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java#L178-L180], and it tries to initiate retry {noformat} 23/08/09 16:58:37 shuffle-client-4-2 INFO RetryingBlockTransferor: Retrying fetch (1/3) for 1 outstanding blocks after 5000 ms {noformat} 4. Retry initiation fails (in our case, it fails to create a new thread) 5. Exception caught by [AbstractChannelHandlerContext#invokeExceptionCaught()|https://github.com/netty/netty/blob/netty-4.1.74.Final/transport/src/main/java/io/netty/channel/AbstractChannelHandlerContext.java#L299], and not further processed {noformat} 23/08/09 16:58:53 shuffle-client-4-2 DEBUG AbstractChannelHandlerContext: An exception java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:719) at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378) at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112) at org.apache.spark.network.shuffle.RetryingBlockTransferor.initiateRetry(RetryingBlockTransferor.java:182) at org.apache.spark.network.shuffle.RetryingBlockTransferor.access$500(RetryingBlockTransferor.java:43) at org.apache.spark.network.shuffle.RetryingBlockTransferor$RetryingBlockTransferListener.handleBlockTransferFailure(RetryingBlockTransferor.java:230) at org.apache.spark.network.shuffle.RetryingBlockTransferor$RetryingBlockTransferListener.onBlockFetchFailure(RetryingBlockTransferor.java:260) at org.apache.spark.network.shuffle.OneForOneBlockFetcher.failRemainingBlocks(OneForOneBlockFetcher.java:318) at org.apache.spark.network.shuffle.OneForOneBlockFetcher.access$300(OneForOneBlockFetcher.java:55) at 
org.apache.spark.network.shuffle.OneForOneBlockFetcher$DownloadCallback.onFailure(OneForOneBlockFetcher.java:357) at org.apache.spark.network.client.StreamInterceptor.exceptionCaught(StreamInterceptor.java:56) at org.apache.spark.network.util.TransportFrameDecoder.exceptionCaught(TransportFrameDecoder.java:231) at io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:302) {noformat} 6. As a result, the retry never happens and the executor thread ends up being stuck at [BlockTransferService#fetchBlockSync()|https://github.com/apache/spark/blob/v3.3.1/core/src/main/scala/org/apache/spark/network/BlockTransferService.scala#L103], waiting for the transfer to complete/fail {noformat} sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
[jira] [Updated] (SPARK-44573) Couldn't submit Spark application to Kubernetes in versions v1.27.3
[ https://issues.apache.org/jira/browse/SPARK-44573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddaraju G C updated SPARK-44573: -- Priority: Blocker (was: Major) > Couldn't submit Spark application to Kubernetes in versions v1.27.3 > -- > > Key: SPARK-44573 > URL: https://issues.apache.org/jira/browse/SPARK-44573 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Submit >Affects Versions: 3.4.1 >Reporter: Siddaraju G C >Priority: Blocker > > Spark-submit (cluster mode on Kubernetes) results in the error > *io.fabric8.kubernetes.client.KubernetesClientException* on my 3-node k8s > cluster. > Steps followed: > * Using IBM Cloud, created 3 instances > * The 1st instance acts as the master node and the other two act as worker nodes > > {noformat} > root@vsi-spark-master:/opt# kubectl get nodes > NAME STATUS ROLES AGE VERSION > vsi-spark-master Ready control-plane,master 2d v1.27.3+k3s1 > vsi-spark-worker-1 Ready 47h v1.27.3+k3s1 > vsi-spark-worker-2 Ready 47h > v1.27.3+k3s1{noformat} > * Copied spark-3.4.1-bin-hadoop3.tgz into the /opt/spark folder > * Ran Spark using the below command > > {noformat} > root@vsi-spark-master:/opt# /opt/spark/bin/spark-submit --master > k8s://http://:6443 --conf > spark.kubernetes.authenticate.submission.oauthToken=$TOKEN --deploy-mode > cluster --name spark-pi --class org.apache.spark.examples.SparkPi --conf > spark.executor.instances=5 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.kubernetes.container.image=sushmakorati/testrepo:pyrandomGB > local:///opt/spark/examples/jars/spark-examples_2.12-3.4.1.jar{noformat} > * And got the below error message. > {noformat} > 23/07/27 12:56:26 WARN Utils: Kubernetes master URL uses HTTP instead of HTTPS. > 23/07/27 12:56:26 WARN NativeCodeLoader: Unable to load native-hadoop library > for your platform... 
using builtin-java classes where applicable > 23/07/27 12:56:26 INFO SparkKubernetesClientFactory: Auto-configuring K8S > client using current context from users K8S config file > 23/07/27 12:56:26 INFO KerberosConfDriverFeatureStep: You have not specified > a krb5.conf file locally or via a ConfigMap. Make sure that you have the > krb5.conf locally on the driver image. > 23/07/27 12:56:27 ERROR Client: Please check "kubectl auth can-i create pod" > first. It should be yes. > Exception in thread "main" > io.fabric8.kubernetes.client.KubernetesClientException: An error has occurred. > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:129) > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:122) > at > io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:44) > at > io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1113) > at > io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:93) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:153) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5(KubernetesClientApplication.scala:250) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.$anonfun$run$5$adapted(KubernetesClientApplication.scala:244) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2786) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:244) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:216) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020) > at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192) > at 
org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.io.IOException: Connection reset > at > io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:535) > at > io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:558) > at > io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleCreate(OperationSupport.java:349) > at >
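The `ERROR Client: Please check "kubectl auth can-i create pod" first. It should be yes.` line in the trace above points at missing RBAC permissions for the submitting credentials. A minimal sketch of what granting them could look like; only the service account name `spark` comes from the `spark.kubernetes.authenticate.driver.serviceAccountName=spark` conf in the report, while the `default` namespace and the built-in `edit` ClusterRole are illustrative assumptions:

```shell
# Verify the current credentials are allowed to create pods; spark-submit
# in cluster mode needs this to launch the driver pod. Should print "yes".
kubectl auth can-i create pod

# Hypothetical minimal RBAC setup. The account name "spark" matches the
# serviceAccountName passed to spark-submit above; the "default" namespace
# and the built-in "edit" ClusterRole are assumptions for illustration.
kubectl create serviceaccount spark --namespace default
kubectl create clusterrolebinding spark-role \
  --clusterrole=edit \
  --serviceaccount=default:spark
```

The `Caused by: java.io.IOException: Connection reset` may also reflect the scheme mismatch flagged by the `Kubernetes master URL uses HTTP instead of HTTPS` warning in the log, since k3s serves the API on port 6443 over HTTPS, so retrying with a `k8s://https://...` master URL is worth checking as well.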
[jira] [Assigned] (SPARK-44691) Move Subclasses of Analysis to sql/api
[ https://issues.apache.org/jira/browse/SPARK-44691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-44691: --- Assignee: Yihong He > Move Subclasses of Analysis to sql/api > -- > > Key: SPARK-44691 > URL: https://issues.apache.org/jira/browse/SPARK-44691 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.5.0 >Reporter: Yihong He >Assignee: Yihong He >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44691) Move Subclasses of Analysis to sql/api
[ https://issues.apache.org/jira/browse/SPARK-44691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-44691. - Fix Version/s: 3.5.0 Resolution: Fixed > Move Subclasses of Analysis to sql/api > -- > > Key: SPARK-44691 > URL: https://issues.apache.org/jira/browse/SPARK-44691 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.5.0 >Reporter: Yihong He >Assignee: Yihong He >Priority: Major > Fix For: 3.5.0 > >
[jira] [Created] (SPARK-44755) Local tmp data is not cleared while using Spark Streaming consuming from Kafka
leesf created SPARK-44755: - Summary: Local tmp data is not cleared while using Spark Streaming consuming from Kafka Key: SPARK-44755 URL: https://issues.apache.org/jira/browse/SPARK-44755 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.2.0 Reporter: leesf We are using Spark 3.2 to consume data from Kafka and then `collectAsMap` to send the result to the driver. We found that the local temp files do not get cleared if the data consumed from Kafka is larger than 200m (`spark.network.maxRemoteBlockSizeFetchToMem`).
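The 200m threshold named above is the default of `spark.network.maxRemoteBlockSizeFetchToMem`: remote blocks larger than it are fetched to a local temp file instead of into memory, which is why the uncleared files only appear past that size. A hedged workaround sketch, not a fix for the underlying cleanup bug; the `1g` value and the `your_streaming_job.py` script name are illustrative assumptions:

```shell
# Remote blocks above this size are streamed to local temp files instead of
# memory. Raising the threshold keeps fetches in memory (trading executor
# memory for disk), so the leaking temp files are never written.
# The 1g value below is illustrative, not a recommendation.
spark-submit \
  --conf spark.network.maxRemoteBlockSizeFetchToMem=1g \
  your_streaming_job.py
```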
[jira] [Updated] (SPARK-44754) Improve DeduplicateRelations rewriteAttrs compatibility
[ https://issues.apache.org/jira/browse/SPARK-44754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jia Fan updated SPARK-44754: Description: Following [https://github.com/apache/spark/pull/41554], we should add tests for {{MapPartitionsInR}}, {{MapPartitionsInRWithArrow}}, {{MapElements}}, {{MapGroups}}, {{FlatMapGroupsWithState}}, {{FlatMapGroupsInR}}, {{FlatMapGroupsInRWithArrow}}, {{FlatMapGroupsInPandas}}, {{MapInPandas}}, {{PythonMapInArrow}}, {{FlatMapGroupsInPandasWithState}} and {{FlatMapCoGroupsInPandas}} to make sure DeduplicateRelations rewriteAttrs rewrites their attributes normally. We should also fix the erroneous behavior following [https://github.com/apache/spark/pull/41554]. > Improve DeduplicateRelations rewriteAttrs compatibility > --- > > Key: SPARK-44754 > URL: https://issues.apache.org/jira/browse/SPARK-44754 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.0 >Reporter: Jia Fan >Priority: Major > > Following [https://github.com/apache/spark/pull/41554], we should add tests for > {{MapPartitionsInR}}, {{MapPartitionsInRWithArrow}}, {{MapElements}}, {{MapGroups}}, > {{FlatMapGroupsWithState}}, {{FlatMapGroupsInR}}, {{FlatMapGroupsInRWithArrow}}, > {{FlatMapGroupsInPandas}}, {{MapInPandas}}, {{PythonMapInArrow}}, > {{FlatMapGroupsInPandasWithState}} and {{FlatMapCoGroupsInPandas}} > to make sure DeduplicateRelations rewriteAttrs rewrites their attributes > normally. We should also fix the erroneous behavior following > [https://github.com/apache/spark/pull/41554].
[jira] [Created] (SPARK-44754) Improve DeduplicateRelations rewriteAttrs compatibility
Jia Fan created SPARK-44754: --- Summary: Improve DeduplicateRelations rewriteAttrs compatibility Key: SPARK-44754 URL: https://issues.apache.org/jira/browse/SPARK-44754 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.0 Reporter: Jia Fan Following [https://github.com/apache/spark/pull/41554], we should add tests for {{MapPartitionsInR}}, {{MapPartitionsInRWithArrow}}, {{MapElements}}, {{MapGroups}}, {{FlatMapGroupsWithState}}, {{FlatMapGroupsInR}}, {{FlatMapGroupsInRWithArrow}}, {{FlatMapGroupsInPandas}}, {{MapInPandas}}, {{PythonMapInArrow}}, {{FlatMapGroupsInPandasWithState}} and {{FlatMapCoGroupsInPandas}} to make sure DeduplicateRelations rewriteAttrs rewrites their attributes normally. We should also fix the erroneous behavior following [https://github.com/apache/spark/pull/41554].