[jira] [Commented] (SPARK-20112) SIGSEGV in GeneratedIterator.sort_addToSorter
[ https://issues.apache.org/jira/browse/SPARK-20112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099030#comment-16099030 ]

Mitesh commented on SPARK-20112:
--------------------------------

Still seeing this on 2.1.0; attached a new err file.

> SIGSEGV in GeneratedIterator.sort_addToSorter
> ---------------------------------------------
>
>                 Key: SPARK-20112
>                 URL: https://issues.apache.org/jira/browse/SPARK-20112
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.2
>         Environment: AWS m4.10xlarge with EBS (io1 drive, 400g, 4000iops)
>            Reporter: Mitesh
>         Attachments: codegen_sorter_crash.log, hs_err_pid19271.log, hs_err_pid22870.log
>
>
> I'm seeing a very weird crash in {{GeneratedIterator.sort_addToSorter}}. The
> hs_err_pid and codegen file are attached (with query plans). It's not a
> deterministic repro, but running a big query load, I eventually see it come
> up within a few minutes.
> Here is some interesting repro information:
> - Using AWS r3.8xlarge machines, which have ephemeral attached drives, I can't
> repro this. But it does repro with m4.10xlarge with an io1 EBS drive. So I
> think that means it's not an issue with the code-gen, but I can't figure out
> what the difference in behavior is.
> - The broadcast joins in the plan are all small tables. I have
> autoJoinBroadcast=-1 because I always hint which tables should be broadcast.
> - As you can see from the plan, all the sources are cached memory tables. And
> we partition/sort them all beforehand so it's always sort-merge-joins or
> broadcast joins (with small tables).
> {noformat}
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # [thread 139872345896704 also had an error]
> SIGSEGV (0xb) at pc=0x7f38a378caa3, pid=19271, tid=139872342738688
> #
> # JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode linux-amd64 compressed oops)
> [thread 139872348002048 also had an error]# Problematic frame:
> #
> J 28454 C1 org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.sort_addToSorter$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$GeneratedIterator;)V (369 bytes) @ 0x7f38a378caa3 [0x7f38a378b5e0+0x14c3]
> {noformat}
> This kind of looks like https://issues.apache.org/jira/browse/SPARK-15822,
> but that is marked as fixed in 2.0.0

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
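The "Problematic frame" entries quoted in this thread end with `@ pc [base+offset]`, where the faulting pc should equal the start address of the compiled method plus the offset. A minimal Python sketch for pulling those fields out and checking that arithmetic follows. The names `FRAME_RE` and `parse_frame` are hypothetical helpers, and the pattern is an assumption that covers only the `J <id> <C1|C2> <method> (<n> bytes)` shape seen in these two excerpts, not the full hs_err format.

```python
import re

# Assumed shape, taken from the two excerpts in this thread:
#   J <compile id> <C1|C2> <method> (<n> bytes) @ <pc> [<base>+<offset>]
FRAME_RE = re.compile(
    r"J\s+(?P<compile_id>\d+)\s+(?P<tier>C[12])\s+"
    r"(?P<method>\S+)\s+\((?P<size>\d+) bytes\)\s+@\s+"
    r"(?P<pc>0x[0-9a-fA-F]+)\s+"
    r"\[(?P<base>0x[0-9a-fA-F]+)\+(?P<offset>0x[0-9a-fA-F]+)\]"
)


def parse_frame(line):
    """Return the fields of a compiled-Java frame line, or None if it does not match."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    pc = int(m.group("pc"), 16)
    base = int(m.group("base"), 16)
    offset = int(m.group("offset"), 16)
    return {
        "compile_id": int(m.group("compile_id")),
        "tier": m.group("tier"),            # C1 or C2 JIT compiler
        "method": m.group("method"),
        "size_bytes": int(m.group("size")),  # the number printed in parentheses
        "pc": pc,
        "base": base,
        "offset": offset,
        "consistent": pc == base + offset,   # sanity check on the address arithmetic
    }


# The two frames quoted in this thread.
SORTER_FRAME = (
    "J 28454 C1 org.apache.spark.sql.catalyst.expressions."
    "GeneratedClass$GeneratedIterator.sort_addToSorter$"
    "(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$GeneratedIterator;)V "
    "(369 bytes) @ 0x7f38a378caa3 [0x7f38a378b5e0+0x14c3]"
)
HASHJOIN_FRAME = (
    "# J 25558 C2 org.apache.spark.sql.execution.joins."
    "HashJoin$$anonfun$outerJoin$1$$anon$1.advanceNext()Z "
    "(110 bytes) @ 0x7fad1f7afc11 [0x7fad1f7afb20+0xf1]"
)

if __name__ == "__main__":
    for frame in (SORTER_FRAME, HASHJOIN_FRAME):
        info = parse_frame(frame)
        print(info["tier"], hex(info["offset"]), info["consistent"])
```

Both frames pass the consistency check, and the tier field shows that the first crash is in C1-compiled code while the second is in C2-compiled code.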
[jira] [Commented] (SPARK-20112) SIGSEGV in GeneratedIterator.sort_addToSorter
[ https://issues.apache.org/jira/browse/SPARK-20112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1594#comment-1594 ]

Kazuaki Ishizaki commented on SPARK-20112:
------------------------------------------

[~MasterDDT] Thank you for preparing additional information. The size of the hashed relations does not seem to be very large. In these two cases, I cannot correlate the load instructions that caused the SIGSEGV to Java statements.
[jira] [Commented] (SPARK-20112) SIGSEGV in GeneratedIterator.sort_addToSorter
[ https://issues.apache.org/jira/browse/SPARK-20112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945399#comment-15945399 ]

Mitesh commented on SPARK-20112:
--------------------------------

[~kiszk] I can try out Spark 2.0.3+ or 2.1. I disabled whole-stage codegen and still see a failure on 2.0.2, but now in a different place: {{HashJoin.advanceNext}}. I also uploaded the new hs_err_pid22870. The hashed relations are around 1-10M, but a few are 200M.

{noformat}
17/03/27 22:15:59 DEBUG [Executor task launch worker-17] TaskMemoryManager: Task 152119 acquired 64.0 KB for org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@2b60e395
SIGSEGV17/03/27 22:15:59 DEBUG [Executor task launch worker-17] TaskMemoryManager: Task 152119 acquired 64.0 MB for org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@2b60e395
[thread 140369911781120 also had an error]
 (0xb) at pc=0x7fad1f7afc11, pid=22870, tid=140369909675776
#
# JRE version: Java(TM) SE Runtime Environment (8.0_60-b27) (build 1.8.0_60-b27)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.60-b23 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# J 25558 C2 org.apache.spark.sql.execution.joins.HashJoin$$anonfun$outerJoin$1$$anon$1.advanceNext()Z (110 bytes) @ 0x7fad1f7afc11 [0x7fad1f7afb20+0xf1]
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /mnt/xvdb/spark/worker_dir/app-20170327213416-0005/14/hs_err_pid22870.log
17/03/27 22:15:59 DEBUG [Executor task launch worker-19] TaskMemoryManager: Task 152090 acquired 64.0 MB for org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@51de5289
2502.591: [G1Ergonomics (Concurrent Cycles) request concurrent cycle initiation, reason: occupancy higher than threshold, occupancy: 7667187712 bytes, allocation request: 160640 bytes, threshold: 8214124950 bytes (45.00 %), source: concurrent humongous allocation]
[thread 140376087648000 also had an error]
[thread 140369903376128 also had an error]
#
{noformat}
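The two settings mentioned in this thread (whole-stage codegen disabled, automatic broadcast joins turned off) can be written down as Spark SQL configuration keys. The sketch below is one way to render them as `spark-submit --conf` arguments. It assumes that "autoJoinBroadcast=-1" refers to `spark.sql.autoBroadcastJoinThreshold`; `DEBUG_CONF` and `to_submit_args` are hypothetical names used only for illustration.

```python
# Settings discussed in this thread, expressed as Spark SQL configuration keys.
# Assumption: "autoJoinBroadcast=-1" means spark.sql.autoBroadcastJoinThreshold=-1.
DEBUG_CONF = {
    "spark.sql.codegen.wholeStage": "false",       # disable whole-stage codegen
    "spark.sql.autoBroadcastJoinThreshold": "-1",  # no automatic broadcast joins
}


def to_submit_args(conf):
    """Render a config dict as a flat list of spark-submit --conf arguments."""
    args = []
    for key in sorted(conf):  # sorted for a stable, reproducible order
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(DEBUG_CONF)))
```

Keeping the settings in one place like this makes it easier to rerun the same query load with and without whole-stage codegen when comparing where the crash lands.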
[jira] [Commented] (SPARK-20112) SIGSEGV in GeneratedIterator.sort_addToSorter
[ https://issues.apache.org/jira/browse/SPARK-20112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15945389#comment-15945389 ]

Kazuaki Ishizaki commented on SPARK-20112:
------------------------------------------

SPARK-18745 fixed integer overflow issues in {{HashedRelation.scala}} caused by large data; the fix was merged after 2.0.2. If the data is very large, would it be possible to have a chance to try it with the latest branch-2.0?
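As background on why integer overflow with large data can end in a bad memory access: when an index is scaled to a byte offset using 32-bit arithmetic, a large enough index wraps to a negative number. The sketch below is a generic illustration of that wraparound only. It is not the code from {{HashedRelation.scala}} or the SPARK-18745 patch, and `to_int32`, `byte_offset_int32` and `byte_offset_int64` are hypothetical names.

```python
def to_int32(x):
    """Wrap a Python int to a signed 32-bit value, as JVM Int arithmetic would."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def byte_offset_int32(index, width=8):
    """Offset computed in 32-bit arithmetic: wraps once index * width exceeds 2^31 - 1."""
    return to_int32(index * width)


def byte_offset_int64(index, width=8):
    """Same computation with enough range: no wraparound for these sizes."""
    return index * width


if __name__ == "__main__":
    index = 300_000_000  # illustrative value, not taken from the report
    print(byte_offset_int32(index))  # negative after wraparound
    print(byte_offset_int64(index))  # the intended offset
```

A negative or otherwise wrapped offset used for an off-heap or unsafe read points outside the intended buffer, which is one way such a bug could surface as a SIGSEGV rather than as a Java exception.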
[jira] [Commented] (SPARK-20112) SIGSEGV in GeneratedIterator.sort_addToSorter
[ https://issues.apache.org/jira/browse/SPARK-20112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15944100#comment-15944100 ]

Mitesh commented on SPARK-20112:
--------------------------------

This kind of looks like https://issues.apache.org/jira/browse/SPARK-15822, but that is marked as fixed in 2.0.0.