ankurdave commented on pull request #34245:
URL: https://github.com/apache/spark/pull/34245#issuecomment-948836489
I noticed it occurred on another recent PR as well: https://github.com/apache/spark/pull/34352 [failed](https://github.com/beliefer/spark/runs/3961257003?check_suite_focus=true#step:9:1360) in `test_pandas_udf_with_column_vector`.

I was also able to repro this locally on `branch-3.2` using the following commands:

```sh
./build/sbt -Phive package
./build/sbt test:compile
seq 100 | parallel -j 8 --halt now,fail=1 'echo {#}; python/run-tests --testnames pyspark.sql.tests.test_udf'
```

Here's the location of the segfault:

```
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f2569052836, pid=25950, tid=0x00007f2358fd5700
#
# JRE version: OpenJDK Runtime Environment (8.0_292-b10) (build 1.8.0_292-8u292-b10-0ubuntu1~18.04-b10)
# Java VM: OpenJDK 64-Bit Server VM (25.292-b10 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# v  ~StubRoutines::jlong_disjoint_arraycopy
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#

---------------  T H R E A D  ---------------

Current thread (0x00007f24082e7800):  JavaThread "stdout writer for /usr/bin/python3.6" daemon [_thread_in_Java, id=4879, stack(0x00007f2358ed5000,0x00007f2358fd6000)]

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000004400

Registers:
RAX=0x00000000f9c9f5a0, RBX=0x0000000000004400, RCX=0x0000000000007ff8, RDX=0xfffffffffffff888
RSP=0x00007f2358fd39d0, RBP=0x00007f2358fd39d0, RSI=0x0000000000004400, RDI=0x00000000f9c9f598
R8 =0x0000000000008000, R9 =0x0000000000000000, R10=0x00007f2569052e20, R11=0x0000000000000010
R12=0x0000000000000000, R13=0x0000000000000000, R14=0x0000000000100000, R15=0x00007f24082e7800
RIP=0x00007f2569052836, EFLAGS=0x0000000000010286, CSGSFS=0x002b000000000033, ERR=0x0000000000000006
  TRAPNO=0x000000000000000e

Top of Stack: (sp=0x00007f2358fd39d0)
0x00007f2358fd39d0:   00000000f9c9b990 00007f2569ff90a0
[...]

Instructions: (pc=0x00007f2569052836)
0x00007f2569052816:   48 8b 44 d7 08 48 89 44 d1 08 48 ff c2 75 f1 48
0x00007f2569052826:   33 c0 c9 c3 66 0f 1f 44 00 00 c5 fe 6f 44 d7 c8
0x00007f2569052836:   c5 fe 7f 44 d1 c8 c5 fe 6f 4c d7 e8 c5 fe 7f 4c
0x00007f2569052846:   d1 e8 48 83 c2 08 7e e2 48 83 ea 04 7f 10 c5 fe

Register to memory mapping:
RAX=0x00000000f9c9f5a0 is an oop
[error occurred during error reporting (printing register info), id 0xb]

Stack: [0x00007f2358ed5000,0x00007f2358fd6000],  sp=0x00007f2358fd39d0,  free space=1018k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
v  ~StubRoutines::jlong_disjoint_arraycopy
J 19184 C2 org.apache.spark.unsafe.Platform.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V (124 bytes) @ 0x00007f2569ff90a0 [0x00007f2569ff9020+0x80]
j  org.apache.spark.sql.execution.vectorized.OffHeapColumnVector.putLongsLittleEndian(II[BI)V+32
j  org.apache.spark.sql.execution.datasources.parquet.VectorizedPlainValuesReader.readLongs(ILorg/apache/spark/sql/execution/vectorized/WritableColumnVector;I)V+45
j  org.apache.spark.sql.execution.datasources.parquet.ParquetVectorUpdaterFactory$LongUpdater.readValues(IILorg/apache/spark/sql/execution/vectorized/WritableColumnVector;Lorg/apache/spark/sql/execution/datasources/parquet/VectorizedValuesReader;)V+5
j  org.apache.spark.sql.execution.datasources.parquet.VectorizedRleValuesReader.readBatchInternal(Lorg/apache/spark/sql/execution/datasources/parquet/ParquetReadState;Lorg/apache/spark/sql/execution/vectorized/WritableColumnVector;Lorg/apache/spark/sql/execution/vectorized/WritableColumnVector;Lorg/apache/spark/sql/execution/datasources/parquet/VectorizedValuesReader;Lorg/apache/spark/sql/execution/datasources/parquet/ParquetVectorUpdater;)V+260
j  org.apache.spark.sql.execution.datasources.parquet.VectorizedRleValuesReader.readBatch(Lorg/apache/spark/sql/execution/datasources/parquet/ParquetReadState;Lorg/apache/spark/sql/execution/vectorized/WritableColumnVector;Lorg/apache/spark/sql/execution/datasources/parquet/VectorizedValuesReader;Lorg/apache/spark/sql/execution/datasources/parquet/ParquetVectorUpdater;)V+7
j  org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(ILorg/apache/spark/sql/execution/vectorized/WritableColumnVector;)V+375
j  org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch()Z+112
j  org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue()Z+13
j  org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext()Z+19
j  org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext()Z+18
j  org.apache.spark.sql.execution.FileSourceScanExec$$anon$1.hasNext()Z+8
j  org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.columnartorow_nextBatch_0$(Lorg/apache/spark/sql/catalyst/expressions/GeneratedClass$GeneratedIteratorForCodegenStage1;)V+6
J 19200 C2 org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext()V (194 bytes) @ 0x00007f256a3a3a5c [0x00007f256a3a3820+0x23c]
J 18817 C2 org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext()Z (31 bytes) @ 0x00007f256c7bb2c0 [0x00007f256c7bb260+0x60]
J 19187 C2 org.apache.spark.ContextAwareIterator.hasNext()Z (38 bytes) @ 0x00007f256a37f1ac [0x00007f256a37f0a0+0x10c]
J 12714 C2 scala.collection.Iterator$$anon$10.hasNext()Z (10 bytes) @ 0x00007f256a9cf0a4 [0x00007f256a9cf060+0x44]
J 12714 C2 scala.collection.Iterator$$anon$10.hasNext()Z (10 bytes) @ 0x00007f256a9cf0a4 [0x00007f256a9cf060+0x44]
J 19215 C2 scala.collection.Iterator$GroupedIterator.takeDestructively(I)Lscala/collection/Seq; (50 bytes) @ 0x00007f25696b7c38 [0x00007f25696b7ae0+0x158]
J 12032 C1 scala.collection.Iterator$GroupedIterator.go(I)Z (218 bytes) @ 0x00007f25695c144c [0x00007f25695c0fc0+0x48c]
J 10331 C1 scala.collection.Iterator$GroupedIterator.fill()Z (42 bytes) @ 0x00007f256aa58f14 [0x00007f256aa58b20+0x3f4]
J 10330 C1 scala.collection.Iterator$GroupedIterator.hasNext()Z (18 bytes) @ 0x00007f256aa595bc [0x00007f256aa59500+0xbc]
J 12714 C2 scala.collection.Iterator$$anon$10.hasNext()Z (10 bytes) @ 0x00007f256a9cf0a4 [0x00007f256a9cf060+0x44]
J 7864 C2 scala.collection.AbstractIterator.foreach(Lscala/Function1;)V (6 bytes) @ 0x00007f2569ab96f0 [0x00007f2569ab9660+0x90]
J 16662 C1 org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(Lscala/collection/Iterator;Ljava/io/DataOutputStream;)V (14 bytes) @ 0x00007f256becae5c [0x00007f256becaa00+0x45c]
j  org.apache.spark.sql.execution.python.PythonUDFRunner$$anon$1.writeIteratorToStream(Ljava/io/DataOutputStream;)V+8
J 16844 C1 org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(Lorg/apache/spark/api/python/BasePythonRunner$WriterThread;)Ljava/lang/Object; (952 bytes) @ 0x00007f256bfabaa4 [0x00007f256bfa05e0+0xb4c4]
J 16843 C1 org.apache.spark.api.python.BasePythonRunner$WriterThread$$Lambda$2220.apply()Ljava/lang/Object; (8 bytes) @ 0x00007f256bf70cfc [0x00007f256bf70c80+0x7c]
J 15649 C1 org.apache.spark.util.Utils$.logUncaughtExceptions(Lscala/Function0;)Ljava/lang/Object; (66 bytes) @ 0x00007f256ba5dcac [0x00007f256ba5dba0+0x10c]
J 16840 C1 org.apache.spark.api.python.BasePythonRunner$WriterThread.run()V (14 bytes) @ 0x00007f256bf9c794 [0x00007f256bf9c340+0x454]
v  ~StubRoutines::call_stub
V  [libjvm.so+0x6b04aa]
V  [libjvm.so+0x6ada8b]
V  [libjvm.so+0x6ae077]
V  [libjvm.so+0x755edb]
V  [libjvm.so+0xb08d2f]
V  [libjvm.so+0xb0a0fa]
V  [libjvm.so+0x990552]
C  [libpthread.so.0+0x76db]  start_thread+0xdb

---------------  P R O C E S S  ---------------

Java Threads: ( => current thread )
  0x00007f24082e3800 JavaThread "Worker Monitor for /usr/bin/python3.6" daemon [_thread_blocked, id=4880, stack(0x00007f234dc45000,0x00007f234dd46000)]
=>0x00007f24082e7800 JavaThread "stdout writer for /usr/bin/python3.6" daemon [_thread_in_Java, id=4879, stack(0x00007f2358ed5000,0x00007f2358fd6000)]
[...]
```
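For anyone unfamiliar with why this shows up as a native crash rather than a Java exception: the top Java frames are `OffHeapColumnVector.putLongsLittleEndian` calling `Platform.copyMemory`, which copies raw bytes into off-heap memory with no bounds or liveness checks. Below is a minimal standalone sketch (not Spark's actual code path) of that failure mode; the class name, buffer sizes, and the bogus offset are invented for illustration, and only the use of `sun.misc.Unsafe.copyMemory`, which `Platform.copyMemory` delegates to, mirrors the real stack.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical sketch: an unchecked Unsafe.copyMemory into an off-heap
// buffer, roughly what Platform.copyMemory does on behalf of
// OffHeapColumnVector.putLongsLittleEndian.
public class OffHeapCopyCrash {
    public static void main(String[] args) throws Exception {
        // Obtain Unsafe via reflection, as is typical on JDK 8.
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);

        long capacity = 4096;
        long dest = unsafe.allocateMemory(capacity);  // off-heap, like OffHeapColumnVector's backing buffer
        byte[] src = new byte[512];                   // on-heap source, like decoded Parquet page bytes

        // In-bounds copy from the heap array into off-heap memory: fine.
        unsafe.copyMemory(src, Unsafe.ARRAY_BYTE_BASE_OFFSET, null, dest, src.length);

        // Copy to a destination far past the allocation. Nothing at this
        // level validates the address, so instead of an exception the
        // process most likely dies with SIGSEGV inside one of the JVM's
        // arraycopy stubs, producing an hs_err report like the one above.
        unsafe.copyMemory(src, Unsafe.ARRAY_BYTE_BASE_OFFSET, null,
                dest + capacity + (1L << 40), src.length);

        unsafe.freeMemory(dest);  // never reached if the copy faults
    }
}
```

Because there are no checks at this layer, an out-of-range offset and a copy into already-freed off-heap memory surface the same way: a fatal SIGSEGV (here `SEGV_MAPERR` at a small unmapped address) in a copy stub such as `~StubRoutines::jlong_disjoint_arraycopy`, rather than any Java-level error.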
