I transpose a matrix (colChunkOfA), stored row-wise as a 200-by-54843210
Array[Array[Float]], into another matrix (rowChunk), also stored row-wise as a
54843210-by-200 Array[Array[Float]], using the following code:

val rowChunk = new Array[(Int, Array[Float])](numCols)
val colIndices = (0 until colChunkOfA.length).toArray

(0 until numCols).foreach { rowIdx =>
  rowChunk(rowIdx) = (rowIdx, colIndices.map(colChunkOfA(_)(rowIdx)))
}
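
For what it's worth, the transpose logic itself checks out on a toy matrix
(the tiny dimensions below are mine, not from the actual job), so the
construction of rowChunk seems sound:

```scala
// Toy 2-by-3 column chunk, stored row-wise: same layout as colChunkOfA.
val colChunkOfA = Array(Array(1f, 2f, 3f), Array(4f, 5f, 6f))
val numCols = 3
val colIndices = (0 until colChunkOfA.length).toArray

// Same transpose as above: row i of rowChunk is column i of colChunkOfA.
val rowChunk = new Array[(Int, Array[Float])](numCols)
(0 until numCols).foreach { rowIdx =>
  rowChunk(rowIdx) = (rowIdx, colIndices.map(colChunkOfA(_)(rowIdx)))
}
// Row 0 of the transpose is the first column of the input: (0, [1.0, 4.0]).
```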

This succeeds, but the following code, which attempts to turn rowChunk into
an RDD, fails silently: spark-submit simply exits, and none of the executor
logs indicate any error.

val parallelRowChunkRDD = sc.parallelize(rowChunk).cache
parallelRowChunkRDD.count

What is the culprit here?
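
For scale, here is a rough back-of-the-envelope estimate of what
sc.parallelize has to serialize from the driver (my figures: 4 bytes per
Float plus a guessed per-row JVM overhead, which varies by JVM):

```scala
// Rough size estimate for rowChunk: 54843210 rows, each an (Int, Array[Float](200)).
// The 48-byte per-row overhead (tuple + array header + boxed Int) is a guess.
val numRows = 54843210L
val rowLen  = 200L
val bytesPerRow = rowLen * 4 + 48
val totalGiB = numRows * bytesPerRow / math.pow(1024, 3)
println(f"~$totalGiB%.1f GiB to serialize into tasks")
```

By this estimate the collection is on the order of 43 GiB; spread across 928
tasks that is roughly 47 MB per task, which lines up with the ~48 MB task
sizes reported in the log below.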

Here is the log output starting from the count instruction:

16/01/13 02:23:38 INFO SparkContext: Starting job: count at
transposeAvroToAvroChunks.scala:129
16/01/13 02:23:38 INFO DAGScheduler: Got job 3 (count at
transposeAvroToAvroChunks.scala:129) with 928 output partitions
16/01/13 02:23:38 INFO DAGScheduler: Final stage: ResultStage 3(count at
transposeAvroToAvroChunks.scala:129)
16/01/13 02:23:38 INFO DAGScheduler: Parents of final stage: List()
16/01/13 02:23:38 INFO DAGScheduler: Missing parents: List()
16/01/13 02:23:38 INFO DAGScheduler: Submitting ResultStage 3
(ParallelCollectionRDD[2448] at parallelize at
transposeAvroToAvroChunks.scala:128), which has no missing parents
16/01/13 02:23:38 INFO MemoryStore: ensureFreeSpace(1048) called with
curMem=50917367, maxMem=127452201615
16/01/13 02:23:38 INFO MemoryStore: Block broadcast_615 stored as values in
memory (estimated size 1048.0 B, free 118.7 GB)
16/01/13 02:23:38 INFO MemoryStore: ensureFreeSpace(740) called with
curMem=50918415, maxMem=127452201615
16/01/13 02:23:38 INFO MemoryStore: Block broadcast_615_piece0 stored as
bytes in memory (estimated size 740.0 B, free 118.7 GB)
16/01/13 02:23:38 INFO BlockManagerInfo: Added broadcast_615_piece0 in
memory on 172.31.36.112:36581 (size: 740.0 B, free: 118.7 GB)
16/01/13 02:23:38 INFO SparkContext: Created broadcast 615 from broadcast at
DAGScheduler.scala:861
16/01/13 02:23:38 INFO DAGScheduler: Submitting 928 missing tasks from
ResultStage 3 (ParallelCollectionRDD[2448] at parallelize at
transposeAvroToAvroChunks.scala:128)
16/01/13 02:23:38 INFO TaskSchedulerImpl: Adding task set 3.0 with 928 tasks
16/01/13 02:23:39 WARN TaskSetManager: Stage 3 contains a task of very large
size (47027 KB). The maximum recommended task size is 100 KB.
16/01/13 02:23:39 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID
1219, 172.31.34.184, PROCESS_LOCAL, 48156290 bytes)
...
16/01/13 02:27:13 INFO TaskSetManager: Starting task 927.0 in stage 3.0 (TID
2146, 172.31.42.67, PROCESS_LOCAL, 48224789 bytes)
16/01/13 02:27:17 INFO BlockManagerInfo: Removed broadcast_419_piece0 on
172.31.36.112:36581 in memory (size: 17.4 KB, free: 118.7 GB)
16/01/13 02:27:21 INFO BlockManagerInfo: Removed broadcast_419_piece0 on
172.31.35.157:51059 in memory (size: 17.4 KB, free: 10.4 GB)
16/01/13 02:27:21 INFO BlockManagerInfo: Removed broadcast_419_piece0 on
172.31.47.118:34888 in memory (size: 17.4 KB, free: 10.4 GB)
16/01/13 02:27:22 INFO BlockManagerInfo: Removed broadcast_419_piece0 on
172.31.38.42:48582 in memory (size: 17.4 KB, free: 10.4 GB)
16/01/13 02:27:38 INFO BlockManagerInfo: Added broadcast_615_piece0 in
memory on 172.31.41.68:59281 (size: 740.0 B, free: 10.4 GB)
16/01/13 02:27:55 INFO BlockManagerInfo: Added broadcast_615_piece0 in
memory on 172.31.47.118:59575 (size: 740.0 B, free: 10.4 GB)
16/01/13 02:28:47 INFO BlockManagerInfo: Added broadcast_615_piece0 in
memory on 172.31.40.24:55643 (size: 740.0 B, free: 10.4 GB)
16/01/13 02:28:49 INFO BlockManagerInfo: Added broadcast_615_piece0 in
memory on 172.31.47.118:53671 (size: 740.0 B, free: 10.4 GB)
     
This is the end of the log, so it looks like all 928 tasks were started, but
presumably they ran into an error somewhere while running. Nothing shows up
in the executor logs.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/failure-to-parallelize-an-RDD-tp25950.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
