Re: failure to parallelize an RDD

2016-01-12 Thread Ted Yu
Which release of Spark are you using?

Can you turn on DEBUG logging to see if there is more of a clue?

Thanks
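
(For reference: one common way to do this in Spark 1.x is to copy conf/log4j.properties.template to conf/log4j.properties and raise the root logger level; the exact file location is an assumption about the poster's deployment.)

```properties
# conf/log4j.properties -- copied from conf/log4j.properties.template
# Change the root logger from INFO to DEBUG to get more detail from the
# driver and executors.
log4j.rootCategory=DEBUG, console
```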

On Tue, Jan 12, 2016 at 6:37 PM, AlexG <swift...@gmail.com> wrote:

> I transpose a matrix (colChunkOfA), stored row-wise as a 200-by-54843210
> Array[Array[Float]], into another matrix (rowChunk), also stored row-wise
> as a 54843210-by-200 Array[Array[Float]], using the following code:
>
> val rowChunk = new Array[(Int, Array[Float])](numCols)
> val colIndices = colChunkOfA.indices.toArray
>
> (0 until numCols).foreach { rowIdx =>
>   rowChunk(rowIdx) = (rowIdx, colIndices.map(colChunkOfA(_)(rowIdx)))
> }
>
> This succeeds, but the following code, which attempts to turn rowChunk
> into an RDD, fails silently: spark-submit simply exits, and none of the
> executor logs indicate any errors.
>
> val parallelRowChunkRDD = sc.parallelize(rowChunk).cache
> parallelRowChunkRDD.count
>
> What is the culprit here?
>
> Here is the log output starting from the count instruction:
>
> [log and mailing-list footer snipped; the full log appears in the original
> message below]


failure to parallelize an RDD

2016-01-12 Thread AlexG
I transpose a matrix (colChunkOfA), stored row-wise as a 200-by-54843210
Array[Array[Float]], into another matrix (rowChunk), also stored row-wise as a
54843210-by-200 Array[Array[Float]], using the following code:

val rowChunk = new Array[(Int, Array[Float])](numCols)
val colIndices = colChunkOfA.indices.toArray

(0 until numCols).foreach { rowIdx =>
  rowChunk(rowIdx) = (rowIdx, colIndices.map(colChunkOfA(_)(rowIdx)))
}
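
(Editor's aside, not part of the original post: the same index-based transpose can also be written with Array.tabulate. A self-contained toy version, using a small 3-by-4 stand-in matrix in place of the 200-by-54843210 one:)

```scala
// Toy stand-in matrix: 3 rows x 4 columns, values 0..11
// (the real colChunkOfA in the post is 200 x 54843210).
val colChunkOfA: Array[Array[Float]] =
  Array.tabulate(3, 4)((r, c) => (r * 4 + c).toFloat)
val numCols = colChunkOfA(0).length

// rowChunk(j) = (j, column j of colChunkOfA) -- the row-wise transpose,
// with each output row paired with its index, as in the post.
val rowChunk: Array[(Int, Array[Float])] =
  Array.tabulate(numCols)(rowIdx => (rowIdx, colChunkOfA.map(_(rowIdx))))
```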

This succeeds, but the following code, which attempts to turn rowChunk into
an RDD, fails silently: spark-submit simply exits, and none of the executor
logs indicate any errors.

val parallelRowChunkRDD = sc.parallelize(rowChunk).cache
parallelRowChunkRDD.count

What is the culprit here?

Here is the log output starting from the count instruction:

16/01/13 02:23:38 INFO SparkContext: Starting job: count at
transposeAvroToAvroChunks.scala:129
16/01/13 02:23:38 INFO DAGScheduler: Got job 3 (count at
transposeAvroToAvroChunks.scala:129) with 928 output partitions
16/01/13 02:23:38 INFO DAGScheduler: Final stage: ResultStage 3(count at
transposeAvroToAvroChunks.scala:129)
16/01/13 02:23:38 INFO DAGScheduler: Parents of final stage: List()
16/01/13 02:23:38 INFO DAGScheduler: Missing parents: List()
16/01/13 02:23:38 INFO DAGScheduler: Submitting ResultStage 3
(ParallelCollectionRDD[2448] at parallelize at
transposeAvroToAvroChunks.scala:128), which has no missing parents
16/01/13 02:23:38 INFO MemoryStore: ensureFreeSpace(1048) called with
curMem=50917367, maxMem=127452201615
16/01/13 02:23:38 INFO MemoryStore: Block broadcast_615 stored as values in
memory (estimated size 1048.0 B, free 118.7 GB)
16/01/13 02:23:38 INFO MemoryStore: ensureFreeSpace(740) called with
curMem=50918415, maxMem=127452201615
16/01/13 02:23:38 INFO MemoryStore: Block broadcast_615_piece0 stored as
bytes in memory (estimated size 740.0 B, free 118.7 GB)
16/01/13 02:23:38 INFO BlockManagerInfo: Added broadcast_615_piece0 in
memory on 172.31.36.112:36581 (size: 740.0 B, free: 118.7 GB)
16/01/13 02:23:38 INFO SparkContext: Created broadcast 615 from broadcast at
DAGScheduler.scala:861
16/01/13 02:23:38 INFO DAGScheduler: Submitting 928 missing tasks from
ResultStage 3 (ParallelCollectionRDD[2448] at parallelize at
transposeAvroToAvroChunks.scala:128)
16/01/13 02:23:38 INFO TaskSchedulerImpl: Adding task set 3.0 with 928 tasks
16/01/13 02:23:39 WARN TaskSetManager: Stage 3 contains a task of very large
size (47027 KB). The maximum recommended task size is 100 KB.
16/01/13 02:23:39 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID
1219, 172.31.34.184, PROCESS_LOCAL, 48156290 bytes)
...
16/01/13 02:27:13 INFO TaskSetManager: Starting task 927.0 in stage 3.0 (TID
2146, 172.31.42.67, PROCESS_LOCAL, 48224789 bytes)
16/01/13 02:27:17 INFO BlockManagerInfo: Removed broadcast_419_piece0 on
172.31.36.112:36581 in memory (size: 17.4 KB, free: 118.7 GB)
16/01/13 02:27:21 INFO BlockManagerInfo: Removed broadcast_419_piece0 on
172.31.35.157:51059 in memory (size: 17.4 KB, free: 10.4 GB)
16/01/13 02:27:21 INFO BlockManagerInfo: Removed broadcast_419_piece0 on
172.31.47.118:34888 in memory (size: 17.4 KB, free: 10.4 GB)
16/01/13 02:27:22 INFO BlockManagerInfo: Removed broadcast_419_piece0 on
172.31.38.42:48582 in memory (size: 17.4 KB, free: 10.4 GB)
16/01/13 02:27:38 INFO BlockManagerInfo: Added broadcast_615_piece0 in
memory on 172.31.41.68:59281 (size: 740.0 B, free: 10.4 GB)
16/01/13 02:27:55 INFO BlockManagerInfo: Added broadcast_615_piece0 in
memory on 172.31.47.118:59575 (size: 740.0 B, free: 10.4 GB)
16/01/13 02:28:47 INFO BlockManagerInfo: Added broadcast_615_piece0 in
memory on 172.31.40.24:55643 (size: 740.0 B, free: 10.4 GB)
16/01/13 02:28:49 INFO BlockManagerInfo: Added broadcast_615_piece0 in
memory on 172.31.47.118:53671 (size: 740.0 B, free: 10.4 GB)
 
This is the end of the log, so it looks like all 928 tasks were started, but
presumably they ran into an error while running. Nothing shows up in the
executor logs.
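
(Editor's aside on the numbers in the log above: the task size that the
TaskSetManager warns about is consistent with the driver serializing the
entire 200-by-54843210 Float array out to the 928 tasks, which
back-of-the-envelope arithmetic confirms:)

```scala
// 54,843,210 rows x 200 Floats x 4 bytes each, split across 928 partitions.
val totalBytes = 54843210L * 200 * 4   // ~43.9 GB held on the driver
val perTaskKB  = totalBytes / 928 / 1024
// ~46,170 KB of payload per task, in line with the 47,027 KB the WARN
// reports and the ~48 MB per-task sizes in the Starting-task lines once
// serialization overhead is included.
```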



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/failure-to-parallelize-an-RDD-tp25950.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: failure to parallelize an RDD

2016-01-12 Thread Alex Gittens
>> [quoted text from the original post snipped; the body of this reply was
>> not captured by the archive]