The cost depends on the feature dimension, number of instances, number
of classes, and number of partitions. Do you mind sharing those
numbers? -Xiangrui
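The Scala sketch below gives a rough sense of why those four numbers matter. It rests on one assumption, which is my reading and not confirmed in this thread: training keeps one dense vector of per-feature sums for every class in each partition, at 8 bytes per entry, and each partition shuffles those vectors before they are merged. Under that assumption, memory per partition scales with features x classes, and shuffle volume scales with features x classes x partitions. The number of instances mainly drives how long it takes to add every document into its class sum, and the sketch does not model that. The object name and the example inputs are hypothetical. In spark-shell, the real inputs could be read with `training.first().features.size`, `training.map(_.label).distinct().count()`, `training.partitions.size` and `training.count()`.

```scala
// Back-of-envelope size of the per-class aggregation in Naive Bayes training.
// ASSUMPTION: each partition holds one dense vector of Doubles (8 bytes each)
// of length numFeatures per class, and every partition shuffles those vectors.
object NaiveBayesCostSketch {
  val BytesPerDouble = 8L

  // Bytes for the dense per-class sums held by a single partition.
  def perPartitionBytes(numFeatures: Long, numClasses: Long): Long =
    numFeatures * numClasses * BytesPerDouble

  // Rough upper bound on bytes shuffled: one dense vector per class per partition.
  def shuffleBytes(numFeatures: Long, numClasses: Long, numPartitions: Long): Long =
    perPartitionBytes(numFeatures, numClasses) * numPartitions

  def toMiB(bytes: Long): Double = bytes.toDouble / (1024 * 1024)

  def main(args: Array[String]): Unit = {
    // Hypothetical inputs; replace them with the real numbers from the job.
    val numFeatures = 1L << 20 // e.g. 2^20, the default dimension of MLlib's HashingTF
    val numClasses = 20L
    val numPartitions = 200L
    println(f"per partition: ${toMiB(perPartitionBytes(numFeatures, numClasses))}%.1f MiB")
    println(f"shuffle total: ${toMiB(shuffleBytes(numFeatures, numClasses, numPartitions))}%.1f MiB")
  }
}
```

With these made-up inputs the sketch reports 160.0 MiB per partition and 32000.0 MiB shuffled in total. Comparing figures like these with the 7G executor heap is one way to judge whether the aggregation is plausibly what is knocking executors over.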

On Wed, Oct 1, 2014 at 6:31 PM, Mike Bernico <mike.bern...@gmail.com> wrote:
> Hi Everyone,
>
> I'm working on training MLlib's Naive Bayes to classify TF-IDF vectorized
> documents using Spark 1.1.0.
>
> I've gotten this to work fine on a smaller set of data, but when I increase
> the number of vectorized documents, training gets stuck. The only messages
> I'm seeing are below. I'm pretty new to Spark and don't really know where to
> go next to troubleshoot this.
>
> I'm running Spark on YARN like this:
> spark-shell --master yarn-client --executor-memory 7G --driver-memory 7G --num-executors 3
>
> I have three workers, each with 64G of RAM and 8 cores.
>
> scala> val model = NaiveBayes.train(training, lambda = 1.0)
> 14/10/01 19:40:34 ERROR YarnClientClusterScheduler: Lost executor 2 on rpl0000001273.<removed>: remote Akka client disassociated
> 14/10/01 19:40:34 WARN TaskSetManager: Lost task 195.0 in stage 5.0 (TID 2940, rpl0000001273.<removed>): ExecutorLostFailure (executor lost)
> 14/10/01 19:40:34 WARN TaskSetManager: Lost task 190.0 in stage 5.0 (TID 2782, rpl0000001272.<removed>): FetchFailed(BlockManagerId(2, rpl0000001273.<removed>, 57359, 0), shuffleId=1, mapId=0, reduceId=190)
> 14/10/01 19:40:35 WARN TaskSetManager: Lost task 195.1 in stage 5.0 (TID 2941, rpl0000001272.<removed>): FetchFailed(BlockManagerId(2, rpl0000001273.<removed>, 57359, 0), shuffleId=1, mapId=0, reduceId=195)
> 14/10/01 19:40:36 WARN TaskSetManager: Lost task 185.0 in stage 5.0 (TID 2780, rpl0000001277.<removed>): FetchFailed(BlockManagerId(2, rpl0000001273.<removed>, 57359, 0), shuffleId=1, mapId=0, reduceId=185)
> 14/10/01 19:46:24 ERROR YarnClientClusterScheduler: Lost executor 1 on rpl0000001272.<removed>: remote Akka client disassociated
> 14/10/01 19:46:24 WARN TaskSetManager: Lost task 78.0 in stage 5.1 (TID 3377, rpl0000001272.<removed>): ExecutorLostFailure (executor lost)
> 14/10/01 19:46:25 WARN TaskSetManager: Lost task 79.0 in stage 5.1 (TID 3378, rpl0000001273.<removed>): FetchFailed(BlockManagerId(1, rpl0000001272.<removed>, 60926, 0), shuffleId=1, mapId=5, reduceId=220)
> 14/10/01 19:46:25 WARN TaskSetManager: Lost task 78.1 in stage 5.1 (TID 3379, rpl0000001273.<removed>): FetchFailed(BlockManagerId(1, rpl0000001272.<removed>, 60926, 0), shuffleId=1, mapId=5, reduceId=215)
> 14/10/01 19:46:29 WARN TaskSetManager: Lost task 73.0 in stage 5.1 (TID 3372, rpl0000001277.<removed>): FetchFailed(BlockManagerId(1, rpl0000001272.<removed>, 60926, 0), shuffleId=1, mapId=9, reduceId=210)
> 14/10/01 19:57:27 ERROR YarnClientClusterScheduler: Lost executor 3 on rpl0000001277.<removed>: remote Akka client disassociated
> 14/10/01 19:57:27 WARN TaskSetManager: Lost task 177.0 in stage 5.2 (TID 4015, rpl0000001277.<removed>): ExecutorLostFailure (executor lost)
> 14/10/01 19:57:27 ERROR ConnectionManager: Corresponding SendingConnection to ConnectionManagerId(rpl0000001277.<removed>,41425) not found
> 14/10/01 19:57:30 WARN TaskSetManager: Lost task 182.0 in stage 5.2 (TID 4020, rpl0000001272.<removed>): FetchFailed(BlockManagerId(3, rpl0000001277.<removed>, 41425, 0), shuffleId=1, mapId=2, reduceId=340)
> 14/10/01 19:57:30 WARN TaskSetManager: Lost task 177.1 in stage 5.2 (TID 4022, rpl0000001272.<removed>): FetchFailed(BlockManagerId(3, rpl0000001277.<removed>, 41425, 0), shuffleId=1, mapId=2, reduceId=335)
> 14/10/01 19:57:36 WARN TaskSetManager: Lost task 183.0 in stage 5.2 (TID 4021, rpl0000001273.<removed>): FetchFailed(BlockManagerId(3, rpl0000001277.<removed>, 41425, 0), shuffleId=1, mapId=8, reduceId=345)
> 14/10/01 20:20:22 ERROR YarnClientClusterScheduler: Lost executor 4 on rpl0000001273.<removed>: remote Akka client disassociated
> 14/10/01 20:20:22 WARN TaskSetManager: Lost task 527.0 in stage 5.3 (TID 5159, rpl0000001273.<removed>): ExecutorLostFailure (executor lost)
> 14/10/01 20:20:23 WARN TaskSetManager: Lost task 517.0 in stage 5.3 (TID 5149, rpl0000001272.<removed>): FetchFailed(BlockManagerId(4, rpl0000001273.<removed>, 51049, 0), shuffleId=1, mapId=6, reduceId=690)
> 14/10/01 20:20:23 WARN TaskSetManager: Lost task 527.1 in stage 5.3 (TID 5160, rpl0000001272.<removed>): FetchFailed(BlockManagerId(4, rpl0000001273.<removed>, 51049, 0), shuffleId=1, mapId=5, reduceId=700)
> 14/10/01 20:20:25 WARN TaskSetManager: Lost task 522.0 in stage 5.3 (TID 5154, rpl0000001277.<removed>): FetchFailed(BlockManagerId(4, rpl0000001273.<removed>, 51049, 0), shuffleId=1, mapId=5, reduceId=695)
