[jira] [Updated] (SPARK-1980) problems introduced by broadcast
[ https://issues.apache.org/jira/browse/SPARK-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhoudi updated SPARK-1980:
--------------------------
    Description:
I am writing a word embedding on Spark. The size of the model is about 60w * 100 * Float.size. Because the model is so large, I have to use broadcast to deliver the current model to the executors. After each iteration, I update the model and then broadcast it again. The pseudo-code is as follows:

    for (i <- 0 to 100) {
      broadcast_model <- broadcast(model);
      e_model = xxx.map(Func(broadcast_model)) // handle of broadcast_model to Func;
                   .reduce(_ + _)
      model <- model + e_model // Update the model
    }

My problem is that an error occurs after six iterations. The error output is as follows:

    ./bin/spark-submit: line 44: 28232 killed $SPARK_HOME/bin/spark-class org.apache.spark.deploy.Spark "${ORIG_ARGS[@]}"

  was:
I am writing a word embedding on Spark. The size of the model is about 60w * 100 * Float.size. Because the model is so large, I have to use broadcast to deliver the current model to the executors. After each iteration, I update the model and then broadcast it again. The pseudo-code is as follows:

    for (i <- 0 to 100) {
      broadcast_model <- broadcast(model)
      e_model = xxx.map(Func(broadcast_model)) // handle of broadcast_model to Func
                   .reduce(_ + _)
      model <- model + e_model
    }

My problem is that an error occurs after six iterations. The error output is as follows:

    ./bin/spark-submit: line 44: 28232 killed $SPARK_HOME/bin/spark-class org.apache.spark.deploy.Spark "${ORIG_ARGS[@]}"

> problems introduced by broadcast
> --------------------------------
>
> Key: SPARK-1980
> URL: https://issues.apache.org/jira/browse/SPARK-1980
> Project: Spark
> Issue Type: Bug
> Reporter: zhoudi

--
This message was sent by Atlassian JIRA (v6.2#6252)
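The loop described in the issue can be sketched as runnable Scala. This is only an illustration under stated assumptions, not the reporter's code and not a confirmed fix: `contribution` and `add` are hypothetical stand-ins for `Func` and `_ + _`, the input data and dimensions are made up, and the `unpersist()` call reflects the unconfirmed guess that broadcasts from earlier iterations piling up may contribute to the process being killed. `unpersist()` only asks Spark to drop the executor-side copies of a broadcast.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastLoopSketch {
  // Hypothetical stand-in for the issue's Func: one record's contribution to the update.
  def contribution(model: Array[Float], record: Int): Array[Float] =
    model.map(w => (record - w) * 0.001f)

  // Element-wise sum, playing the role of `_ + _` in the pseudo-code.
  def add(a: Array[Float], b: Array[Float]): Array[Float] =
    a.zip(b).map { case (x, y) => x + y }

  def train(sc: SparkContext, dims: Int, iterations: Int): Array[Float] = {
    val data = sc.parallelize(1 to 100).cache() // made-up input data
    var model = Array.fill(dims)(0.0f)
    for (_ <- 0 until iterations) {
      // Re-broadcast the current model at the start of every iteration.
      val bcModel = sc.broadcast(model)
      val update = data
        .map(record => contribution(bcModel.value, record))
        .reduce((a, b) => add(a, b))
      model = add(model, update)
      // Assumption: releasing the stale broadcast helps; the issue does not confirm this.
      bcModel.unpersist()
    }
    model
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("broadcast-loop-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try println(train(sc, dims = 4, iterations = 10).mkString(", "))
    finally sc.stop()
  }
}
```

The training step lives in `train` so that it can be called with an existing `SparkContext`; `main` only shows a local run with two worker threads.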
[jira] [Updated] (SPARK-1980) problems introduced by broadcast
[ https://issues.apache.org/jira/browse/SPARK-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhoudi updated SPARK-1980:
--------------------------
    Description:
I am writing a word embedding on Spark. The size of the model is about 60w * 100 * Float.size. Because the model is so large, I have to use broadcast to deliver the current model to the executors. After each iteration, I update the model and then broadcast it again. The pseudo-code is as follows:

    for (i <- 0 to 100) {
      broadcast_model <- broadcast(model)
      e_model = xxx.map(Func(broadcast_model)) // handle of broadcast_model to Func
                   .reduce(_ + _)
      model <- model + e_model
    }

My problem is that an error occurs after six iterations. The error output is as follows:

    ./bin/spark-submit: line 44: 28232 killed $SPARK_HOME/bin/spark-class org.apache.spark.deploy.Spark "${ORIG_ARGS[@]}"

  was:
I am writing a word embedding on Spark. The size of the model is about 60w * 100 * Float.size.

> problems introduced by broadcast
> --------------------------------
>
> Key: SPARK-1980
> URL: https://issues.apache.org/jira/browse/SPARK-1980
> Project: Spark
> Issue Type: Bug
> Reporter: zhoudi
[jira] [Updated] (SPARK-1980) problems introduced by broadcast
[ https://issues.apache.org/jira/browse/SPARK-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhoudi updated SPARK-1980:
--------------------------
    Description:
I am writing a word embedding on Spark. The size of the model is about 60w * 100 * Float.size.

> problems introduced by broadcast
> --------------------------------
>
> Key: SPARK-1980
> URL: https://issues.apache.org/jira/browse/SPARK-1980
> Project: Spark
> Issue Type: Bug
> Reporter: zhoudi
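The model size quoted in the description, 60w * 100 * Float.size, can be worked out explicitly. The sketch below rests on two assumptions the issue does not spell out: that "60w" means 60 * 10,000 = 600,000 words (reading "w" as the Chinese unit wan), and that "Float.size" means the 4 bytes of a 32-bit float. Under those assumptions each broadcast carries roughly 240 MB of raw float data, before any serialization or JVM object overhead.

```scala
object ModelSize {
  // Assumption: "60w" = 60 * 10,000 = 600,000 words in the vocabulary.
  val rows: Long = 60L * 10000L
  // Embedding dimensions per word, as stated in the issue.
  val dims: Long = 100L
  // Assumption: "Float.size" = bytes per 32-bit float, i.e. 32 bits / 8 = 4.
  val bytesPerFloat: Long = java.lang.Float.SIZE / 8

  // Raw payload of one model broadcast, in bytes.
  val totalBytes: Long = rows * dims * bytesPerFloat

  def main(args: Array[String]): Unit = {
    val mib = totalBytes.toDouble / (1024 * 1024)
    println(f"$totalBytes bytes (about $mib%.1f MiB) per broadcast")
  }
}
```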
[jira] [Created] (SPARK-1980) problems introduced by broadcast
zhoudi created SPARK-1980:
--------------------------

Summary: problems introduced by broadcast
Key: SPARK-1980
URL: https://issues.apache.org/jira/browse/SPARK-1980
Project: Spark
Issue Type: Bug
Reporter: zhoudi
[jira] [Created] (SPARK-1961) when data return from map is about 10 kb, reduce(_ + _) would always pending
zhoudi created SPARK-1961:
--------------------------

Summary: when data return from map is about 10 kb, reduce(_ + _) would always pending
Key: SPARK-1961
URL: https://issues.apache.org/jira/browse/SPARK-1961
Project: Spark
Issue Type: Bug
Affects Versions: 1.0.0
Reporter: zhoudi