Anyone to confirm this?
On Jan 20, 2014, at 12:22 PM, Milos Nikolic <[email protected]> wrote:
> Hello,
>
> I think there is a bug with TorrentBroadcast in the latest release (0.8.1).
> The problem is that even a simple job (e.g., rdd.count) hangs waiting for
> some tasks to finish. Here is how to reproduce the problem:
>
> 1) Configure Spark such that node X is the master and also one of the workers
> (e.g., 5 nodes => 5 workers and 1 master)
> 2) Activate TorrentBroadcast
> 3) Use Kryo serializer (the problem happens more often than with Java
> serializer)
> 4) Read some file from HDFS, persist RDD, and call count
>
> In almost 80% of the cases (~50% with Java serializer), the count job hangs
> waiting for two tasks from node X to finish. The problem *does not* appear
> if: 1) I separate the master from the worker nodes, or 2) I use
> HttpBroadcast, or 3) I do not persist the RDD.
>
> The code is below.
>
> def main(args: Array[String]): Unit = {
>
> System.setProperty("spark.serializer",
> "org.apache.spark.serializer.KryoSerializer")
> System.setProperty("spark.kryo.registrator", "test.MyRegistrator")
> System.setProperty("spark.broadcast.factory",
> "org.apache.spark.broadcast.TorrentBroadcastFactory")
>
> val sc = new SparkContext(...)
>
> val file = "hdfs://server:9000/user/xxx/Test.out" // ~750MB
> val rdd = sc.textFile(file)
> rdd.persist
> println("Counting: " + rdd.count)
> }
>
>
> Best regards,
> Milos