Re: TorrentBroadcast + persist = bug

Milos Nikolic Wed, 22 Jan 2014 01:39:26 -0800

Here they are...

logs_spark.tar.gz
Description: GNU Zip compressed data

On Jan 22, 2014, at 10:10 AM, Mosharaf Chowdhury <[email protected]> wrote:

Btw, can you please send me the master and worker logs?

--
Mosharaf Chowdhury
http://www.mosharaf.com/

On Wed, Jan 22, 2014 at 12:58 AM, Mosharaf Chowdhury <[email protected]> wrote:

Milos, thanks for reporting. It seems just initializing the TorrentBroadcast is causing the trouble (I don't see you calling it at all).

I'll try to make some time to reproduce it tomorrow and let you know.

--
Mosharaf Chowdhury
http://www.mosharaf.com/

On Wed, Jan 22, 2014 at 12:22 AM, Milos Nikolic <[email protected]> wrote:
Anyone to confirm this?

On Jan 20, 2014, at 12:22 PM, Milos Nikolic <[email protected]> wrote:

> Hello,
>
> I think there is a bug with TorrentBroadcast in the latest release (0.8.1). The problem is that even a simple job (e.g., rdd.count) hangs waiting for some tasks to finish. Here is how to reproduce the problem:
>
> 1) Configure Spark such that node X is the master and also one of the workers (e.g., 5 nodes => 5 workers and 1 master)
> 2) Activate TorrentBroadcast
> 3) Use Kryo serializer (the problem happens more often than with Java serializer)
> 4) Read some file from HDFS, persist RDD, and call count
>
> In almost 80% of the cases (~50% with Java serializer), the count job hangs waiting for two tasks from node X to finish. The problem *does not* appear if: 1) I separate the master from the worker nodes, or 2) I use HttpBroadcast, or 3) I do not persist the RDD.
>
> The code is below.
>
> def main(args: Array[String]): Unit = {
>
> System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
> System.setProperty("spark.kryo.registrator", "test.MyRegistrator")
> System.setProperty("spark.broadcast.factory", "org.apache.spark.broadcast.TorrentBroadcastFactory")
>
> val sc = new SparkContext(...)
>
> val file = "hdfs://server:9000/user/xxx/Test.out" // ~750MB
> val rdd = sc.textFile(file)
> rdd.persist
> println("Counting: " + rdd.count)
> }
>
>
> Best regards,
> Milos

Re: TorrentBroadcast + persist = bug

Reply via email to