Re: How to share a NonSerializable variable among tasks in the same worker node?

2015-01-21 Thread Sean Owen
Singletons aren't hacks; they can be an entirely appropriate pattern for this. What exception do you get? From Spark or your code? I think this pattern is orthogonal to using Spark. On Jan 21, 2015 8:11 AM, octavian.ganea octavian.ga...@inf.ethz.ch wrote: In case someone has the same problem:

Re: How to share a NonSerializable variable among tasks in the same worker node?

2015-01-21 Thread octavian.ganea
In case someone has the same problem: the singleton hack works for me sometimes, and sometimes it doesn't in Spark 1.2.0; that is, sometimes I get a NullPointerException. Anyway, if you really need to work with big indexes and you want to have the smallest amount of communication between master and
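The "singleton hack" the thread keeps referring to is usually written as a Scala object holding the non-serializable resource. A minimal sketch (plain Scala, no Spark dependency; `ExpensiveParser` is a hypothetical stand-in for the non-serializable object, and the double-checked locking is one way to avoid the NullPointerException mentioned above when several tasks race on first access):

```scala
// Hypothetical stand-in for a non-serializable, expensive-to-build resource.
class ExpensiveParser {
  def parse(line: String): String = line.toUpperCase
}

object ParserSingleton {
  // Held once per JVM (i.e. once per executor), so every task running in
  // that executor shares the same instance; the closure shipped by Spark
  // only captures the object reference, never the parser itself.
  @volatile private var instance: ExpensiveParser = _

  def get: ExpensiveParser = {
    if (instance == null) synchronized {
      // Double-checked locking: initialize exactly once even if many
      // tasks hit this concurrently.
      if (instance == null) instance = new ExpensiveParser
    }
    instance
  }
}
```

Inside a Spark job you would then call `ParserSingleton.get.parse(...)` from within `map` or `mapPartitions`.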

Re: How to share a NonSerializable variable among tasks in the same worker node?

2015-01-20 Thread Fengyun RAO
we are currently migrating from 1.1 to 1.2, and found our program runs 3x slower, maybe due to the singleton hack? Could you explain in detail why or how "the singleton hack works very differently in Spark 1.2.0"? Thanks! 2015-01-18 20:56 GMT+08:00 octavian.ganea octavian.ga...@inf.ethz.ch: The singleton

Re: How to share a NonSerializable variable among tasks in the same worker node?

2015-01-18 Thread octavian.ganea
The singleton hack works very differently in Spark 1.2.0 (it does not work if the program has multiple map-reduce jobs in the same program). I guess there should be official documentation on how to have each machine/node do an init step locally before executing any other instructions (e.g.

Re: How to share a NonSerializable variable among tasks in the same worker node?

2014-08-10 Thread DB Tsai
Spark caches the RDD in the JVM, so presumably, yes, the singleton trick should work. On Aug 9, 2014 11:00 AM, Kevin James Matzen kmat...@cs.cornell.edu wrote: I have a related question. With Hadoop, I would do the same thing for non-serializable objects and setup().
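The closest Spark analogue of Hadoop's `setup()` is `mapPartitions`: build the expensive object once per partition, then reuse it for every record in that partition. A sketch in plain Scala (a bare `Iterator` stands in for a partition, and `initCount` exists only to demonstrate that the parser is constructed once, not once per record):

```scala
// Counts parser constructions; for demonstration only.
var initCount = 0

// Hypothetical expensive, non-serializable parser.
class HeavyParser {
  initCount += 1
  def parse(s: String): Int = s.length
}

// Per-partition initialization: one HeavyParser per partition, reused for
// every record. In Spark this would be rdd.mapPartitions(parsePartition).
def parsePartition(records: Iterator[String]): Iterator[Int] = {
  val parser = new HeavyParser
  records.map(parser.parse)
}

val out = parsePartition(Iterator("a", "bb", "ccc")).toList
```

Unlike the singleton, this re-initializes once per partition rather than once per JVM, but it sidesteps the lifecycle surprises discussed above.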

Re: How to share a NonSerializable variable among tasks in the same worker node?

2014-08-09 Thread Fengyun RAO
Although nobody answered the two questions, in my practice it seems the answer to both is yes. 2014-08-04 19:50 GMT+08:00 Fengyun RAO raofeng...@gmail.com: object LogParserWrapper { private val logParser = { val settings = new ... val builders = new

Re: How to share a NonSerializable variable among tasks in the same worker node?

2014-08-09 Thread Kevin James Matzen
I have a related question. With Hadoop, I would do the same thing for non-serializable objects and setup(). I also had a use case where it was so expensive to initialize the non-serializable object that I would make it a static member of the mapper, turn on JVM reuse across tasks, and then

Re: How to share a NonSerializable variable among tasks in the same worker node?

2014-08-03 Thread Ron's Yahoo!
I think you’re going to have to make it serializable by registering it with the Kryo registrator. I think multiple workers are running as separate VMs so it might need to be able to serialize and deserialize broadcasted variables to the different executors. Thanks, Ron On Aug 3, 2014, at 6:38
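The alternative Ron suggests — making the type serializable via Kryo instead of avoiding serialization altogether — is configured on the `SparkConf`. A sketch (essentially a config fragment, so no runnable test; `MyParser` is a hypothetical stand-in, and note that registering a class with Kryo does not by itself make a fundamentally non-serializable class serializable):

```scala
import org.apache.spark.SparkConf

// Switch to the Kryo serializer and register the class so instances can be
// shipped to executors, e.g. inside a broadcast variable.
val conf = new SparkConf()
  .setAppName("shared-parser")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[MyParser]))
```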

How to share a NonSerializable variable among tasks in the same worker node?

2014-07-31 Thread Fengyun RAO
As shown here: "Why Is My Spark Job so Slow and Only Using a Single Thread?" http://engineering.sharethrough.com/blog/2013/09/13/top-3-troubleshooting-tips-to-keep-you-sparking/ object JSONParser { def parse(raw: String): String = ... } object MyFirstSparkJob { def
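The truncated snippet appears to show the singleton-object pattern from that blog post: the parser lives in a Scala `object`, so each executor JVM initializes it once on first access and tasks share it without Spark ever serializing the parser. A self-contained reconstruction (the `parse` body is a hypothetical placeholder; the real post wraps a heavyweight JSON library):

```scala
object JSONParser {
  // Stand-in for an expensive, non-serializable parser; as a val inside a
  // Scala object it is created lazily, once per JVM, on first access.
  private val whitespace = "\\s+".r
  def parse(raw: String): String = whitespace.replaceAllIn(raw.trim, " ")
}

object MyFirstSparkJob {
  def main(args: Array[String]): Unit = {
    // In the real job this call would sit inside rdd.map(JSONParser.parse);
    // here the singleton is exercised directly.
    println(JSONParser.parse("  {\"k\":   1}  "))
  }
}
```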