Thanks, Sean!

It works, but tip 2, "Why Is My Spark Job so Slow and Only Using a Single
Thread?", in
<http://engineering.sharethrough.com/blog/2013/09/13/top-3-troubleshooting-tips-to-keep-you-sparking/>
says the "parser instance is now a singleton created in the scope of our
driver program", whereas I thought it was created in the scope of the
executor. Am I wrong, and if so, why?

I didn't want the equivalent of a "setup()" method, since I want to share
the "parser" among all tasks on the same worker node; it takes tens of
seconds to initialize a "parser". What's more, I want to know whether the
"parser" could have a field such as a ConcurrentHashMap, from which all
tasks on the node may get() or put() items.
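
Concretely, I have something like the sketch below in mind. It is only a
sketch: HeavyParser is a hypothetical stand-in for the real third-party
parser, and I'm assuming its constructor is the part that takes tens of
seconds.

import java.util.concurrent.ConcurrentHashMap

// Hypothetical stand-in for the real, non-serializable third-party parser.
class HeavyParser {
  def parse(line: String): String = line  // the real parser does heavy work
}

// A Scala `object` is a per-JVM singleton, initialized lazily on first use.
// Tasks run inside the executor JVM, so each worker builds exactly one
// parser, shared by every task scheduled on that executor.
object SharedParser {
  val parser = new HeavyParser()

  // Shared, thread-safe state: concurrent tasks in the same executor JVM
  // can safely get() or put() entries.
  val cache = new ConcurrentHashMap[String, String]()

  def parse(line: String): String = {
    val cached = cache.get(line)
    if (cached != null) cached
    else {
      val result = parser.parse(line)
      cache.put(line, result)
      result
    }
  }
}

Then tasks would just call

lines.map(line => SharedParser.parse(line))

and, as far as I understand, the object is never serialized: it is
initialized independently in each executor JVM the first time a task
references it. Please correct me if that's wrong.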

2014-08-04 16:35 GMT+08:00 Sean Owen <so...@cloudera.com>:

> The parser does not need to be serializable. In the line:
>
> lines.map(line => JSONParser.parse(line))
>
> ... the parser is called, but there is no parser object with state
> that would need to be serialized. Are you sure it does not work?
>
> The error message alluded to originally refers to an object not shown
> in the code, so I'm not 100% sure this was the original issue.
>
> If you want, the equivalent of "setup()" is really writing some code
> at the start of a call to mapPartitions().
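
(If I understand this correctly, it would look roughly like the sketch
below, again using the hypothetical HeavyParser from above. That gives one
parser per partition rather than one per worker node, which is why it
isn't quite what I'm after.)

lines.mapPartitions { iter =>
  // The "setup()" equivalent: runs once per partition, on the executor,
  // before any record of that partition is processed.
  val parser = new HeavyParser()  // hypothetical stand-in for the real parser
  iter.map(line => parser.parse(line))
}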
>
> On Mon, Aug 4, 2014 at 8:40 AM, Fengyun RAO <raofeng...@gmail.com> wrote:
> > Thanks, Ron.
> >
> > The problem is that the "parser" is written in another package and is
> > not serializable.
> >
> > In MapReduce, I could create the "parser" in the mapper's setup() method.
> >
> > Now in Spark, I want to create it once per worker, and share it among
> > all the tasks on the same worker node.
> >
> > I know different workers run on different machines, but the workers
> > don't need to communicate with each other.
>
