eike.segg...@sevenval.com>
Cc: Sidney Feiner <sidney.fei...@startapp.com>; user@spark.apache.org
Subject: Re: [PySpark - 1.6] - Avoid object serialization
Alternatively, using the broadcast functionality can also help with this.
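A minimal sketch of what the broadcast approach could look like, assuming the state the analyzer needs is plain, picklable, read-only data. The names `analyze`, `make_processor` and the stop-word set are hypothetical stand-ins and not from this thread. `sc` is an existing SparkContext.

```python
def analyze(text, stopwords):
    """Hypothetical stand-in for the analyzer: drop stop words from a text."""
    return [word for word in text.lower().split() if word not in stopwords]


def make_processor(sc, stopwords):
    """Broadcast the read-only data once and return a per-RDD processing function.

    `sc` is assumed to be a pyspark SparkContext.
    """
    # The data is shipped to each executor once instead of with every task closure.
    shared = sc.broadcast(stopwords)

    def process_rdd(rdd):
        # The lambda captures only the Broadcast handle; .value is read on the worker.
        return rdd.map(lambda record: analyze(record[1], shared.value)).collect()

    return process_rdd
```

With a streaming job, `make_processor(sc, stopwords)` would be called once on the driver and the returned function passed to `foreachRDD`. Whether this helps depends on the analyzer's state being picklable at all.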
On Thu, Dec 29, 2016 at 3:05 AM Eike von Seggern
wrote:
2016-12-28 20:17 GMT+01:00 Chawla,Sumit :
Would this work for you?
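    def processRDD(rdd):
        # Create the analyzer inside the function that handles each batch RDD.
        analyzer = ShortTextAnalyzer(root_dir)
        rdd.foreach(lambda record: analyzer.analyze_short_text_event(record[1]))

    # Drop records whose value is None, then process each batch RDD.
    ssc.union(*streams).filter(lambda x: x[1] is not None) \
        .foreachRDD(lambda rdd: processRDD(rdd))
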
Regards
Sumit Chawla
On Wed, Dec