RE: [PySpark - 1.6] - Avoid object serialization

2017-01-01 Thread Sidney Feiner
eike.segg...@sevenval.com>
Cc: Sidney Feiner <sidney.fei...@startapp.com>; user@spark.apache.org
Subject: Re: [PySpark - 1.6] - Avoid object serialization

Alternatively, using the broadcast functionality can also help with this.

On Thu, Dec 29, 2016 at 3:05 AM Eike von Seggern <eike.

Re: [PySpark - 1.6] - Avoid object serialization

2016-12-29 Thread Holden Karau
Alternatively, using the broadcast functionality can also help with this.

On Thu, Dec 29, 2016 at 3:05 AM Eike von Seggern wrote:
> 2016-12-28 20:17 GMT+01:00 Chawla,Sumit :
>
> Would this work for you?
>
> def processRDD(rdd):
>     analyzer =
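
For readers of the archive, here is a minimal sketch of what the broadcast suggestion could look like. It assumes the expensive state can be reduced to a picklable, read-only object (a plain dict called `model_data` here, which is hypothetical; the thread does not say what the analyzer holds). The function name `analyze_with_broadcast` and the "unknown" fallback are also illustrative only. Broadcasting still serializes the value once, so it would not help if the object cannot be pickled at all.

```python
def analyze_with_broadcast(sc, rdd, model_data):
    """Look up each record's text in a broadcast dict (hypothetical model data).

    sc is a SparkContext, rdd holds (key, text) pairs.
    """
    # SparkContext.broadcast ships the value to the executors once,
    # instead of pickling it into every task closure.
    model_bc = sc.broadcast(model_data)

    def process_partition(records):
        # .value is read on the worker side.
        model = model_bc.value
        for record in records:
            text = record[1]
            if text is not None:
                yield (record[0], model.get(text, "unknown"))

    return rdd.mapPartitions(process_partition)
```

Only the small `model_bc` handle is captured by the closure; the data itself travels through the broadcast mechanism.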

Re: [PySpark - 1.6] - Avoid object serialization

2016-12-29 Thread Eike von Seggern
2016-12-28 20:17 GMT+01:00 Chawla,Sumit :
> Would this work for you?
>
> def processRDD(rdd):
>     analyzer = ShortTextAnalyzer(root_dir)
>     rdd.foreach(lambda record: analyzer.analyze_short_text_event(record[1]))
>
> ssc.union(*streams).filter(lambda x: x[1] !=

Re: [PySpark - 1.6] - Avoid object serialization

2016-12-28 Thread Chawla,Sumit
Would this work for you?

    def processRDD(rdd):
        analyzer = ShortTextAnalyzer(root_dir)
        rdd.foreach(lambda record: analyzer.analyze_short_text_event(record[1]))

    ssc.union(*streams).filter(lambda x: x[1] != None).foreachRDD(lambda rdd: processRDD(rdd))

Regards
Sumit Chawla

On Wed, Dec
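
A note on the snippet above: the lambda passed to `rdd.foreach` closes over `analyzer`, so Spark would still need to pickle that object to send it to the workers. A commonly used alternative is to build the object on the worker, once per partition or once per worker process. The sketch below illustrates that pattern in plain Python; the `ShortTextAnalyzer` stub, the `get_analyzer` helper and the `/tmp/models` path are hypothetical stand-ins, not code from the thread.

```python
class ShortTextAnalyzer(object):
    """Stand-in for the real analyzer from the thread (assumed costly to build)."""
    instances_created = 0

    def __init__(self, root_dir):
        ShortTextAnalyzer.instances_created += 1
        self.root_dir = root_dir
        self.seen = []

    def analyze_short_text_event(self, text):
        self.seen.append(text)


_analyzer = None  # cached instance, created lazily inside the worker process


def get_analyzer(root_dir):
    """Create the analyzer on first use and reuse it afterwards."""
    global _analyzer
    if _analyzer is None:
        _analyzer = ShortTextAnalyzer(root_dir)
    return _analyzer


def process_partition(records, root_dir="/tmp/models"):
    # Only this function is shipped to the workers; the analyzer itself
    # is never pickled because it is constructed here, on the worker.
    analyzer = get_analyzer(root_dir)
    for record in records:
        if record[1] is not None:
            analyzer.analyze_short_text_event(record[1])


# With Spark this would be wired up roughly as:
#   ssc.union(*streams).foreachRDD(lambda rdd: rdd.foreachPartition(process_partition))
# Local stand-in for two partitions:
process_partition(iter([("k1", "hello"), ("k2", None)]))
process_partition(iter([("k3", "world")]))
```

Whether the cached instance survives across tasks depends on worker reuse and on how the helper is shipped (an importable module is the safer choice); in the worst case the analyzer is rebuilt once per partition rather than once per record.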