Re: pyspark+spacy throwing pickling exception

2018-02-15 Thread Holden Karau
So you left out the exception. I'm also not sure how well spacy serializes, so to debug this I would start by moving the nlp = assignment inside the function and see if it still fails.

On Thu, Feb 15, 2018 at 9:08 PM Selvam Raman wrote:
> import spacy
>
> nlp =
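A minimal sketch of that debugging step, assuming the pickling failure comes from the module-level nlp object being captured by the function that Spark ships to the executors. The get_nlp helper, the _nlp cache, and the optional nlp parameter are hypothetical names added for illustration; only spacy.load('en') and getPhrases come from the original post.

```python
# Sketch only: load the spaCy model lazily inside the worker process,
# so the driver never has to pickle the model object itself.

_nlp = None  # per-process cache (hypothetical name); empty until first use


def get_nlp():
    """Load the model the first time it is needed in this process."""
    global _nlp
    if _nlp is None:
        import spacy  # imported here, not at module level, to keep the closure light
        _nlp = spacy.load('en')  # model name taken from the original post
    return _nlp


def getPhrases(content, nlp=None):
    """Return the noun phrases found in content as a list of strings.

    The optional nlp argument is an assumption added so the function can be
    exercised with a stand-in pipeline; by default the lazy loader is used.
    """
    if nlp is None:
        nlp = get_nlp()
    doc = nlp(str(content))
    return [chunk.text for chunk in doc.noun_chunks]
```

With this shape the function could be wrapped as before, for example with pyspark.sql.functions.udf and an ArrayType(StringType()) return type. The model is then loaded once per Python worker rather than serialized on the driver. Whether this removes the exception in this particular setup is not confirmed by the thread.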

Pyspark UDF/map function throws pickling exception

2018-02-15 Thread Selvam Raman
import spacy

nlp = spacy.load('en')

def getPhrases(content):
    phrases = []
    doc = nlp(str(content))
    for chunks in doc.noun_chunks:
        phrases.append(chunks.text)
    return phrases

The above function retrieves the noun phrases from the content and returns them as a list of phrases.

pyspark+spacy throwing pickling exception

2018-02-15 Thread Selvam Raman
import spacy

nlp = spacy.load('en')

def getPhrases(content):
    phrases = []
    doc = nlp(str(content))
    for chunks in doc.noun_chunks:
        phrases.append(chunks.text)
    return phrases

The above function retrieves the noun phrases from the content and returns them as a list of phrases.

Re: Pyspark UDF/map function throws pickling exception

2018-02-15 Thread Selvam Raman
pyspark - 2.2.1
spacy - 2.0.7
python - 3.6

Placing full logs here:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyspark/cloudpickle.py", line 148, in dump
    return Pickler.dump(self, obj)
  File

Re: pyspark+spacy throwing pickling exception

2018-02-15 Thread Selvam Raman
Hi, I solved the issue by extracting the method into another class. Failure: Class extract.py - contains the whole implementation. Because everything lived in this single class, the driver tried to serialize the spacy (English) object and send it to the executors, and that is where I hit the pickling exception. Success: Class