Hi,
I'm applying preprocessing to a large text dataset using Spark (Java).
I created my own NLP pipeline as plain Java code and call it in the map
function like this:

MyRDD.map(call nlp pipeline for each row)
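In case the shape of the call helps, here is a minimal, self-contained sketch of what I mean. It uses a plain Java stream in place of the RDD and a hypothetical `runNlpPipeline` stand-in (the real pipeline does more than this); in Spark the equivalent would be `myRDD.map(NlpMapSketch::runNlpPipeline)`:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class NlpMapSketch {
    // Hypothetical stand-in for the custom NLP pipeline:
    // here it just trims and lower-cases each document.
    static String runNlpPipeline(String doc) {
        return doc.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("  Hello World ", "SPARK NLP");
        // With Spark this would be:
        // JavaRDD<String> out = myRDD.map(NlpMapSketch::runNlpPipeline);
        List<String> out = docs.stream()
                               .map(NlpMapSketch::runNlpPipeline)
                               .collect(Collectors.toList());
        System.out.println(out); // prints [hello world, spark nlp]
    }
}
```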

I run the job on a cluster of 14 machines (32 cores and about 140 GB of
memory each). The job runs correctly and distributes the documents across
the executors, but it gets stuck on the last task for several minutes.
Looking at the job details, I found that most of the documents are processed
across several executors, but one task gets stuck on a small number of
documents. It looks like that task is waiting for something; then, after
10-20 minutes, it continues processing the remaining documents and finishes.

I also tried different configurations, but got the same result.
Any help?

thanks,
Donni
