Is it possible to implement Vector Space Model using PySpark

2018-09-23 Thread Soheil Pourbafrani
Hi, I want to implement the Vector Space Model for texts using Spark. As a first step, I compute the vector of words across all the files (the dictionary) and make it a broadcast variable so that it is accessible to all executors. Vector_of_Words = selected_data.select('full_text').rdd\ .map(lambda x :
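The poster's code is cut off, so what follows is only a minimal pure-Python sketch of the general idea, not their implementation. It builds a word-to-index dictionary (the "vector of words"), maps each text to a term-frequency vector over that dictionary, and compares vectors by cosine similarity. The function names build_vocabulary, to_vector and cosine are hypothetical. To keep it short, the sketch uses plain term frequency rather than TF-IDF weighting and splits on whitespace.

```python
import math
from collections import Counter


def build_vocabulary(docs):
    """Assign each distinct lower-cased token a fixed column index (the dictionary)."""
    vocab = {}
    for doc in docs:
        for token in doc.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def to_vector(text, vocab):
    """Term-frequency vector of `text` over `vocab`; tokens not in `vocab` are ignored."""
    vec = [0.0] * len(vocab)
    for token, count in Counter(text.lower().split()).items():
        idx = vocab.get(token)
        if idx is not None:
            vec[idx] = float(count)
    return vec


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    docs = ["the cat sat", "the dog sat", "cats and dogs"]
    vocab = build_vocabulary(docs)
    vectors = [to_vector(d, vocab) for d in docs]
    # Compare the first document with every document, itself included.
    for d, v in zip(docs, vectors):
        print(d, round(cosine(vectors[0], v), 3))
```

In PySpark, the dictionary could be shared with vocab_bc = sc.broadcast(vocab), and the per-document step could run as selected_data.select('full_text').rdd.map(lambda row: to_vector(row.full_text, vocab_bc.value)). Spark ML also provides pyspark.ml.feature.CountVectorizer, HashingTF and IDF, which cover the same ground without a hand-written dictionary.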

Re: Lightweight pipeline execution for single row

2018-09-23 Thread Michael Artz
Are you using the scheduler in FAIR mode instead of FIFO mode?
> On Sep 22, 2018, at 12:58 AM, Jatin Puri wrote:
> Hi.
> What tactics can I apply for such a scenario?
> I have a pipeline of 10 stages. Simple text processing. I train the data with the pipeline
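The scheduling mode for jobs within a single Spark application is controlled by the spark.scheduler.mode property, which defaults to FIFO. A minimal configuration sketch, assuming the application picks up conf/spark-defaults.conf:

```
# conf/spark-defaults.conf (sketch): switch job scheduling inside one
# application from the default FIFO to fair sharing
spark.scheduler.mode    FAIR
```

The same setting can be passed per submission with spark-submit --conf spark.scheduler.mode=FAIR. Fair sharing mainly matters when several jobs are submitted concurrently from different threads of the same application; whether it helps the single-row pipeline case in this thread is not established here.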

Failed to shuffle write

2018-09-23 Thread yguang11
Hello, I am fairly new to Spark. Recently I was debugging some Spark application failures, and one issue I found is that the executor failed to write, with the following stack trace:
2018-09-23 05:05:38 ERROR Executor:91 - Exception in task 1037.0 in stage 14.0 (TID 33041)