Have you tried the obvious (increase the heap size of your JVM)?

On Tue, Jul 8, 2014 at 2:02 PM, Rahul Bhojwani
<rahulbhojwani2...@gmail.com> wrote:
> Thanks Marcelo.
> I'm having another problem: my code was running fine and then it
> suddenly stopped with this error:
>
> java.lang.OutOfMemoryError: Java heap space
>         at java.io.BufferedOutputStream.<init>(Unknown Source)
>         at org.apache.spark.api.python.PythonRDD$$anon$2.run(PythonRDD.scala:62)
>
> Can you help with that?
>
>
> On Wed, Jul 9, 2014 at 2:07 AM, Marcelo Vanzin <van...@cloudera.com> wrote:
>>
>> Sorry, that would be sc.stop() (not close).
>>
>> On Tue, Jul 8, 2014 at 1:31 PM, Marcelo Vanzin <van...@cloudera.com>
>> wrote:
>> > Hi Rahul,
>> >
>> > Can you try calling "sc.close()" at the end of your program, so Spark
>> > can clean up after itself?
>> >
>> > On Tue, Jul 8, 2014 at 12:40 PM, Rahul Bhojwani
>> > <rahulbhojwani2...@gmail.com> wrote:
>> >> Here I am adding my code; if you can have a look to help me out,
>> >> thanks.
>> >>
>> >> #######################
>> >>
>> >> import tokenizer
>> >> import gettingWordLists as gl
>> >> from pyspark.mllib.classification import NaiveBayes
>> >> from numpy import array
>> >> from pyspark import SparkContext, SparkConf
>> >>
>> >> conf = (SparkConf().setMaster("local[6]").setAppName("My app")
>> >>         .set("spark.executor.memory", "1g"))
>> >> sc = SparkContext(conf=conf)
>> >>
>> >> # Get the positive and negative word lists:
>> >> pos_list = gl.getPositiveList()
>> >> neg_list = gl.getNegativeList()
>> >>
>> >> tok = tokenizer.Tokenizer(preserve_case=False)
>> >> train_data = []
>> >>
>> >> with open("training_file_coach.csv", "r") as train_file:
>> >>     for line in train_file:
>> >>         tokens = line.split("######")
>> >>         msg = tokens[0]
>> >>         sentiment = tokens[1]
>> >>         pos_count = 0
>> >>         neg_count = 0
>> >>         tokens = set(tok.tokenize(msg))
>> >>         for i in tokens:
>> >>             if i.encode('utf-8') in pos_list:
>> >>                 pos_count += 1
>> >>             if i.encode('utf-8') in neg_list:
>> >>                 neg_count += 1
>> >>         if 'NEG' in sentiment:
>> >>             label = 0.0
>> >>         else:
>> >>             label = 1.0
>> >>
>> >>         # MLlib NaiveBayes expects rows of [label, feature1, feature2, ...]
>> >>         train_data.append([label, float(pos_count), float(neg_count)])
>> >>
>> >> model = NaiveBayes.train(sc.parallelize(array(train_data)))
>> >>
>> >> file_predicted = open("predicted_file_coach.csv", "w")
>> >>
>> >> with open("prediction_file_coach.csv", "r") as predict_file:
>> >>     for line in predict_file:
>> >>         msg = line[0:-1]
>> >>         pos_count = 0
>> >>         neg_count = 0
>> >>         tokens = set(tok.tokenize(msg))
>> >>         for i in tokens:
>> >>             if i.encode('utf-8') in pos_list:
>> >>                 pos_count += 1
>> >>             if i.encode('utf-8') in neg_list:
>> >>                 neg_count += 1
>> >>         prediction = model.predict(array([float(pos_count), float(neg_count)]))
>> >>         if prediction == 0:
>> >>             sentiment = "NEG"
>> >>         elif prediction == 1:
>> >>             sentiment = "POS"
>> >>         else:
>> >>             print "ERROR\n\n\n\n\n\n\nERROR"
>> >>
>> >>         # Feed each prediction back into the training set and retrain.
>> >>         train_data.append([float(prediction), float(pos_count), float(neg_count)])
>> >>         model = NaiveBayes.train(sc.parallelize(array(train_data)))
>> >>         file_predicted.write(msg + "######" + sentiment + "\n")
>> >>
>> >> file_predicted.close()
>> >>
>> >> ###################
>> >>
>> >> If you can have a look at the code and help me out, it would be great.
>> >>
>> >> Thanks
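One thing that stands out in the code above: the prediction loop retrains
NaiveBayes on the ever-growing train_data for every input line, parallelizing
a new RDD each time, which could well explain the heap pressure. Below is a
minimal sketch (not your original code) of the tail end of the script with
the retraining removed and the context stopped cleanly; it assumes the imports,
sc, and train_data from the script above, and count_sentiment_words is a
hypothetical helper wrapping your token-counting loop:

    # Train once, up front; train_data built exactly as in the original script.
    model = NaiveBayes.train(sc.parallelize(array(train_data)))

    with open("prediction_file_coach.csv", "r") as predict_file:
        with open("predicted_file_coach.csv", "w") as file_predicted:
            for line in predict_file:
                msg = line[0:-1]
                # count_sentiment_words is a hypothetical helper wrapping the
                # pos_list/neg_list counting loop from the original script.
                pos_count, neg_count = count_sentiment_words(msg)
                prediction = model.predict(array([float(pos_count), float(neg_count)]))
                sentiment = "NEG" if prediction == 0.0 else "POS"
                file_predicted.write(msg + "######" + sentiment + "\n")

    sc.stop()  # sc.stop(), not sc.close(); lets Spark delete its temp dirs before the JVM exits

The final sc.stop() is also relevant to the "Failed to delete: ...\Temp\spark-..."
exception at the bottom of this thread, since it gives Spark a chance to clean
up its temp directories itself.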
>> >> On Wed, Jul 9, 2014 at 12:54 AM, Rahul Bhojwani
>> >> <rahulbhojwani2...@gmail.com> wrote:
>> >>>
>> >>> Hi Marcelo.
>> >>> Thanks for the quick reply. Can you suggest how to increase the
>> >>> memory limits, or how else to tackle this problem? I am a novice.
>> >>> If you want, I can post my code here.
>> >>>
>> >>> Thanks
>> >>>
>> >>> On Wed, Jul 9, 2014 at 12:50 AM, Marcelo Vanzin <van...@cloudera.com>
>> >>> wrote:
>> >>>>
>> >>>> This is generally a side effect of your executor being killed. For
>> >>>> example, Yarn will do that if you're going over the requested memory
>> >>>> limits.
>> >>>>
>> >>>> On Tue, Jul 8, 2014 at 12:17 PM, Rahul Bhojwani
>> >>>> <rahulbhojwani2...@gmail.com> wrote:
>> >>>> > Hi,
>> >>>> >
>> >>>> > I am getting this error. Can anyone explain why it occurs?
>> >>>> >
>> >>>> > ########
>> >>>> >
>> >>>> > Exception in thread "delete Spark temp dir
>> >>>> > C:\Users\shawn\AppData\Local\Temp\spark-27f60467-36d4-4081-aaf5-d0ad42dda560"
>> >>>> > java.io.IOException: Failed to delete:
>> >>>> > C:\Users\shawn\AppData\Local\Temp\spark-27f60467-36d4-4081-aaf5-d0ad42dda560\tmp
>> >>>> > cmenlp
>> >>>> >         at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:483)
>> >>>> >         at org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:479)
>> >>>> >         at org.apache.spark.util.Utils$$anonfun$deleteRecursively$1.apply(Utils.scala:478)
>> >>>> >         at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>> >>>> >         at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
>> >>>> >         at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:478)
>> >>>> >         at org.apache.spark.util.Utils$$anon$4.run(Utils.scala:212)
>> >>>> >
>> >>>> > ############
>> >>>> >
>> >>>> > Thanks in advance
>> >>>> > --
>> >>>> > Rahul K Bhojwani
>> >>>> > 3rd Year B.Tech
>> >>>> > Computer Science and Engineering
>> >>>> > National Institute of Technology, Karnataka
>> >>>>
>> >>>> --
>> >>>> Marcelo
>> >>>
>> >>> --
>> >>> Rahul K Bhojwani
>> >>> 3rd Year B.Tech
>> >>> Computer Science and Engineering
>> >>> National Institute of Technology, Karnataka
>> >>
>> >> --
>> >> Rahul K Bhojwani
>> >> 3rd Year B.Tech
>> >> Computer Science and Engineering
>> >> National Institute of Technology, Karnataka
>> >
>> > --
>> > Marcelo
>>
>> --
>> Marcelo
>
> --
> Rahul K Bhojwani
> 3rd Year B.Tech
> Computer Science and Engineering
> National Institute of Technology, Karnataka
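One more note on the heap-size suggestion at the top of this thread: since
setMaster("local[6]") runs the whole application inside the single driver JVM,
the "spark.executor.memory" setting in the script will not enlarge the heap
that is overflowing; the driver's heap is what matters in local mode. A
minimal sketch of one way to raise it, assuming the script is launched through
spark-submit (the 2g value and the script name are placeholder examples):

    spark-submit --driver-memory 2g sentiment_script.py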
-- Marcelo