Re: pyspark script fails on EMR with an ERROR in configuring object.

2014-08-03 Thread Rahul Bhojwani
lassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:270) at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820) at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(Co

pyspark script fails on EMR with an ERROR in configuring object.

2014-08-03 Thread Rahul Bhojwani
Hi, I used to run Spark scripts on my local machine. Now I am porting my code to EMR and I am facing lots of problems. The main one right now is that a Spark script which runs properly on my local machine gives an error when run on an Amazon EMR cluster. Here is the error: [image: Inline image 1]

Re: Installing Spark 0.9.1 on EMR Cluster

2014-07-31 Thread Rahul Bhojwani
t we have downloaded > the tarball, extracted and configured accordingly and it worked fine. > > I believe you would want to write a custom script which does all these > things and add it like a bootstrap action. > > Thanks, > Sai > On Jul 31, 2014 2:42 PM, "Rahul Bhojwani"
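A rough sketch of the kind of bootstrap action described above, written in Python only to match the other examples here; the download URL, tarball name, and install path are assumptions, not an official EMR script, and cluster-specific configuration is left out.

    #!/usr/bin/env python
    # Hypothetical bootstrap action: fetch and unpack a Spark 0.9.1 tarball.
    # The URL and install directory are assumed values, not verified EMR defaults.
    import subprocess

    SPARK_URL = ("https://archive.apache.org/dist/spark/spark-0.9.1/"
                 "spark-0.9.1-bin-hadoop2.tgz")
    INSTALL_DIR = "/home/hadoop"

    subprocess.check_call(["wget", "-q", SPARK_URL, "-P", INSTALL_DIR])
    subprocess.check_call(["tar", "-xzf",
                           INSTALL_DIR + "/spark-0.9.1-bin-hadoop2.tgz",
                           "-C", INSTALL_DIR])
    # Cluster-specific configuration (master URL, memory settings, etc.) would follow here.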

Installing Spark 0.9.1 on EMR Cluster

2014-07-31 Thread Rahul Bhojwani
I wanted to install Spark version 0.9.1 on an Amazon EMR cluster. Can anyone give me an install script which I can pass as a custom bootstrap action while creating a cluster? Thanks -- Rahul K Bhojwani about.me/rahul_bhojwani

Re: Can we get a spark context inside a mapper

2014-07-15 Thread Rahul Bhojwani
ka (http://www.cs.waikato.ac.nz/ml/weka/). You can then >> load your text files as an RDD of strings with SparkContext.wholeTextFiles >> and call Weka on each one. >> >> Matei >> >> On Jul 14, 2014, at 11:30 AM, Rahul Bhojwani >> wrote: >> >> I understand th
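A minimal sketch of the wholeTextFiles approach Matei describes, assuming Spark 1.0+ where SparkContext.wholeTextFiles is available in PySpark; the process_file function and the input/output paths are hypothetical placeholders.

    from pyspark import SparkContext

    def process_file(name_and_content):
        # Each element is a (filename, full file contents) pair, so one task
        # sees a whole file and can run any per-file analysis on it.
        name, content = name_and_content
        return (name, len(content.split()))

    sc = SparkContext(appName="per-file-processing")
    results = sc.wholeTextFiles("hdfs:///input/dir").map(process_file)
    results.saveAsTextFile("hdfs:///output/dir")
    sc.stop()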

Re: Can we get a spark context inside a mapper

2014-07-14 Thread Rahul Bhojwani
I understand that the question is very unprofessional, but I am a newbie. If this isn't the right place, could you share a link to somewhere I can ask such questions? But please answer. On Mon, Jul 14, 2014 at 6:52 PM, Rahul Bhojwani wrote: > Hey, My question is for this situation: > Suppose we have

Can we get a spark context inside a mapper

2014-07-14 Thread Rahul Bhojwani
Hey, my question is for this situation: Suppose we have 10 files, each containing a list of features in each row. The task is, for each file, to cluster the features in that file and write the corresponding cluster alongside them in a new file. So we have to generate 10 more files by applying clu

Error in Spark: Exception in thread "delete Spark temp dir"

2014-07-14 Thread Rahul Bhojwani
I am getting an error saying: Exception in thread "delete Spark temp dir C:\Users\shawn\AppData\Local\Temp\spark-b4f1105c-d67b-488c-83f9-eff1d1b95786" java.io.IOException: Failed to delete: C:\Users\shawn\AppData\Local\Temp\spark-b4f1105c-d67b-488c-83f9-eff1d1b95786\tmppr36zu at org.apac

Re: Does the MLlib Naive Bayes implementation incorporate Laplace smoothing?

2014-07-10 Thread Rahul Bhojwani
.org/jira/browse/SPARK/ > > Bertrand > > > On Thu, Jul 10, 2014 at 2:37 PM, Rahul Bhojwani < > rahulbhojwani2...@gmail.com> wrote: > >> And also that there is a small bug in implementation. As I mentioned this >> earlier also. >> >> This is my first tim

Re: Does the MLlib Naive Bayes implementation incorporate Laplace smoothing?

2014-07-10 Thread Rahul Bhojwani
(A bug, according to me.) I'm not trying to be selfish. It's just that if I get something that can help make my profile look strong, then I shouldn't miss it at this stage. Thanks, On Thu, Jul 10, 2014 at 5:54 PM, Rahul Bhojwani wrote: > Ya thanks. I can see that lambda is used as the p

Re: Does the MLlib Naive Bayes implementation incorporate Laplace smoothing?

2014-07-10 Thread Rahul Bhojwani
Yes there is a smoothing parameter, and yes from the looks of it it is > simply additive / Laplace smoothing. It's been in there for a while. > > > On Thu, Jul 10, 2014 at 6:55 AM, Rahul Bhojwani < > rahulbhojwani2...@gmail.com> wrote: > >> The discussion is in
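A minimal sketch of passing the smoothing parameter confirmed here, assuming Spark 1.0+ where the PySpark NaiveBayes API accepts LabeledPoint data and an optional lambda; the tiny training set is made up.

    from pyspark import SparkContext
    from pyspark.mllib.classification import NaiveBayes
    from pyspark.mllib.regression import LabeledPoint

    sc = SparkContext(appName="nb-smoothing")
    training = sc.parallelize([
        LabeledPoint(0.0, [1.0, 0.0, 3.0]),
        LabeledPoint(1.0, [0.0, 2.0, 1.0]),
    ])
    # The second argument is the additive smoothing value; 1.0 gives Laplace smoothing.
    model = NaiveBayes.train(training, 1.0)
    print(model.predict([1.0, 0.0, 2.0]))
    sc.stop()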

Does the MLlib Naive Bayes implementation incorporate Laplace smoothing?

2014-07-09 Thread Rahul Bhojwani
The discussion is in the context of Spark 0.9.1. Does the MLlib Naive Bayes implementation incorporate Laplace smoothing? Or any other smoothing? Or does it not incorporate any smoothing? Please inform. Thanks, -- Rahul K Bhojwani 3rd Year B.Tech Computer Science and Engineering National Institute of T

Spark 0.9.1 implementation of MLlib-NaiveBayes has a bug.

2014-07-09 Thread Rahul Bhojwani
According to me, there is a bug in the MLlib Naive Bayes implementation in Spark 0.9.1. Whom should I report this to, or with whom should I discuss it? I can discuss this over a call as well. My Skype ID: rahul.bhijwani Phone no: +91-9945197359 Thanks, -- Rahul K Bhojwani 3rd Year B.Tech Computer Scien

Error using MLlib-NaiveBayes: "Matrices are not aligned"

2014-07-09 Thread Rahul Bhojwani
I am using Naive Bayes in MLlib. Below I have printed the log of *model.theta* after training on the training data. You can check that it contains 9 features for 2-class classification. >>print numpy.log(model.theta) [[ 0.31618962 0.16636852 0.07200358 0.05411449 0.08542039 0.17620751 0.03711986
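A small self-contained sketch of one way this numpy error can arise: the dot product between the 2 x 9 log-probability matrix and a test vector only works when the feature counts agree. The shapes below are illustrative, not taken from the actual model.

    import numpy as np

    log_theta = np.log(np.full((2, 9), 1.0 / 9))  # 2 classes x 9 features, as in the post
    good_vector = np.ones(9)
    bad_vector = np.ones(7)                       # e.g. a test row that lost two features

    print(log_theta.dot(good_vector))             # fine: shapes (2, 9) and (9,)
    try:
        log_theta.dot(bad_vector)                 # numpy reports the operands as not aligned
    except ValueError as e:
        print("prediction fails: %s" % e)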

Re: Error: Could not delete temporary files.

2014-07-08 Thread Rahul Bhojwani
of your program, so Spark > > can clean up after itself? > > > > On Tue, Jul 8, 2014 at 12:40 PM, Rahul Bhojwani > > wrote: > >> Here I am adding my code. If you can have a look to help me out. > >> Thanks > >> ### > >
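A minimal sketch of the pattern the truncated reply appears to hint at: stop the SparkContext explicitly at the end of the script so Spark's shutdown hooks can clean up their temporary directories. The work inside the try block is a placeholder.

    from pyspark import SparkContext

    sc = SparkContext(appName="cleanup-example")
    try:
        data = sc.parallelize(range(100))
        print(data.sum())
    finally:
        # Stopping the context before the process exits reduces "Failed to delete"
        # temp-dir errors, although Windows file locking can still get in the way.
        sc.stop()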

Re: Error and doubts in using MLlib Naive Bayes for text classification

2014-07-08 Thread Rahul Bhojwani
> likelihood, we only need summation. > > Best, > Xiangrui > > On Tue, Jul 8, 2014 at 12:01 AM, Rahul Bhojwani > wrote: > > I am really sorry. It's actually my mistake. My problem 2 is wrong because > > using a single feature is a senseless thing. Sorry for the inconvenien

Re: Is the MLlib NaiveBayes implementation for Spark 0.9.1 correct?

2014-07-08 Thread Rahul Bhojwani
for text > classification. I would recommend upgrading to v1.0. -Xiangrui > > On Tue, Jul 8, 2014 at 7:20 AM, Rahul Bhojwani > wrote: > > Hi, > > > > I wanted to use Naive Bayes for a text classification problem. I am using > > Spark 0.9.1. > >

Re: How to incorporate new data into the MLlib-NaiveBayes model while predicting?

2014-07-08 Thread Rahul Bhojwani
d to update the priors and conditional > probabilities, which means we should also remember the number of > observations for the updates. > > Best, > Xiangrui > > On Tue, Jul 8, 2014 at 7:35 AM, Rahul Bhojwani > wrote: > > Hi, > > > > I am using the MLli
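A rough, driver-side sketch of the idea in this reply, not an MLlib API: keep the raw per-class counts so priors and conditional probabilities can be recomputed whenever new labelled data arrives. The class name and its structure are illustrative assumptions.

    from collections import defaultdict
    import numpy as np

    class IncrementalNB(object):
        def __init__(self, num_features, smoothing=1.0):
            self.smoothing = smoothing
            self.class_counts = defaultdict(float)               # observations per class
            self.feature_sums = defaultdict(lambda: np.zeros(num_features))

        def update(self, label, features):
            # Remember counts rather than probabilities so later updates stay exact.
            self.class_counts[label] += 1.0
            self.feature_sums[label] += np.asarray(features, dtype=float)

        def predict(self, features):
            x = np.asarray(features, dtype=float)
            total = sum(self.class_counts.values())
            best_label, best_score = None, -np.inf
            for label in self.class_counts:
                prior = np.log(self.class_counts[label] / total)
                counts = self.feature_sums[label] + self.smoothing
                theta = np.log(counts / counts.sum())             # multinomial conditionals
                score = prior + x.dot(theta)
                if score > best_score:
                    best_label, best_score = label, score
            return best_label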

OutOfMemory: Java heap space error

2014-07-08 Thread Rahul Bhojwani
Hi, my code was running properly but then it suddenly gave this error. Can you shed some light on it? ### 0 KB, free: 38.7 MB) 14/07/09 01:46:12 INFO BlockManagerMaster: Updated info of block rdd_2212_4 14/07/09 01:46:13 INFO PythonRDD: Times: total = 1486, boot = 698, ini
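A minimal sketch, assuming the Java heap space error comes from an undersized heap: request more executor memory through SparkConf and use a disk-spilling storage level when caching. The 2g figure is an arbitrary example, and in local mode the driver JVM's heap usually has to be raised when launching the application rather than in code.

    from pyspark import SparkConf, SparkContext, StorageLevel

    conf = (SparkConf()
            .setAppName("heap-space-example")
            .set("spark.executor.memory", "2g"))   # example value only
    sc = SparkContext(conf=conf)

    # MEMORY_AND_DISK lets cached partitions spill instead of exhausting the heap.
    rdd = sc.parallelize(range(1000)).persist(StorageLevel.MEMORY_AND_DISK)
    print(rdd.count())
    sc.stop()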

Re: Error: Could not delete temporary files.

2014-07-08 Thread Rahul Bhojwani
$4.run(Utils.scala:212) These are the logs. Can you suggest something after looking at them? On Wed, Jul 9, 2014 at 1:10 AM, Rahul Bhojwani wrote: > Here I am adding my code. If you can have a look to help me out. > Thanks > ### &g

Re: Error: Could not delete temporary files.

2014-07-08 Thread Rahul Bhojwani
feature = [] feature.append(float(prediction)) feature.append(float(pos_count)) feature.append(float(neg_count)) print feature train_data.append(feature) model = NaiveBayes.train(sc.parallelize(array(train_data))) file_predicted.write(msg + "##&

Re: Error: Could not delete temporary files.

2014-07-08 Thread Rahul Bhojwani
tor being killed. For > example, Yarn will do that if you're going over the requested memory > limits. > > On Tue, Jul 8, 2014 at 12:17 PM, Rahul Bhojwani > wrote: > > HI, > > > > I am getting this error. Can anyone help out to explain why is this error > >

Error: Could not delete temporary files.

2014-07-08 Thread Rahul Bhojwani
Hi, I am getting this error. Can anyone help explain why this error is coming? Exception in thread "delete Spark temp dir C:\Users\shawn\AppData\Local\Temp\spark-27f60467-36d4-4081-aaf5-d0ad42dda560" java.io.IOException: Failed to delete: C:\Users\shawn\AppData\Local\Temp\spark-

How to incorporate new data into the MLlib-NaiveBayes model while predicting?

2014-07-08 Thread Rahul Bhojwani
Hi, I am using MLlib Naive Bayes for a text classification problem. I have a very small amount of training data, and then data will be coming in continuously and I need to classify it as either A or B. I am training the MLlib Naive Bayes model using the training data, but next time when data come

Is the MLlib NaiveBayes implementation for Spark 0.9.1 correct?

2014-07-08 Thread Rahul Bhojwani
Hi, I wanted to use Naive Bayes for a text classification problem. I am using Spark 0.9.1. I was just curious to ask: is the Naive Bayes implementation in Spark 0.9.1 correct? Or are there any bugs in the Spark 0.9.1 implementation which are taken care of in Spark 1.0? My question is specific abou

Re: Error and doubts in using MLlib Naive Bayes for text classification

2014-07-08 Thread Rahul Bhojwani
I am really sorry. It's actually my mistake. My problem 2 is wrong because using a single feature is a senseless thing. Sorry for the inconvenience. But I will still be waiting for solutions to problems 1 and 3. Thanks, On Tue, Jul 8, 2014 at 12:14 PM, Rahul Bhojwani wrote: > Hello, &g

Error and doubts in using MLlib Naive Bayes for text classification

2014-07-07 Thread Rahul Bhojwani
Hello, I am a novice. I want to classify text into two classes. For this purpose I want to use a Naive Bayes model. I am using Python for it. Here are the problems I am facing: *Problem 1:* I wanted to use all words as features for the bag-of-words model, which means my features will be count
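A rough sketch of the bag-of-words setup described in Problem 1, assuming Spark 1.0+ PySpark MLlib; the two-document corpus and its labels are made up.

    from pyspark import SparkContext
    from pyspark.mllib.classification import NaiveBayes
    from pyspark.mllib.regression import LabeledPoint

    sc = SparkContext(appName="bow-naive-bayes")

    corpus = [("spark makes clusters easy", 1.0),
              ("the weather is gloomy today", 0.0)]

    # Build the vocabulary on the driver so every document maps to the same indices.
    vocab = {w: i for i, w in
             enumerate(sorted({w for text, _ in corpus for w in text.split()}))}

    def to_labeled_point(text_and_label):
        text, label = text_and_label
        counts = [0.0] * len(vocab)
        for word in text.split():
            counts[vocab[word]] += 1.0            # word-count features
        return LabeledPoint(label, counts)

    training = sc.parallelize(corpus).map(to_labeled_point)
    model = NaiveBayes.train(training)
    sc.stop()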

Re: Local file being referenced in mapper function

2014-05-30 Thread Rahul Bhojwani
Thanks Jey, it was helpful. On Sat, May 31, 2014 at 12:45 AM, Rahul Bhojwani < rahulbhojwani2...@gmail.com> wrote: > Thanks Marcelo, > > It actually made a few of my concepts clear. (y). > > > On Fri, May 30, 2014 at 10:14 PM, Marcelo Vanzin > wrote: >> Hello

Re: Local file being referenced in mapper function

2014-05-30 Thread Rahul Bhojwani
Thanks Marcelo, It actually made a few of my concepts clear. (y). On Fri, May 30, 2014 at 10:14 PM, Marcelo Vanzin wrote: > Hello there, > > On Fri, May 30, 2014 at 9:36 AM, Marcelo Vanzin > wrote: > > workbook = xlsxwriter.Workbook('output_excel.xlsx') > > worksheet = workbook.add_worksheet() > >
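A minimal sketch of one common workaround for the situation quoted above (not necessarily the exact advice given in the replies): keep the xlsxwriter Workbook on the driver and write it only after collecting results, rather than referencing the local workbook object inside a mapper. The example transformation and row layout are assumptions.

    import xlsxwriter
    from pyspark import SparkContext

    sc = SparkContext(appName="excel-on-driver")
    rows = sc.parallelize([("a", 1), ("b", 2)]) \
             .map(lambda kv: (kv[0], kv[1] * 10)) \
             .collect()
    sc.stop()

    # All file I/O happens locally on the driver, where output_excel.xlsx can be created.
    workbook = xlsxwriter.Workbook("output_excel.xlsx")
    worksheet = workbook.add_worksheet()
    for row_idx, (key, value) in enumerate(rows):
        worksheet.write(row_idx, 0, key)
        worksheet.write(row_idx, 1, value)
    workbook.close()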

Local file being referenced in mapper function

2014-05-30 Thread Rahul Bhojwani
Hi, I recently posted a question on Stack Overflow but didn't get any reply. I have joined the mailing list now. Can any one of you guide me on the problem mentioned in http://stackoverflow.com/questions/23923966/writing-the-rdd-data-in-excel-file-along-mapping-in-apache-spark Thanks in advance