Re: OptionalDataException during Naive Bayes Training

2017-05-23 Thread elitejyo
/OptionalDataException-during-Naive-Bayes-Training-tp21059p28704.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Spark ML - Naive Bayes - how to select Threshold values

2016-11-07 Thread Nirav Patel
A few questions about the `thresholds` parameter. This is what the doc says: "Param for Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values >= 0. The class with largest value p/t is predicted, where p
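A minimal Python sketch of the p/t rule quoted above, for illustration only. It assumes p is the model's predicted probability for a class and t is that class's threshold. The function name `predict_with_thresholds` and the handling of a zero threshold (treated here as an infinitely large ratio) are assumptions of this sketch, not Spark's API.

```python
import math
from typing import Sequence


def predict_with_thresholds(probabilities: Sequence[float],
                            thresholds: Sequence[float]) -> int:
    """Return the index of the class with the largest p / t ratio."""
    if len(probabilities) != len(thresholds):
        raise ValueError("thresholds must have one entry per class")
    if any(t < 0 for t in thresholds):
        raise ValueError("thresholds must be >= 0")
    # Assumption of this sketch: a zero threshold gives an infinite ratio.
    ratios = [p / t if t > 0 else math.inf
              for p, t in zip(probabilities, thresholds)]
    # max() keeps the first maximal index, so ties go to the lower class index.
    return max(range(len(ratios)), key=ratios.__getitem__)
```

With equal thresholds the rule reduces to picking the most probable class; raising one class's threshold makes that class harder to predict.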

naive bayes results to not match published results

2016-01-26 Thread Andy Davidson
I have been getting strange results from Naïve Bayes. The javadoc included a link to a reference paper: http://nlp.stanford.edu/IR-book/html/htmledition/naive-bayes-text-classification-1.html . The test data is trivial; you can easily do the computations by hand. To try and figure out what

Re: How to compute the probability of each class in Naive Bayes

2015-09-10 Thread Adamantios Corais
Great. So, given that model.theta represents the log-probabilities (and hence the result of brzPi + brzTheta * testData.toBreeze is a big number too), how can I get back the non-log probabilities, which apparently are bounded between 0.0 and 1.0? // Adamantios On Tue, Sep 1,

Re: How to compute the probability of each class in Naive Bayes

2015-09-10 Thread Sean Owen
Yes, https://github.com/apache/spark/blob/v1.5.0/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala#L158 is the method you are interested in. It does normalize the probabilities and return them to non-log-space. So you can use predictProbabilities to get the actual

Re: How to compute the probability of each class in Naive Bayes

2015-09-10 Thread Sean Owen
The log probabilities are unlikely to be very large, though the probabilities may be very small. The direct answer is to exponentiate brzPi + brzTheta * testData.toBreeze -- apply exp(x). I have forgotten whether the probabilities are normalized already though. If not you'll have to normalize to
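A minimal Python sketch of the step described above: exponentiate the per-class log scores, then normalize so they sum to 1. The function name `normalize_log_scores` is hypothetical, and the input stands in for the values of brzPi + brzTheta * testData.toBreeze. Subtracting the maximum before calling exp is an added numerical safeguard, not something stated in the thread; it leaves the normalized result unchanged.

```python
import math
from typing import List, Sequence


def normalize_log_scores(log_scores: Sequence[float]) -> List[float]:
    """Turn unnormalized per-class log scores into probabilities summing to 1."""
    if not log_scores:
        raise ValueError("need at least one class score")
    # Shift by the maximum so exp() cannot underflow for every class at once.
    largest = max(log_scores)
    exponentiated = [math.exp(score - largest) for score in log_scores]
    # The normalizer is the sum over classes (the evidence term, up to the shift).
    total = sum(exponentiated)
    return [value / total for value in exponentiated]
```

The sum over classes plays the role of the missing denominator: it is the only quantity needed to turn the joint scores into posterior probabilities.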

Re: How to compute the probability of each class in Naive Bayes

2015-09-10 Thread Adamantios Corais
Thanks Sean. As far as I can see probabilities are NOT normalized; denominator isn't implemented in either v1.1.0 or v1.5.0 (by denominator, I refer to the probability of feature X). So, for given lambda, how to compute the denominator? FYI:

Re: How to compute the probability of each class in Naive Bayes

2015-09-01 Thread Sean Owen
(pedantic: it's the log-probabilities) On Tue, Sep 1, 2015 at 10:48 AM, Yanbo Liang wrote: > Actually > brzPi + brzTheta * testData.toBreeze > is the probabilities of the input Vector on each class, however it's a > Breeze Vector. > Pay attention the index of this Vector need

Re: How to compute the probability of each class in Naive Bayes

2015-09-01 Thread Yanbo Liang
Actually, brzPi + brzTheta * testData.toBreeze gives the probabilities of the input Vector for each class; however, it is a Breeze Vector. Note that each index of this Vector needs to be mapped to the corresponding label index. 2015-08-28 20:38 GMT+08:00 Adamantios Corais : >

How to compute the probability of each class in Naive Bayes

2015-08-28 Thread Adamantios Corais
Hi, I am trying to change the following code so as to get the probabilities of the input Vector on each class (instead of the class itself with the highest probability). I know that this is already available as part of the most recent release of Spark but I have to use Spark 1.1.0. Any help is

Re: MLlib - Naive Bayes Problem

2015-04-20 Thread Xiangrui Meng
to give a description of a car and the program to classify the category of that car. So i decided to use multinomial naive Bayes. I created a unique id for each word and replaced my whole category,description data. //My input 2,25187 15095 22608 28756 17862 29523 499 32681 9830 24957 18993 19501

MLlib - Naive Bayes Problem

2015-04-16 Thread riginos
I have a big dataset of categories of cars and descriptions of cars. I want to give the program a description of a car and have it classify the category of that car, so I decided to use multinomial naive Bayes. I created a unique ID for each word and replaced my whole category,description data

Re: Naive Bayes model fails after a few predictions

2015-02-17 Thread Xiangrui Meng
Could you share the error log? What do you mean by 500 instead of 200? If this is the number of files, try to use `repartition` before calling naive Bayes, which works the best when the number of partitions matches the number of cores, or even less. -Xiangrui On Tue, Feb 10, 2015 at 10:34 PM

Naive Bayes model fails after a few predictions

2015-02-10 Thread rkgurram
Hi, I have built a Sentiment Analyzer using the Naive Bayes model. The model works fine: it learns from a list of 200 movie reviews and predicts with an accuracy of roughly 77% to 80%. After predicting for a while, I get the following stack trace... By the way, I have only one

naive bayes text classifier with tf-idf in pyspark

2015-02-06 Thread Imran Akbar
Hi, I've got the following code http://pastebin.com/3kexKwg6 that's almost complete, but I have 2 questions: 1) Once I've computed the TF-IDF vector, how do I compute the vector for each string to feed into the LabeledPoint? 2) Does MLLib provide any methods to evaluate the model's precision,

Re: OptionalDataException during Naive Bayes Training

2015-01-09 Thread Xiangrui Meng
(ObjectInputStream.java:1896) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801) What could be the reason behind this? Thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/OptionalDataException-during-Naive-Bayes-Training-tp21059.html

OptionalDataException during Naive Bayes Training

2015-01-09 Thread jatinpreet
(ObjectInputStream.java:1801) What could be the reason behind this? Thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/OptionalDataException-during-Naive-Bayes-Training-tp21059.html

Re: MLlib Naive Bayes classifier confidence

2014-12-04 Thread MariusFS
. It would appear as a - log(P(evidence)) term. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/MLlib-Naive-Bayes-classifier-confidence-tp18456p20361.html

Re: MLlib Naive Bayes classifier confidence

2014-12-03 Thread Sean Owen
/MLlib-Naive-Bayes-classifier-confidence-tp18456p20175.html

Re: MLlib Naive Bayes classifier confidence

2014-12-02 Thread MariusFS
/MLlib-Naive-Bayes-classifier-confidence-tp18456p20175.html

Probability in Naive Bayes

2014-11-17 Thread Samarth Mailinglist
I am trying to use Naive Bayes for a project of mine in Python and I want to obtain the probability value after having built the model. Suppose I have two classes - A and B. Currently there is an API to find which class a sample belongs to (predict). Now, I want to find the probability

Re: Probability in Naive Bayes

2014-11-17 Thread Sean Owen
it is just the sum of the class probabilities. You won't be able to compute this otherwise from what Naive Bayes computes. On Nov 18, 2014 7:42 AM, Samarth Mailinglist mailinglistsama...@gmail.com wrote: I am trying to use Naive Bayes for a project of mine in Python and I want to obtain

Re: MLlib Naive Bayes classifier confidence

2014-11-10 Thread Sean Owen
to eliminate the samples that were classified with low confidence. Thanks, Jatin - Novice Big Data Programmer -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/MLlib-Naive-Bayes-classifier-confidence-tp18456.html

Re: MLlib Naive Bayes classifier confidence

2014-11-10 Thread jatinpreet
. Any suggestions of a way out other than replicating the whole functionality of the Naive Bayes model in Java? That would be a time-consuming process. - Novice Big Data Programmer -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/MLlib-Naive-Bayes-classifier

Re: MLlib Naive Bayes classifier confidence

2014-11-10 Thread Sean Owen
It's hacky, but you could access these fields via reflection. It'd be better to propose opening them up in a PR. On Mon, Nov 10, 2014 at 9:25 AM, jatinpreet jatinpr...@gmail.com wrote: Thanks for the answer. The variables brzPi and brzTheta are declared private. I am writing my code with Java

Re: MLlib Naive Bayes classifier confidence

2014-11-10 Thread jatinpreet
Programmer -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/MLlib-Naive-Bayes-classifier-confidence-tp18456p18497.html

MLlib - Naive Bayes Java example bug

2014-11-03 Thread Dariusz Kobylarz
Hi, I noticed a bug in the sample Java code on the MLlib - Naive Bayes docs page: http://spark.apache.org/docs/1.1.0/mllib-naive-bayes.html In the filter: double accuracy = 1.0 * predictionAndLabel.filter(new Function<Tuple2<Double, Double>, Boolean>() { @Override public Boolean call

Re: MLlib - Naive Bayes Java example bug

2014-11-03 Thread Sean Owen
: Hi, I noticed a bug in the sample java code in MLlib - Naive Bayes docs page: http://spark.apache.org/docs/1.1.0/mllib-naive-bayes.html In the filter: double accuracy = 1.0 * predictionAndLabel.filter(new Function<Tuple2<Double, Double>, Boolean>() { @Override public Boolean call

Serialize/deserialize Naive Bayes model and index files

2014-10-15 Thread jatinpreet
Hi, I am trying to persist the files generated as a result of Naive Bayes training with MLlib. These comprise the model file, label index (own class) and term dictionary (own class). I need to save them to an HDFS location and then deserialize them when needed for prediction. How can I do the same

Re: Help Troubleshooting Naive Bayes

2014-10-02 Thread Sandy Ryza
Everyone, I'm working on training mllib's Naive Bayes to classify TF/IDF vectorized docs using Spark 1.1.0. I've gotten this to work fine on a smaller set of data, but when I increase the number of vectorized documents I get hung up on training. The only messages I'm seeing are below

Help Troubleshooting Naive Bayes

2014-10-01 Thread Mike Bernico
Hi Everyone, I'm working on training mllib's Naive Bayes to classify TF/IDF vectorized docs using Spark 1.1.0. I've gotten this to work fine on a smaller set of data, but when I increase the number of vectorized documents I get hung up on training. The only messages I'm seeing are below. I'm

Re: Help Troubleshooting Naive Bayes

2014-10-01 Thread Xiangrui Meng
The cost depends on the feature dimension, number of instances, number of classes, and number of partitions. Do you mind sharing those numbers? -Xiangrui On Wed, Oct 1, 2014 at 6:31 PM, Mike Bernico mike.bern...@gmail.com wrote: Hi Everyone, I'm working on training mllib's Naive Bayes

Naive Bayes

2014-08-19 Thread Phuoc Do
I'm trying the Naive Bayes classifier for the Higgs Boson challenge on Kaggle: http://www.kaggle.com/c/higgs-boson Here's the source code I'm working on: https://github.com/dnprock/SparkHiggBoson/blob/master/src/main/scala/KaggleHiggBosonLabel.scala Training data looks like

Re: Naive Bayes

2014-08-19 Thread Xiangrui Meng
What is the ratio of examples labeled `s` to those labeled `b`? Also, Naive Bayes doesn't work on negative feature values. It assumes term frequencies as the input. We should throw an exception on negative feature values. -Xiangrui On Tue, Aug 19, 2014 at 12:07 AM, Phuoc Do phu...@vida.io wrote
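A minimal Python sketch of the kind of input check suggested above, for illustration only. The function name `check_non_negative` and the row-of-floats input layout are assumptions of this sketch; it is not MLlib's validation code.

```python
from typing import Iterable, Sequence


def check_non_negative(rows: Iterable[Sequence[float]]) -> None:
    """Raise ValueError on the first negative feature value found."""
    for row_index, row in enumerate(rows):
        for col_index, value in enumerate(row):
            if value < 0:
                raise ValueError(
                    f"negative feature value {value} at row {row_index}, "
                    f"column {col_index}; Naive Bayes expects counts >= 0"
                )
```

Running such a check before training surfaces the problem at the offending row rather than as a confusing model result later.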

Re: Naive Bayes

2014-08-19 Thread Phuoc Do
Hi Xiangrui, Training data: 42945 s out of 124659. Test data: 42722 s out of 125341. The ratio is very much the same. I tried Decision Tree. It outputs 0 to 1 decimals. I don't quite understand it yet. Would feature scaling make it work for Naive Bayes? Phuoc Do On Tue, Aug 19, 2014 at 12:51

Re: Naive Bayes

2014-08-19 Thread Xiangrui Meng
Xiangrui, Training data: 42945 s out of 124659. Test data: 42722 s out of 125341. The ratio is very much the same. I tried Decision Tree. It outputs 0 to 1 decimals. I don't quite understand it yet. Would feature scaling make it work for Naive Bayes? Phuoc Do On Tue, Aug 19, 2014 at 12:51 AM

Re: Naive Bayes

2014-08-19 Thread Phuoc Do
the same. I tried Decision Tree. It outputs 0 to 1 decimals. I don't quite understand it yet. Would feature scaling make it work for Naive Bayes? Phuoc Do On Tue, Aug 19, 2014 at 12:51 AM, Xiangrui Meng men...@gmail.com wrote: What is the ratio of examples labeled `s` to those

Re: Naive Bayes parameters

2014-08-07 Thread SK
to be cleaned up during the next release. I am currently using Spark version 1.0.1. thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Naive-Bayes-parameters-tp11592p11623.html

Re: Naive Bayes parameters

2014-08-07 Thread Xiangrui Meng
Spark version 1.0.1. thanks -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Naive-Bayes-parameters-tp11592p11623.html

Naive Bayes parameters

2014-08-06 Thread SK
: http://apache-spark-user-list.1001560.n3.nabble.com/Naive-Bayes-parameters-tp11592.html

Re: One question about RDD.zip function when trying Naive Bayes

2014-07-11 Thread x
I tried my test case with Spark 1.0.1 and saw the same result (27 pairs become 25 pairs after zip). Could someone please check it? Regards, xj On Thu, Jul 3, 2014 at 2:31 PM, Xiangrui Meng men...@gmail.com wrote: This is due to a bug in sampling, which was fixed in 1.0.1 and latest master.

Re: Does MLlib Naive Bayes implementation incorporates Laplase smoothing?

2014-07-10 Thread Sean Owen
. On Thu, Jul 10, 2014 at 6:55 AM, Rahul Bhojwani rahulbhojwani2...@gmail.com wrote: The discussion is in context for spark 0.9.1 Does MLlib Naive Bayes implementation incorporates Laplase smoothing? Or any other smoothing? Or it doesn't encorporates any smoothing?? Please inform? Thanks

Re: Does MLlib Naive Bayes implementation incorporates Laplase smoothing?

2014-07-10 Thread Rahul Bhojwani
there is a smoothing parameter, and yes from the looks of it it is simply additive / Laplace smoothing. It's been in there for a while. On Thu, Jul 10, 2014 at 6:55 AM, Rahul Bhojwani rahulbhojwani2...@gmail.com wrote: The discussion is in context for spark 0.9.1 Does MLlib Naive Bayes
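A minimal Python sketch of additive (Laplace) smoothing for one class of a multinomial Naive Bayes model, for illustration only. The function name `smoothed_log_likelihoods` and the parameter name `lam` are assumptions of this sketch; the formula is the standard additive-smoothing estimate, log((count + lam) / (total + lam * V)), and is not copied from MLlib's source.

```python
import math
from typing import List, Sequence


def smoothed_log_likelihoods(term_counts: Sequence[float],
                             lam: float = 1.0) -> List[float]:
    """Log of the smoothed per-feature likelihoods for a single class."""
    if lam <= 0:
        raise ValueError("lam must be > 0 so unseen features keep nonzero mass")
    vocabulary_size = len(term_counts)
    # Every feature gets lam pseudo-counts, so the total grows by lam * V.
    denominator = sum(term_counts) + lam * vocabulary_size
    return [math.log((count + lam) / denominator) for count in term_counts]
```

For counts [3, 0, 1] and lam = 1.0 the smoothed likelihoods are 4/7, 1/7 and 2/7, so the feature that never appeared in training still gets a nonzero probability.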

Re: Does MLlib Naive Bayes implementation incorporates Laplase smoothing?

2014-07-10 Thread Rahul Bhojwani
from the looks of it it is simply additive / Laplace smoothing. It's been in there for a while. On Thu, Jul 10, 2014 at 6:55 AM, Rahul Bhojwani rahulbhojwani2...@gmail.com wrote: The discussion is in context for spark 0.9.1 Does MLlib Naive Bayes implementation incorporates Laplase smoothing

Re: Does MLlib Naive Bayes implementation incorporates Laplase smoothing?

2014-07-10 Thread Bertrand Dechoux
. On Thu, Jul 10, 2014 at 6:55 AM, Rahul Bhojwani rahulbhojwani2...@gmail.com wrote: The discussion is in context for spark 0.9.1 Does MLlib Naive Bayes implementation incorporates Laplase smoothing? Or any other smoothing? Or it doesn't encorporates any smoothing?? Please inform? Thanks

Re: Does MLlib Naive Bayes implementation incorporates Laplase smoothing?

2014-07-10 Thread Rahul Bhojwani
I have created the issue: In MLlib, implementation for Naive Bayes in Spark 0.9.1 is having an implementation bug Have a look at it. Thanks, On Thu, Jul 10, 2014 at 8:37 PM, Bertrand Dechoux decho...@gmail.com wrote: A patch proposal on the apache JIRA for Spark? https://issues.apache.org

Does MLlib Naive Bayes implementation incorporates Laplase smoothing?

2014-07-09 Thread Rahul Bhojwani
The discussion is in the context of Spark 0.9.1. Does the MLlib Naive Bayes implementation incorporate Laplace smoothing? Or any other smoothing? Or does it not incorporate any smoothing? Please let me know. Thanks, -- Rahul K Bhojwani 3rd Year B.Tech Computer Science and Engineering National Institute

Error and doubts in using Mllib Naive bayes for text clasification

2014-07-08 Thread Rahul Bhojwani
Hello, I am a novice. I want to classify text into two classes. For this purpose I want to use the Naive Bayes model. I am using Python for it. Here are the problems I am facing: *Problem 1:* I wanted to use all words as features for the bag-of-words model, which means my features will be count

Re: Error and doubts in using Mllib Naive bayes for text clasification

2014-07-08 Thread Xiangrui Meng
, I am a novice. I want to classify the text into two classes. For this purpose I want to use Naive Bayes model. I am using Python for it. Here are the problems I am facing: Problem 1: I wanted to use all words as features for the bag of words model. Which means my features will be count

Re: Error and doubts in using Mllib Naive bayes for text clasification

2014-07-08 Thread Rahul Bhojwani
for the solutions for problem 1 and 3. Thanks, On Tue, Jul 8, 2014 at 12:14 PM, Rahul Bhojwani rahulbhojwani2...@gmail.com wrote: Hello, I am a novice. I want to classify the text into two classes. For this purpose I want to use Naive Bayes model. I am using Python

One question about RDD.zip function when trying Naive Bayes

2014-07-02 Thread x
Hello, I am a newbie to Spark MLlib and ran into a curious case when following the instructions on the page below. http://spark.apache.org/docs/latest/mllib-naive-bayes.html I ran a test program on my local machine using some data. val spConfig = (new

Re: One question about RDD.zip function when trying Naive Bayes

2014-07-02 Thread Xiangrui Meng
This is due to a bug in sampling, which was fixed in 1.0.1 and latest master. See https://github.com/apache/spark/pull/1234 . -Xiangrui On Wed, Jul 2, 2014 at 8:23 PM, x wasedax...@gmail.com wrote: Hello, I am a newbie to Spark MLlib and ran into a curious case when following the instructions at

Re: One question about RDD.zip function when trying Naive Bayes

2014-07-02 Thread x
Thanks for the confirm. I will be checking it. Regards, xj On Thu, Jul 3, 2014 at 2:31 PM, Xiangrui Meng men...@gmail.com wrote: This is due to a bug in sampling, which was fixed in 1.0.1 and latest master. See https://github.com/apache/spark/pull/1234 . -Xiangrui On Wed, Jul 2, 2014 at

Re: Running out of memory Naive Bayes

2014-04-28 Thread DB Tsai
One year ago, our customer asked us to implement Naive Bayes that could at least train on news20, and we implemented it for them in Hadoop, using the distributed cache to store the model. Sincerely, DB Tsai --- My Blog: https://www.dbtsai.com

Re: Running out of memory Naive Bayes

2014-04-28 Thread Matei Zaharia
Not sure if this is always ideal for Naive Bayes, but you could also hash the features into a lower-dimensional space (e.g. reduce it to 50,000 features). For each feature, simply take MurmurHash3(featureID) % 50000, for example. Matei On Apr 27, 2014, at 11:24 PM, DB Tsai dbt...@stanford.edu
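A minimal Python sketch of the feature-hashing idea above, for illustration only. The names `bucket_for` and `hash_features` are hypothetical, and the sparse vector is modeled as a plain dict from feature ID to value. Because the Python standard library has no MurmurHash3, this sketch swaps in MD5 from `hashlib` as the hash function; any stable hash would serve the same purpose.

```python
import hashlib
from typing import Dict


def bucket_for(feature_id: int, num_buckets: int) -> int:
    """Map a feature ID to a bucket with a stable hash (MD5 stands in for MurmurHash3)."""
    digest = hashlib.md5(str(feature_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def hash_features(sparse_vector: Dict[int, float],
                  num_buckets: int = 50_000) -> Dict[int, float]:
    """Project a sparse vector into num_buckets dimensions, summing on collisions."""
    hashed: Dict[int, float] = {}
    for feature_id, value in sparse_vector.items():
        bucket = bucket_for(feature_id, num_buckets)
        hashed[bucket] = hashed.get(bucket, 0.0) + value
    return hashed
```

Summing colliding values keeps the total count mass of each example unchanged, which matters for a count-based model, at the cost of occasionally merging unrelated features.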

Re: Running out of memory Naive Bayes

2014-04-26 Thread John King
I'm just wondering: are the SparkVector calculations really taking the sparsity into account, or just converting to dense? On Fri, Apr 25, 2014 at 10:06 PM, John King usedforprinting...@gmail.com wrote: I've been trying to use the Naive Bayes classifier. Each example in the dataset is about 2

Re: Running out of memory Naive Bayes

2014-04-26 Thread DB Tsai
usedforprinting...@gmail.com wrote: I've been trying to use the Naive Bayes classifier. Each example in the dataset is about 2 million features, only about 20-50 of which are non-zero, so the vectors are very sparse. I keep running out of memory though, even for about 1000 examples on 30gb RAM

Re: Running out of memory Naive Bayes

2014-04-26 Thread Xiangrui Meng
calculations really taking into account the sparsity or just converting to dense? On Fri, Apr 25, 2014 at 10:06 PM, John King usedforprinting...@gmail.com wrote: I've been trying to use the Naive Bayes classifier. Each example in the dataset is about 2 million features, only about 20-50 of which

Running out of memory Naive Bayes

2014-04-25 Thread John King
I've been trying to use the Naive Bayes classifier. Each example in the dataset is about 2 million features, only about 20-50 of which are non-zero, so the vectors are very sparse. I keep running out of memory though, even for about 1000 examples on 30gb RAM while the entire dataset is 4 million