Re: WEKA logistic regression on hadoop

Bertrand Dechoux Tue, 16 Oct 2012 06:58:45 -0700

Weka is indeed a more complete package of data mining solutions, but its aim
is not to support Hadoop, whereas that is the aim of Mahout.


The implemented methods are standard data mining methods. If you are
looking for Hadoop support, you should ask on the Mahout mailing list; if
you have questions about Weka itself, you should ask on the Weka mailing list.
Not all algorithms are easy to migrate to Hadoop, and many data mining
applications are fine without a Hadoop cluster: e.g. the Netflix Prize
provided a 'big' public dataset, but it was only about 1 GB.

Regards

Bertrand


On Tue, Oct 16, 2012 at 3:46 PM, Rajesh Nikam <[email protected]> wrote:

> Hi,
>
> I was looking for logistic regression algorithms on Hadoop.
> Mahout is one good package to use on Hadoop; however, I am not able to get
> good results with my experiments.
>
> There are logistic regression algorithms in WEKA, which I have
> used on Windows.
> I guess I should be able to run these algorithms from the JAR file as-is on Linux.
>
> java -classpath weka.jar weka.classifiers.functions.Logistic -R 1.0E-8 -M 6 -t lr.arff
>
> Has anyone ported them to take advantage of Hadoop?
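On porting: batch logistic regression maps onto MapReduce reasonably well because the log-likelihood gradient is a sum over records. Below is a minimal sketch in plain Java with no Hadoop API; the class and method names are hypothetical, the optimiser is simple gradient ascent (not the quasi-Newton method Weka's Logistic uses), and the penalty form ridge * sum(w_j^2) with an unpenalised intercept is an assumption. Each "map" computes a partial gradient over its own partition; the "reduce" sums the partials and applies the penalty once.

```java
import java.util.List;

// Hypothetical sketch: one batch-gradient step of ridge-penalised logistic
// regression, split MapReduce-style. Plain Java; no Hadoop API is used.
final class PartitionedLogisticGradient {

    private PartitionedLogisticGradient() {}

    /** Logistic (sigmoid) function: maps log-odds to a probability. */
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /**
     * "Map": gradient of the log-likelihood over one partition only.
     * w[0] is the intercept, w[j + 1] pairs with x[i][j]; labels y[i] are 0 or 1.
     */
    static double[] partialGradient(double[][] x, int[] y, double[] w) {
        double[] g = new double[w.length];
        for (int i = 0; i < x.length; i++) {
            double z = w[0];
            for (int j = 0; j < x[i].length; j++) {
                z += w[j + 1] * x[i][j];
            }
            double residual = y[i] - sigmoid(z); // observed label minus predicted probability
            g[0] += residual;
            for (int j = 0; j < x[i].length; j++) {
                g[j + 1] += residual * x[i][j];
            }
        }
        return g;
    }

    /**
     * "Reduce": sum the partial gradients, then subtract the gradient of the
     * assumed penalty ridge * sum(w[j]^2) once; the intercept is not penalised.
     */
    static double[] combine(List<double[]> partials, double[] w, double ridge) {
        double[] g = new double[w.length];
        for (double[] p : partials) {
            for (int j = 0; j < g.length; j++) {
                g[j] += p[j];
            }
        }
        for (int j = 1; j < g.length; j++) {
            g[j] -= 2.0 * ridge * w[j];
        }
        return g;
    }

    /** Gradient-ascent update applied by the driver after each map/reduce pass. */
    static void update(double[] w, double[] g, double learningRate) {
        for (int j = 0; j < w.length; j++) {
            w[j] += learningRate * g[j];
        }
    }
}
```

A driver would repeat map, reduce and update until the weights stop changing. Every iteration is a full pass over the data, which is one reason iterative algorithms tend to be awkward on classic Hadoop MapReduce.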
>
> How should I interpret the output it generates, e.g. what are the Coefficients
> and Odds Ratios, and how can they be used for classification?
>
>
> Options: -R 1.0E-8 -M 6
>
> Logistic Regression with ridge parameter of 1.0E-8
> Coefficients...
>                  Class
> Variable       class_1
> ======================
> a1                   0
> a2                   0
> a3                   0
> a4              0.0082
> a5              0.0151
> a6             -0.1034
> a7                   0
> a8                   0
> a9                   0
> a10            -0.0397
> a11            -0.0003
> a13            -0.1195
> a14            -0.1389
> Intercept      -21.487
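On reading the Coefficients: as far as I can tell from Weka's Logistic documentation, with two classes the class_1 column gives the log-odds of class_1 versus the last class (class_2), reported on the original attribute scale, so P(class_1 | x) = 1 / (1 + exp(-(Intercept + sum of coef_i * a_i))). A minimal sketch assuming that reading and a 0.5 threshold; the class name is hypothetical, and because the printed coefficients are rounded to four decimals, the result only approximates what Weka itself predicts. a12 does not appear in the table, so it is not used.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: scores one instance with the class_1 coefficients printed
// by Weka above. Assumes the two-class reading P(class_1) = sigmoid(log-odds).
final class LogisticScorer {

    private LogisticScorer() {}

    /** Coefficients exactly as printed (rounded to 4 decimals); a12 is absent from the output. */
    static final Map<String, Double> COEF = new LinkedHashMap<>();
    static final double INTERCEPT = -21.487;

    static {
        String[] names = {"a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8", "a9", "a10", "a11", "a13", "a14"};
        double[] values = {0, 0, 0, 0.0082, 0.0151, -0.1034, 0, 0, 0, -0.0397, -0.0003, -0.1195, -0.1389};
        for (int i = 0; i < names.length; i++) {
            COEF.put(names[i], values[i]);
        }
    }

    /** Log-odds of class_1 vs class_2: Intercept + sum(coef_i * a_i). */
    static double logOdds(Map<String, Double> instance) {
        double z = INTERCEPT;
        for (Map.Entry<String, Double> e : COEF.entrySet()) {
            Double value = instance.get(e.getKey());
            if (value == null) {
                throw new IllegalArgumentException("missing attribute " + e.getKey());
            }
            z += e.getValue() * value;
        }
        return z;
    }

    /** Probability of class_1 via the logistic function. */
    static double probabilityClass1(Map<String, Double> instance) {
        return 1.0 / (1.0 + Math.exp(-logOdds(instance)));
    }

    /** Predicts class_1 when its probability is at least 0.5 (i.e. log-odds >= 0). */
    static String classify(Map<String, Double> instance) {
        return probabilityClass1(instance) >= 0.5 ? "class_1" : "class_2";
    }
}
```

With a large negative intercept like -21.487, an instance is only classified as class_1 when the positively weighted attributes (a4, a5) are large enough to push the log-odds above zero.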
>
>
> Odds Ratios...
>                  Class
> Variable       class_1
> ======================
> a1                   1
> a2                   1
> a3                   1
> a4              1.0083
> a5              1.0152
> a6              0.9018
> a7                   1
> a8                   1
> a9                   1
> a10              0.961
> a11             0.9997
> a13             0.8873
> a14             0.8703
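Each odds ratio is exp(coefficient): a one-unit increase in an attribute multiplies the odds of class_1 by that factor, holding the other attributes fixed. For example, a6's 0.9018 means each extra unit of a6 lowers the odds of class_1 by roughly 10%, and a coefficient of 0 gives an odds ratio of exactly 1 (no effect). The sketch below checks some rows against the printed table; differences in the last digit come from the coefficients being printed rounded. The class and method names are illustrative.

```java
// Illustrative check: odds ratio = exp(coefficient), using rows from the tables above.
final class OddsRatios {

    private OddsRatios() {}

    static final String[] NAMES = {"a4", "a5", "a6", "a10", "a11", "a13", "a14"};
    static final double[] COEFFICIENTS = {0.0082, 0.0151, -0.1034, -0.0397, -0.0003, -0.1195, -0.1389};
    static final double[] PRINTED = {1.0083, 1.0152, 0.9018, 0.961, 0.9997, 0.8873, 0.8703};

    /** Factor by which the odds of class_1 change per one-unit increase of the attribute. */
    static double oddsRatio(double coefficient) {
        return Math.exp(coefficient);
    }

    /** Odds multiplier for a change of delta units (equals oddsRatio ^ delta). */
    static double oddsMultiplier(double coefficient, double delta) {
        return Math.exp(coefficient * delta);
    }

    public static void main(String[] args) {
        for (int i = 0; i < NAMES.length; i++) {
            System.out.printf("%-4s exp(%.4f) = %.4f (Weka printed %s)%n",
                    NAMES[i], COEFFICIENTS[i], oddsRatio(COEFFICIENTS[i]), PRINTED[i]);
        }
    }
}
```

oddsMultiplier covers changes larger than one unit: ten extra units of a6 multiply the odds by exp(-0.1034 * 10), which is the same as 0.9018 raised to the 10th power.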
>
> Time taken to build model: 6.39 seconds
> Time taken to test model on training data: 1.86 seconds
>
> === Error on training data ===
>
> Correctly Classified Instances       49528               99.9173 %
> Incorrectly Classified Instances        41                0.0827 %
> Kappa statistic                          0.9983
> Mean absolute error                      0.0011
> Root mean squared error                  0.0244
> Relative absolute error                  0.2202 %
> Root relative squared error              4.895  %
> Total Number of Instances            49569
>
>
> === Confusion Matrix ===
>
>      a     b   <-- classified as
>  26526    37 |     a = class_1
>      4 23002 |     b = class_2
>
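The accuracy and Kappa statistic can be recomputed from the confusion matrix, where rows are the actual class and columns the predicted class. A sketch of Cohen's kappa in Java (names are illustrative); with the training matrix above it gives about 0.9983, and with the cross-validation matrix below about 0.9969, matching Weka's output.

```java
// Sketch: accuracy and Cohen's kappa from a square confusion matrix laid out as
// Weka prints it (rows = actual class, columns = predicted class).
final class ConfusionStats {

    private ConfusionStats() {}

    /** Total number of instances in the matrix. */
    static long total(long[][] m) {
        long n = 0;
        for (long[] row : m) {
            for (long count : row) {
                n += count;
            }
        }
        return n;
    }

    /** Fraction of instances on the diagonal (correctly classified). */
    static double accuracy(long[][] m) {
        long correct = 0;
        for (int k = 0; k < m.length; k++) {
            correct += m[k][k];
        }
        return (double) correct / total(m);
    }

    /** Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement). */
    static double kappa(long[][] m) {
        double n = total(m);
        double chance = 0.0;
        for (int k = 0; k < m.length; k++) {
            long rowSum = 0;
            long colSum = 0;
            for (int j = 0; j < m.length; j++) {
                rowSum += m[k][j]; // instances whose actual class is k
                colSum += m[j][k]; // instances predicted as class k
            }
            chance += (rowSum / n) * (colSum / n);
        }
        return (accuracy(m) - chance) / (1.0 - chance);
    }
}
```

Kappa near 1 means the agreement is far above what the class proportions alone (about 54% class_1) would produce by chance.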
>
>
> === Stratified cross-validation ===
>
> Correctly Classified Instances       49492               99.8447 %
> Incorrectly Classified Instances        77                0.1553 %
> Kappa statistic                          0.9969
> Mean absolute error                      0.0015
> Root mean squared error                  0.0358
> Relative absolute error                  0.3108 %
> Root relative squared error              7.1718 %
> Total Number of Instances            49569
>
>
> === Confusion Matrix ===
>
>      a     b   <-- classified as
>  26532    31 |     a = class_1
>     46 22960 |     b = class_2
>
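The training-set figures (41 errors) are optimistic because the model is scored on the data it was fitted to; the stratified cross-validation (77 errors) is the better estimate of performance on new data. "Stratified" means each fold keeps roughly the same class proportions as the full dataset; Weka's command-line default is 10 folds. Below is a generic way to assign stratified folds (shuffle within each class, then deal round-robin), not necessarily Weka's exact procedure; the names and seed are illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Generic sketch of stratified k-fold assignment (not Weka's exact code):
// shuffle the instances of each class, then deal them round-robin into folds.
final class StratifiedFolds {

    private StratifiedFolds() {}

    /** Returns, for each instance, the fold (0..k-1) in which it is held out. */
    static int[] assignFolds(String[] labels, int k, long seed) {
        if (k < 2 || k > labels.length) {
            throw new IllegalArgumentException("need 2 <= k <= number of instances");
        }
        Map<String, List<Integer>> byClass = new TreeMap<>(); // sorted for reproducibility
        for (int i = 0; i < labels.length; i++) {
            byClass.computeIfAbsent(labels[i], c -> new ArrayList<>()).add(i);
        }
        Random random = new Random(seed);
        int[] fold = new int[labels.length];
        int next = 0; // carried across classes so overall fold sizes stay balanced
        for (List<Integer> indices : byClass.values()) {
            Collections.shuffle(indices, random);
            for (int index : indices) {
                fold[index] = next;
                next = (next + 1) % k;
            }
        }
        return fold;
    }
}
```

Each fold in turn is the test set while the other k-1 folds train the model; the reported statistics and confusion matrix aggregate the held-out predictions over all folds, which is why the matrix above still sums to 49569 instances.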
> Thanks in advance.
> Rajesh
>



-- 
Bertrand Dechoux
