Re: [Dev] [ML] Issue while loading the leaf dataset (misreading classes)

2015-08-13 Thread Thushan Ganegedara
Hi,

Yes, no matter which approach is used, there are always going to be outliers
that do not fit the defined rules. But for these corner cases, the user
always has the opportunity to change the variable to numerical.

One more approach is to introduce a measure of replication of values in a
column. If the column shows the same values repeated many times, imo, it
is a good indicator of a categorical variable.
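
A rough sketch of such a replication measure, purely for illustration. The
class and method names are made up here and are not taken from WSO2 ML, and
the measure itself (one minus the share of distinct values) is just one
possible choice:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of WSO2 ML): measures how much a column repeats its values.
final class RepetitionMeasure {

    // Returns 1 - (distinct values / total values).
    // 0.0 means every value is unique; values near 1.0 mean heavy repetition.
    static double repetitionRatio(List<String> column) {
        if (column.isEmpty()) {
            return 0.0;
        }
        Set<String> distinct = new HashSet<>(column);
        return 1.0 - (double) distinct.size() / column.size();
    }

    public static void main(String[] args) {
        // Class-label-like column: 3 distinct values among 8 entries.
        List<String> classLabels = Arrays.asList("1", "1", "1", "2", "2", "2", "3", "3");
        // Measurement-like column: every value is different.
        List<String> measurements = Arrays.asList("0.72694", "0.74173", "0.76722", "0.73797");

        System.out.println(repetitionRatio(classLabels));  // 0.625
        System.out.println(repetitionRatio(measurements)); // 0.0
    }
}
```

A column would then be flagged as a categorical candidate when the ratio is
above some cut-off, which would still need to be tuned on real datasets.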

On Fri, Aug 14, 2015 at 2:41 PM, Nirmal Fernando nir...@wso2.com wrote:



 On Fri, Aug 14, 2015 at 10:01 AM, Thushan Ganegedara thu...@gmail.com
 wrote:

 Hi,

 This was mainly due to the detection of a numerical feature as a
 categorical one.
 Oh, it makes sense now. Why don't we try taking a sample of data and if
 the sample contains only integers (or doubles without any decimals) or
 strings, consider it as a categorical variable.


 I tried that approach too, but there're some datasets like automobile
 dataset normalized-losses feature, which has integer values (0-164) but
 which is probably not categorical.


 We suggested increasing the categorical threshold as a work-around.
 @thushan did it work?
 Yes, it worked. After increasing the threshold to 40.

 On Fri, Aug 14, 2015 at 2:21 PM, Nirmal Fernando nir...@wso2.com wrote:

 This was mainly due to the detection of a numerical feature as a
 categorical one.

 We suggested increasing the categorical threshold as a work-around.
 @thushan did it work?

 On Tue, Aug 11, 2015 at 5:50 PM, Thushan Ganegedara thu...@gmail.com
 wrote:

 This issue occurs, if I turn the response variable to a categorical
 variable. If I get the variable as a numerical variable, the values are
 read correctly.

 So I presume there is a fault in categorical conversion of the variable.

 On Tue, Aug 11, 2015 at 7:11 PM, Thushan Ganegedara thu...@gmail.com
 wrote:

 I still get the same result

 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
 1.0 1.0 1.0 1.0 12.0 12.0 12.0 12.0 12.0
 12.0 12.0 12.0 12.0 12.0 13.0 13.0 13.0 13.0
 13.0 13.0
 13.0 13.0 13.0 13.0 14.0 14.0 14.0 14.0
 14.0 14.0 14.0 14.0 15.0 15.0 15.0 15.0 15.0
 15.0 15.0 15.0 15.0 15.0 15.0 15.0 16.0 16.0
 16.0 16.0
 16.0 16.0 16.0 16.0 17.0 17.0 17.0 17.0
 17.0 17.0 17.0 17.0 17.0 17.0 18.0 18.0 18.0
 18.0 18.0 18.0 18.0 18.0 18.0 18.0 18.0 19.0
 19.0 19.0
 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0
 19.0 19.0 19.0 2.0 2.0 2.0 2.0 2.0 2.0
 2.0 2.0 2.0 2.0 2.0 2.0 2.0 3.0 3.0
 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 4.0 4.0 4.0 4.0 4.0
 4.0 4.0 4.0 4.0 4.0 4.0 4.0 5.0 5.0
 5.0 5.0
 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0
 5.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0
 6.0 6.0 6.0 6.0 7.0 7.0 7.0 7.0 7.0
 7.0 7.0
 7.0 7.0 7.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0
 3.0 3.0 3.0 3.0

 On Tue, Aug 11, 2015 at 7:05 PM, Nirmal Fernando nir...@wso2.com
 wrote:

 Can you try the following code?

 List<LabeledPoint> points = labeledPoints.collect();
 for (int i = 0; i < points.size(); i++) {
     System.out.print(points.get(i).label() + "\t");
 }

 On Tue, Aug 11, 2015 at 2:30 PM, Thushan Ganegedara thu...@gmail.com
  wrote:

 I used the following snippet

 for (int i = 0; i < labeledPoints.collect().size(); i++) {
     System.out.print(labeledPoints.collect().get(i).label() + "\t");
 }

 in the public 

Re: [Dev] [ML] Issue while loading the leaf dataset (misreading classes)

2015-08-13 Thread Thushan Ganegedara
Hi,

This was mainly due to the detection of a numerical feature as a
categorical one.
Oh, it makes sense now. Why don't we try taking a sample of the data and, if
the sample contains only integers (or doubles without any decimals) or strings,
consider it a categorical variable?

We suggested increasing the categorical threshold as a work-around.
@thushan did it work?
Yes, it worked after increasing the threshold to 40.
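
To illustrate why the threshold matters for a dataset with 36 classes, here is
a minimal sketch. It assumes the rule is "treat a column as categorical when
its number of distinct values is at most the threshold"; the actual rule in
WSO2 ML is not shown in this thread, and the class name, method name and the
lower threshold of 20 are made up for the example:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a distinct-count threshold; not the WSO2 ML implementation.
final class CategoricalThreshold {

    // Assumed rule: categorical when the number of distinct values is <= threshold.
    static boolean isCategorical(List<Double> column, int threshold) {
        Set<Double> distinct = new HashSet<>(column);
        return distinct.size() <= threshold;
    }

    public static void main(String[] args) {
        // Stand-in for the leaf class column: labels 1.0 .. 36.0, five rows each.
        List<Double> labels = new ArrayList<>();
        for (int cls = 1; cls <= 36; cls++) {
            for (int rep = 0; rep < 5; rep++) {
                labels.add((double) cls);
            }
        }

        System.out.println(isCategorical(labels, 20)); // false: 36 distinct values exceed 20
        System.out.println(isCategorical(labels, 40)); // true: 36 distinct values fit within 40
    }
}
```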

On Fri, Aug 14, 2015 at 2:21 PM, Nirmal Fernando nir...@wso2.com wrote:

 This was mainly due to the detection of a numerical feature as a
 categorical one.

 We suggested increasing the categorical threshold as a work-around.
 @thushan did it work?

 On Tue, Aug 11, 2015 at 5:50 PM, Thushan Ganegedara thu...@gmail.com
 wrote:

 This issue occurs, if I turn the response variable to a categorical
 variable. If I get the variable as a numerical variable, the values are
 read correctly.

 So I presume there is a fault in categorical conversion of the variable.

 On Tue, Aug 11, 2015 at 7:11 PM, Thushan Ganegedara thu...@gmail.com
 wrote:

 I still get the same result

 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
 1.0 1.0 1.0 12.0 12.0 12.0 12.0 12.0 12.0
 12.0 12.0 12.0 12.0 13.0 13.0 13.0 13.0 13.0 13.0
 13.0 13.0 13.0 13.0 14.0 14.0 14.0 14.0 14.0
 14.0 14.0 14.0 15.0 15.0 15.0 15.0 15.0 15.0
 15.0 15.0 15.0 15.0 15.0 15.0 16.0 16.0 16.0 16.0
 16.0 16.0 16.0 16.0 17.0 17.0 17.0 17.0 17.0
 17.0 17.0 17.0 17.0 17.0 18.0 18.0 18.0 18.0
 18.0 18.0 18.0 18.0 18.0 18.0 18.0 19.0 19.0 19.0
 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0
 19.0 19.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
 2.0 2.0 2.0 2.0 2.0 2.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 4.0 4.0 4.0 4.0 4.0 4.0
 4.0 4.0 4.0 4.0 4.0 4.0 5.0 5.0 5.0 5.0
 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0
 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0
 6.0 6.0 6.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0
 7.0 7.0 7.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0

 On Tue, Aug 11, 2015 at 7:05 PM, Nirmal Fernando nir...@wso2.com
 wrote:

 Can you try the following code?

 List<LabeledPoint> points = labeledPoints.collect();
 for (int i = 0; i < points.size(); i++) {
     System.out.print(points.get(i).label() + "\t");
 }

 On Tue, Aug 11, 2015 at 2:30 PM, Thushan Ganegedara thu...@gmail.com
 wrote:

 I used the following snippet

 for (int i = 0; i < labeledPoints.collect().size(); i++) {
     System.out.print(labeledPoints.collect().get(i).label() + "\t");
 }

 in the public MLModel build() throws MLModelBuilderException in
 DeeplearningModelBuilder.java


 On Tue, Aug 11, 2015 at 6:17 PM, Nirmal Fernando nir...@wso2.com
 wrote:

 Hi thushan,

 We need more info. What did you exactly print and where?

 On Tue, Aug 11, 2015 at 12:47 PM, Thushan Ganegedara 
 thu...@gmail.com wrote:

 Hi,

 I found the potential cause of the poor accuracy for the leaf
 dataset. It seems the data read into ML is wrong.

 I have attached the data file as a CSV (classes are in the last
 column)

 However, when I print out the labels of the read data (classes), it
 looks something like below. Clearly there aren't this many 3.0 classes
 and there should be classes up to 36.0.

 Is this caused by a bug?

 1.0 1.0 

Re: [Dev] [ML] Issue while loading the leaf dataset (misreading classes)

2015-08-13 Thread Nirmal Fernando
On Fri, Aug 14, 2015 at 10:01 AM, Thushan Ganegedara thu...@gmail.com
wrote:

 Hi,

 This was mainly due to the detection of a numerical feature as a
 categorical one.
 Oh, it makes sense now. Why don't we try taking a sample of data and if
 the sample contains only integers (or doubles without any decimals) or
 strings, consider it as a categorical variable.


I tried that approach too, but there are some datasets, like the
normalized-losses feature of the automobile dataset, which has integer values
(0-164) but is probably not categorical.
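
A small sketch of the sample check and of the kind of false positive it
produces. The class and method names are hypothetical, and the integer values
below are made-up examples in the 0-164 range, not rows from the automobile
dataset:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the "integers or strings only" sample check.
final class WholeNumberCheck {

    // True when no value in the sample is a number with a fractional part,
    // i.e. every value is either a whole number or a non-numeric string.
    static boolean looksCategorical(List<String> sample) {
        for (String raw : sample) {
            try {
                double value = Double.parseDouble(raw.trim());
                if (value != Math.rint(value)) {
                    return false; // fractional number: treat the column as numerical
                }
            } catch (NumberFormatException e) {
                // Non-numeric string: allowed for categorical columns, keep checking.
            }
        }
        return !sample.isEmpty();
    }

    public static void main(String[] args) {
        // Integer-valued but continuous in nature (illustrative values only).
        List<String> integerFeature = Arrays.asList("65", "164", "115", "101", "148");
        // Values taken from the first column of the attached leaf CSV rows.
        List<String> leafFeature = Arrays.asList("0.72694", "0.74173", "0.76722");

        System.out.println(looksCategorical(integerFeature)); // true, although probably not categorical
        System.out.println(looksCategorical(leafFeature));    // false
    }
}
```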


 We suggested increasing the categorical threshold as a work-around.
 @thushan did it work?
 Yes, it worked. After increasing the threshold to 40.

 On Fri, Aug 14, 2015 at 2:21 PM, Nirmal Fernando nir...@wso2.com wrote:

 This was mainly due to the detection of a numerical feature as a
 categorical one.

 We suggested increasing the categorical threshold as a work-around.
 @thushan did it work?

 On Tue, Aug 11, 2015 at 5:50 PM, Thushan Ganegedara thu...@gmail.com
 wrote:

 This issue occurs, if I turn the response variable to a categorical
 variable. If I get the variable as a numerical variable, the values are
 read correctly.

 So I presume there is a fault in categorical conversion of the variable.

 On Tue, Aug 11, 2015 at 7:11 PM, Thushan Ganegedara thu...@gmail.com
 wrote:

 I still get the same result

 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
 1.0 1.0 1.0 12.0 12.0 12.0 12.0 12.0 12.0
 12.0 12.0 12.0 12.0 13.0 13.0 13.0 13.0 13.0
 13.0
 13.0 13.0 13.0 13.0 14.0 14.0 14.0 14.0 14.0
 14.0 14.0 14.0 15.0 15.0 15.0 15.0 15.0 15.0
 15.0 15.0 15.0 15.0 15.0 15.0 16.0 16.0 16.0
 16.0
 16.0 16.0 16.0 16.0 17.0 17.0 17.0 17.0 17.0
 17.0 17.0 17.0 17.0 17.0 18.0 18.0 18.0 18.0
 18.0 18.0 18.0 18.0 18.0 18.0 18.0 19.0 19.0
 19.0
 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0
 19.0 19.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
 2.0 2.0 2.0 2.0 2.0 2.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 4.0 4.0 4.0 4.0 4.0 4.0
 4.0 4.0 4.0 4.0 4.0 4.0 5.0 5.0 5.0 5.0
 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0
 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0
 6.0 6.0 6.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0
 7.0 7.0 7.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0

 On Tue, Aug 11, 2015 at 7:05 PM, Nirmal Fernando nir...@wso2.com
 wrote:

 Can you try the following code?

 List<LabeledPoint> points = labeledPoints.collect();
 for (int i = 0; i < points.size(); i++) {
     System.out.print(points.get(i).label() + "\t");
 }

 On Tue, Aug 11, 2015 at 2:30 PM, Thushan Ganegedara thu...@gmail.com
 wrote:

 I used the following snippet

 for (int i = 0; i < labeledPoints.collect().size(); i++) {
     System.out.print(labeledPoints.collect().get(i).label() + "\t");
 }

 in the public MLModel build() throws MLModelBuilderException in
 DeeplearningModelBuilder.java


 On Tue, Aug 11, 2015 at 6:17 PM, Nirmal Fernando nir...@wso2.com
 wrote:

 Hi thushan,

 We need more info. What did you exactly print and where?

 On Tue, Aug 11, 2015 at 12:47 PM, Thushan Ganegedara 
 thu...@gmail.com wrote:

 Hi,

 I found the potential cause of the poor accuracy for the leaf
 dataset. It seems the data read into ML is wrong.

 I have attached the data file 

Re: [Dev] [ML] Issue while loading the leaf dataset (misreading classes)

2015-08-13 Thread Nirmal Fernando
This was mainly due to the detection of a numerical feature as a
categorical one.

We suggested increasing the categorical threshold as a work-around.
@thushan did it work?

On Tue, Aug 11, 2015 at 5:50 PM, Thushan Ganegedara thu...@gmail.com
wrote:

 This issue occurs, if I turn the response variable to a categorical
 variable. If I get the variable as a numerical variable, the values are
 read correctly.

 So I presume there is a fault in categorical conversion of the variable.

 On Tue, Aug 11, 2015 at 7:11 PM, Thushan Ganegedara thu...@gmail.com
 wrote:

 I still get the same result

 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
 1.0 1.0 1.0 12.0 12.0 12.0 12.0 12.0 12.0
 12.0 12.0 12.0 12.0 13.0 13.0 13.0 13.0 13.0 13.0
 13.0 13.0 13.0 13.0 14.0 14.0 14.0 14.0 14.0
 14.0 14.0 14.0 15.0 15.0 15.0 15.0 15.0 15.0
 15.0 15.0 15.0 15.0 15.0 15.0 16.0 16.0 16.0 16.0
 16.0 16.0 16.0 16.0 17.0 17.0 17.0 17.0 17.0
 17.0 17.0 17.0 17.0 17.0 18.0 18.0 18.0 18.0
 18.0 18.0 18.0 18.0 18.0 18.0 18.0 19.0 19.0 19.0
 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0
 19.0 19.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
 2.0 2.0 2.0 2.0 2.0 2.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 4.0 4.0 4.0 4.0 4.0 4.0
 4.0 4.0 4.0 4.0 4.0 4.0 5.0 5.0 5.0 5.0
 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0
 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0
 6.0 6.0 6.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0
 7.0 7.0 7.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0

 On Tue, Aug 11, 2015 at 7:05 PM, Nirmal Fernando nir...@wso2.com wrote:

 Can you try the following code?

 List<LabeledPoint> points = labeledPoints.collect();
 for (int i = 0; i < points.size(); i++) {
     System.out.print(points.get(i).label() + "\t");
 }

 On Tue, Aug 11, 2015 at 2:30 PM, Thushan Ganegedara thu...@gmail.com
 wrote:

 I used the following snippet

 for (int i = 0; i < labeledPoints.collect().size(); i++) {
     System.out.print(labeledPoints.collect().get(i).label() + "\t");
 }

 in the public MLModel build() throws MLModelBuilderException in
 DeeplearningModelBuilder.java


 On Tue, Aug 11, 2015 at 6:17 PM, Nirmal Fernando nir...@wso2.com
 wrote:

 Hi thushan,

 We need more info. What did you exactly print and where?

 On Tue, Aug 11, 2015 at 12:47 PM, Thushan Ganegedara thu...@gmail.com
  wrote:

 Hi,

 I found the potential cause of the poor accuracy for the leaf
 dataset. It seems the data read into ML is wrong.

 I have attached the data file as a CSV (classes are in the last
 column)

 However, when I print out the labels of the read data (classes), it
 looks something like below. Clearly there aren't this many 3.0 classes
 and there should be classes up to 36.0.

 Is this caused by a bug?

 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
 1.0 1.0 1.0 1.0 12.0 12.0 12.0 12.0 12.0
 12.0 12.0 12.0 12.0 12.0 13.0 13.0 13.0 13.0
 13.0 13.0
 13.0 13.0 13.0 13.0 14.0 14.0 14.0 14.0
 14.0 14.0 14.0 14.0 15.0 15.0 15.0 15.0 15.0
 15.0 15.0 15.0 15.0 15.0 15.0 15.0 16.0 16.0
 16.0 16.0
 16.0 16.0 16.0 16.0 17.0 17.0 17.0 17.0
 17.0 17.0 17.0 17.0

Re: [Dev] [ML] Issue while loading the leaf dataset (misreading classes)

2015-08-13 Thread Thushan Ganegedara
Moreover, I think a hybrid approach as follows might work well.

1. Select a sample

2. Filter columns by the data type and find potential categorical variables
(integer / string)

3. Filter further by checking whether the same values are repeated multiple
times in the dataset.
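
The three steps above could be sketched roughly as follows. This is only an
illustration under assumptions: the names are hypothetical, the sample is
simply the first rows of the column (a random sample would be an alternative),
and the repetition cut-off of 0.5 is an arbitrary example value:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the hybrid check; not taken from WSO2 ML.
final class HybridDetector {

    static boolean looksCategorical(List<String> column, int sampleSize, double minRepetition) {
        // Step 1: select a sample (here: the first sampleSize rows).
        List<String> sample = column.subList(0, Math.min(sampleSize, column.size()));
        if (sample.isEmpty()) {
            return false;
        }
        // Step 2: filter by data type; any fractional number rules the column out.
        for (String raw : sample) {
            if (hasFraction(raw)) {
                return false;
            }
        }
        // Step 3: require that values repeat often enough within the sample.
        Set<String> distinct = new HashSet<>(sample);
        double repetition = 1.0 - (double) distinct.size() / sample.size();
        return repetition >= minRepetition;
    }

    // True only for numeric values that are not whole numbers; strings return false.
    private static boolean hasFraction(String raw) {
        try {
            double value = Double.parseDouble(raw.trim());
            return value != Math.rint(value);
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> classColumn = Arrays.asList("1", "1", "1", "2", "2", "2");
        List<String> integerFeature = Arrays.asList("65", "164", "115", "101");
        List<String> decimalFeature = Arrays.asList("0.72694", "0.74173", "0.76722");

        System.out.println(looksCategorical(classColumn, 100, 0.5));    // true
        System.out.println(looksCategorical(integerFeature, 100, 0.5)); // false: no repetition
        System.out.println(looksCategorical(decimalFeature, 100, 0.5)); // false: fractional values
    }
}
```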

On Fri, Aug 14, 2015 at 2:48 PM, Thushan Ganegedara thu...@gmail.com
wrote:

 Hi,

 Yes, no matter which approach is used, there are always going to be outliers
 that do not fit the defined rules. But for these corner cases, the user
 always has the opportunity to change the variable to numerical.

 One more approach is to introduce a measure of replication of values in a
 column. If the column shows the same values repeated many times, imo, it
 is a good indicator of a categorical variable.

 On Fri, Aug 14, 2015 at 2:41 PM, Nirmal Fernando nir...@wso2.com wrote:



 On Fri, Aug 14, 2015 at 10:01 AM, Thushan Ganegedara thu...@gmail.com
 wrote:

 Hi,

 This was mainly due to the detection of a numerical feature as a
 categorical one.
 Oh, it makes sense now. Why don't we try taking a sample of data and if
 the sample contains only integers (or doubles without any decimals) or
 strings, consider it as a categorical variable.


 I tried that approach too, but there're some datasets like automobile
 dataset normalized-losses feature, which has integer values (0-164) but
 which is probably not categorical.


 We suggested increasing the categorical threshold as a work-around.
 @thushan did it work?
 Yes, it worked. After increasing the threshold to 40.

 On Fri, Aug 14, 2015 at 2:21 PM, Nirmal Fernando nir...@wso2.com
 wrote:

 This was mainly due to the detection of a numerical feature as a
 categorical one.

 We suggested increasing the categorical threshold as a work-around.
 @thushan did it work?

 On Tue, Aug 11, 2015 at 5:50 PM, Thushan Ganegedara thu...@gmail.com
 wrote:

 This issue occurs, if I turn the response variable to a categorical
 variable. If I get the variable as a numerical variable, the values are
 read correctly.

 So I presume there is a fault in categorical conversion of the
 variable.

 On Tue, Aug 11, 2015 at 7:11 PM, Thushan Ganegedara thu...@gmail.com
 wrote:

 I still get the same result

 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
 1.0 1.0 1.0 1.0 12.0 12.0 12.0 12.0 12.0
 12.0 12.0 12.0 12.0 12.0 13.0 13.0 13.0 13.0
 13.0 13.0
 13.0 13.0 13.0 13.0 14.0 14.0 14.0 14.0
 14.0 14.0 14.0 14.0 15.0 15.0 15.0 15.0 15.0
 15.0 15.0 15.0 15.0 15.0 15.0 15.0 16.0 16.0
 16.0 16.0
 16.0 16.0 16.0 16.0 17.0 17.0 17.0 17.0
 17.0 17.0 17.0 17.0 17.0 17.0 18.0 18.0 18.0
 18.0 18.0 18.0 18.0 18.0 18.0 18.0 18.0 19.0
 19.0 19.0
 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0
 19.0 19.0 19.0 2.0 2.0 2.0 2.0 2.0 2.0
 2.0 2.0 2.0 2.0 2.0 2.0 2.0 3.0 3.0
 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 4.0 4.0 4.0 4.0 4.0
 4.0 4.0 4.0 4.0 4.0 4.0 4.0 5.0 5.0
 5.0 5.0
 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0
 5.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0
 6.0 6.0 6.0 6.0 7.0 7.0 7.0 7.0 7.0
 7.0 7.0
 7.0 7.0 7.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0
 3.0 3.0 3.0 3.0

 On Tue, Aug 11, 2015 at 7:05 PM, Nirmal Fernando nir...@wso2.com
 wrote:

 Can you try the following code?

 List<LabeledPoint> points = labeledPoints.collect();
 for(int 

Re: [Dev] [ML] Issue while loading the leaf dataset (misreading classes)

2015-08-13 Thread Nirmal Fernando
Thushan, please send your suggestions to the other thread :)

On Fri, Aug 14, 2015 at 10:22 AM, Thushan Ganegedara thu...@gmail.com
wrote:

 Moreover, I think a hybrid approach as follows might work well.

 1. Select a sample

 2. Filter columns by the data type and find potential categorical
 variables (integer / string)

 3. Filter further by checking whether the same values are repeated multiple
 times in the dataset.

 On Fri, Aug 14, 2015 at 2:48 PM, Thushan Ganegedara thu...@gmail.com
 wrote:

 Hi,

 Yes, no matter which approach is used, there are always going to be outliers
 that do not fit the defined rules. But for these corner cases, the user
 always has the opportunity to change the variable to numerical.

 One more approach is to introduce a measure of replication of values in a
 column. If the column shows the same values repeated many times, imo, it
 is a good indicator of a categorical variable.

 On Fri, Aug 14, 2015 at 2:41 PM, Nirmal Fernando nir...@wso2.com wrote:



 On Fri, Aug 14, 2015 at 10:01 AM, Thushan Ganegedara thu...@gmail.com
 wrote:

 Hi,

 This was mainly due to the detection of a numerical feature as a
 categorical one.
 Oh, it makes sense now. Why don't we try taking a sample of data and if
 the sample contains only integers (or doubles without any decimals) or
 strings, consider it as a categorical variable.


 I tried that approach too, but there're some datasets like automobile
 dataset normalized-losses feature, which has integer values (0-164) but
 which is probably not categorical.


 We suggested increasing the categorical threshold as a work-around.
 @thushan did it work?
 Yes, it worked. After increasing the threshold to 40.

 On Fri, Aug 14, 2015 at 2:21 PM, Nirmal Fernando nir...@wso2.com
 wrote:

 This was mainly due to the detection of a numerical feature as a
 categorical one.

 We suggested increasing the categorical threshold as a work-around.
 @thushan did it work?

 On Tue, Aug 11, 2015 at 5:50 PM, Thushan Ganegedara thu...@gmail.com
 wrote:

 This issue occurs, if I turn the response variable to a categorical
 variable. If I get the variable as a numerical variable, the values are
 read correctly.

 So I presume there is a fault in categorical conversion of the
 variable.

 On Tue, Aug 11, 2015 at 7:11 PM, Thushan Ganegedara thu...@gmail.com
  wrote:

 I still get the same result

 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
 1.0 1.0 1.0 1.0 12.0 12.0 12.0 12.0 12.0
 12.0 12.0 12.0 12.0 12.0 13.0 13.0 13.0 13.0
 13.0 13.0
 13.0 13.0 13.0 13.0 14.0 14.0 14.0 14.0
 14.0 14.0 14.0 14.0 15.0 15.0 15.0 15.0 15.0
 15.0 15.0 15.0 15.0 15.0 15.0 15.0 16.0 16.0
 16.0 16.0
 16.0 16.0 16.0 16.0 17.0 17.0 17.0 17.0
 17.0 17.0 17.0 17.0 17.0 17.0 18.0 18.0 18.0
 18.0 18.0 18.0 18.0 18.0 18.0 18.0 18.0 19.0
 19.0 19.0
 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0
 19.0 19.0 19.0 2.0 2.0 2.0 2.0 2.0 2.0
 2.0 2.0 2.0 2.0 2.0 2.0 2.0 3.0 3.0
 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 4.0 4.0 4.0 4.0 4.0
 4.0 4.0 4.0 4.0 4.0 4.0 4.0 5.0 5.0
 5.0 5.0
 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0
 5.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0
 6.0 6.0 6.0 6.0 7.0 7.0 7.0 7.0 7.0
 7.0 7.0
 7.0 7.0 7.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0
 3.0 3.0 3.0 3.0

 On Tue, Aug 11, 2015 at 7:05 PM, Nirmal 

[Dev] [ML] Issue while loading the leaf dataset (misreading classes)

2015-08-11 Thread Thushan Ganegedara
Hi,

I found the potential cause of the poor accuracy for the leaf dataset. It
seems the data read into ML is wrong.

I have attached the data file as a CSV (classes are in the last column)

However, when I print out the labels of the read data (classes), it looks
something like below. Clearly there aren't this many 3.0 classes and
there should be classes up to 36.0.

Is this caused by a bug?

1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 12.0 12.0 12.0 12.0 12.0 12.0
12.0 12.0 12.0 12.0 13.0 13.0 13.0 13.0 13.0 13.0
13.0 13.0 13.0 13.0 14.0 14.0 14.0 14.0 14.0
14.0 14.0 14.0 15.0 15.0 15.0 15.0 15.0 15.0
15.0 15.0 15.0 15.0 15.0 15.0 16.0 16.0 16.0 16.0
16.0 16.0 16.0 16.0 17.0 17.0 17.0 17.0 17.0
17.0 17.0 17.0 17.0 17.0 18.0 18.0 18.0 18.0
18.0 18.0 18.0 18.0 18.0 18.0 18.0 19.0 19.0 19.0
19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0
19.0 19.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
2.0 2.0 2.0 2.0 2.0 2.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 4.0 4.0 4.0 4.0 4.0 4.0
4.0 4.0 4.0 4.0 4.0 4.0 5.0 5.0 5.0 5.0
5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0
6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0
6.0 6.0 6.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0
7.0 7.0 7.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0
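
A dump like the one above is easier to check as a per-label tally. Below is a
minimal sketch using plain Java collections; the class name is made up, and the
hard-coded list is only a stand-in for the labels that would be read from the
collected data points:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical diagnostic helper: counts how many rows carry each label.
final class LabelTally {

    // Returns label -> count, sorted by label value.
    static Map<Double, Integer> tally(List<Double> labels) {
        Map<Double, Integer> counts = new TreeMap<>();
        for (double label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in labels; in practice these would come from the loaded dataset.
        List<Double> labels = Arrays.asList(1.0, 1.0, 12.0, 2.0, 3.0, 3.0, 3.0, 3.0);
        System.out.println(tally(labels)); // {1.0=2, 2.0=1, 3.0=4, 12.0=1}
    }
}
```

An unexpectedly large count for one label, or missing labels at the high end,
would then stand out immediately.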

-- 
Regards,

Thushan Ganegedara
School of IT
University of Sydney, Australia
0.72694,1.4742,0.32396,0.98535,1,0.83592,0.0046566,0.0039465,0.04779,0.12795,0.016108,0.0052323,0.00027477,1.1756,1
0.74173,1.5257,0.36116,0.98152,0.99825,0.79867,0.0052423,0.0050016,0.02416,0.090476,0.0081195,0.002708,7.48E-05,0.69659,1
0.76722,1.5725,0.38998,0.97755,1,0.80812,0.0074573,0.010121,0.011897,0.057445,0.0032891,0.00092068,3.79E-05,0.44348,1
0.73797,1.4597,0.35376,0.97566,1,0.81697,0.0068768,0.0086068,0.01595,0.065491,0.0042707,0.0011544,6.63E-05,0.58785,1
0.82301,1.7707,0.44462,0.97698,1,0.75493,0.007428,0.010042,0.0079379,0.045339,0.0020514,0.00055986,2.35E-05,0.34214,1
0.72997,1.4892,0.34284,0.98755,1,0.84482,0.0049451,0.0044506,0.010487,0.058528,0.0034138,0.0011248,2.48E-05,0.34068,1
0.82063,1.7529,0.44458,0.97964,0.99649,0.7677,0.0059279,0.0063954,0.018375,0.080587,0.0064523,0.0022713,4.15E-05,0.53904,1
0.77982,1.6215,0.39222,0.98512,0.99825,0.80816,0.0050987,0.0047314,0.024875,0.089686,0.0079794,0.0024664,0.00014676,0.66975,1
0.83089,1.8199,0.45693,0.9824,1,0.77106,0.0060055,0.006564,0.0072447,0.040616,0.0016469,0.00038812,3.29E-05,0.33696,1
0.90631,2.3906,0.58336,0.97683,0.99825,0.66419,0.0084019,0.012848,0.0070096,0.042347,0.0017901,0.00045889,2.83E-05,0.28082,1
0.7459,1.4927,0.34116,0.98296,1,0.83088,0.0055665,0.0056395,0.0057679,0.036511,0.0013313,0.00030872,3.18E-05,0.25026,1
0.79606,1.6934,0.43387,0.98181,1,0.76985,0.0077992,0.011071,0.013677,0.057832,0.004,0.00081648,0.00013855,0.49751,1
0.93361,2.7582,0.64257,0.98346,1,0.59851,0.0055336,0.0055731,0.029712,0.089889,0.0080153,0.0020648,0.00023883,0.91499,2
0.91186,2.4994,0.60323,0.983,1,0.64916,0.0061494,0.0068823,0.018887,0.072486,0.0052267,0.0014887,8.33E-05,0.67811,2
0.89063,2.2927,0.56667,0.98732,1,0.66427,0.0028365,0.0014643,0.029272,0.091328,0.0082717,0.0022383,0.00020166,0.87177,2
0.86755,2.009,0.51464,0.98691,1,0.70277,0.0054439,0.0053937,0.030348,0.092063,0.0084044,0.0022541,0.00019854,0.94545,2

Re: [Dev] [ML] Issue while loading the leaf dataset (misreading classes)

2015-08-11 Thread Nirmal Fernando
Hi Thushan,

We need more info. What exactly did you print, and where?

On Tue, Aug 11, 2015 at 12:47 PM, Thushan Ganegedara thu...@gmail.com
wrote:

 Hi,

 I found the potential cause of the poor accuracy for the leaf dataset. It
 seems the data read into ML is wrong.

 I have attached the data file as a CSV (classes are in the last column)

 However, when I print out the labels of the read data (classes), it looks
 something like below. Clearly there aren't this many 3.0 classes and
 there should be classes up to 36.0.

 Is this caused by a bug?

 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
 1.0 1.0 1.0 12.0 12.0 12.0 12.0 12.0 12.0
 12.0 12.0 12.0 12.0 13.0 13.0 13.0 13.0 13.0 13.0
 13.0 13.0 13.0 13.0 14.0 14.0 14.0 14.0 14.0
 14.0 14.0 14.0 15.0 15.0 15.0 15.0 15.0 15.0
 15.0 15.0 15.0 15.0 15.0 15.0 16.0 16.0 16.0 16.0
 16.0 16.0 16.0 16.0 17.0 17.0 17.0 17.0 17.0
 17.0 17.0 17.0 17.0 17.0 18.0 18.0 18.0 18.0
 18.0 18.0 18.0 18.0 18.0 18.0 18.0 19.0 19.0 19.0
 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0
 19.0 19.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
 2.0 2.0 2.0 2.0 2.0 2.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 4.0 4.0 4.0 4.0 4.0 4.0
 4.0 4.0 4.0 4.0 4.0 4.0 5.0 5.0 5.0 5.0
 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0
 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0
 6.0 6.0 6.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0
 7.0 7.0 7.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0


-- 

Thanks & regards,
Nirmal

Team Lead - WSO2 Machine Learner
Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
___
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev


Re: [Dev] [ML] Issue while loading the leaf dataset (misreading classes)

2015-08-11 Thread Nirmal Fernando
Can you try the following code?

List<LabeledPoint> points = labeledPoints.collect();
for (int i = 0; i < points.size(); i++) {
    System.out.print(points.get(i).label() + "\t");
}

On Tue, Aug 11, 2015 at 2:30 PM, Thushan Ganegedara thu...@gmail.com
wrote:

 I used the following snippet

 for (int i = 0; i < labeledPoints.collect().size(); i++) {
     System.out.print(labeledPoints.collect().get(i).label() + "\t");
 }

 in the public MLModel build() throws MLModelBuilderException in
 DeeplearningModelBuilder.java


 On Tue, Aug 11, 2015 at 6:17 PM, Nirmal Fernando nir...@wso2.com wrote:

 Hi thushan,

 We need more info. What did you exactly print and where?

 On Tue, Aug 11, 2015 at 12:47 PM, Thushan Ganegedara thu...@gmail.com
 wrote:

 Hi,

 I found the potential cause of the poor accuracy for the leaf dataset.
 It seems the data read into ML is wrong.

 I have attached the data file as a CSV (classes are in the last column)

 However, when I print out the labels of the read data (classes), it
 looks something like below. Clearly there aren't this many 3.0 classes
 and there should be classes up to 36.0.

 Is this caused by a bug?

 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
 1.0 1.0 1.0 12.0 12.0 12.0 12.0 12.0 12.0
 12.0 12.0 12.0 12.0 13.0 13.0 13.0 13.0 13.0 13.0
 13.0 13.0 13.0 13.0 14.0 14.0 14.0 14.0 14.0
 14.0 14.0 14.0 15.0 15.0 15.0 15.0 15.0 15.0
 15.0 15.0 15.0 15.0 15.0 15.0 16.0 16.0 16.0 16.0
 16.0 16.0 16.0 16.0 17.0 17.0 17.0 17.0 17.0
 17.0 17.0 17.0 17.0 17.0 18.0 18.0 18.0 18.0
 18.0 18.0 18.0 18.0 18.0 18.0 18.0 19.0 19.0 19.0
 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0
 19.0 19.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
 2.0 2.0 2.0 2.0 2.0 2.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 4.0 4.0 4.0 4.0 4.0 4.0
 4.0 4.0 4.0 4.0 4.0 4.0 5.0 5.0 5.0 5.0
 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0
 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0
 6.0 6.0 6.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0
 7.0 7.0 7.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0

 --
 Regards,

 Thushan Ganegedara
 School of IT
 University of Sydney, Australia




 --

 Thanks & regards,
 Nirmal

 Team Lead - WSO2 Machine Learner
 Associate Technical Lead - Data Technologies Team, WSO2 Inc.
 Mobile: +94715779733
 Blog: http://nirmalfdo.blogspot.com/





 --
 Regards,

 Thushan Ganegedara
 School of IT
 University of Sydney, Australia




-- 

Thanks & regards,
Nirmal

Team Lead - WSO2 Machine Learner
Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
___
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev


Re: [Dev] [ML] Issue while loading the leaf dataset (misreading classes)

2015-08-11 Thread Thushan Ganegedara
I used the following snippet

for (int i = 0; i < labeledPoints.collect().size(); i++) {
    System.out.print(labeledPoints.collect().get(i).label() + "\t");
}

in the method public MLModel build() throws MLModelBuilderException of
DeeplearningModelBuilder.java
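
When checking labels like this, a per-class tally is often easier to read than a raw dump of every value. The following is only a minimal sketch: it assumes the labels have already been collected once into a plain List<Double> (for example from the collected labeled points), and the class and method names are hypothetical, not part of WSO2 ML.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LabelTally {
    // Counts how often each label value occurs.
    // A TreeMap keeps the labels in ascending numeric order.
    public static Map<Double, Integer> tally(List<Double> labels) {
        Map<Double, Integer> counts = new TreeMap<>();
        for (Double label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Illustrative values only; real labels would come from the dataset.
        List<Double> labels = Arrays.asList(1.0, 1.0, 12.0, 3.0, 3.0, 3.0);
        for (Map.Entry<Double, Integer> e : tally(labels).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

A class that appears far more often than expected, or one that is missing altogether, is then visible at a glance.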


On Tue, Aug 11, 2015 at 6:17 PM, Nirmal Fernando nir...@wso2.com wrote:

 Hi Thushan,

 We need more info. What exactly did you print, and where?

 On Tue, Aug 11, 2015 at 12:47 PM, Thushan Ganegedara thu...@gmail.com
 wrote:

 Hi,

 I found the potential cause of the poor accuracy for the leaf dataset. It
 seems the data read into ML is wrong.

 I have attached the data file as a CSV (classes are in the last column)

 However, when I print out the labels of the read data (classes), it looks
 something like below. Clearly there aren't this many 3.0 classes and
 there should be classes up to 36.0.

 Is this caused by a bug?

 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
 1.0 1.0 1.0 12.0 12.0 12.0 12.0 12.0 12.0
 12.0 12.0 12.0 12.0 13.0 13.0 13.0 13.0 13.0 13.0
 13.0 13.0 13.0 13.0 14.0 14.0 14.0 14.0 14.0
 14.0 14.0 14.0 15.0 15.0 15.0 15.0 15.0 15.0
 15.0 15.0 15.0 15.0 15.0 15.0 16.0 16.0 16.0 16.0
 16.0 16.0 16.0 16.0 17.0 17.0 17.0 17.0 17.0
 17.0 17.0 17.0 17.0 17.0 18.0 18.0 18.0 18.0
 18.0 18.0 18.0 18.0 18.0 18.0 18.0 19.0 19.0 19.0
 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0
 19.0 19.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
 2.0 2.0 2.0 2.0 2.0 2.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 4.0 4.0 4.0 4.0 4.0 4.0
 4.0 4.0 4.0 4.0 4.0 4.0 5.0 5.0 5.0 5.0
 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0
 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0
 6.0 6.0 6.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0
 7.0 7.0 7.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0

 --
 Regards,

 Thushan Ganegedara
 School of IT
 University of Sydney, Australia




 --

 Thanks & regards,
 Nirmal

 Team Lead - WSO2 Machine Learner
 Associate Technical Lead - Data Technologies Team, WSO2 Inc.
 Mobile: +94715779733
 Blog: http://nirmalfdo.blogspot.com/





-- 
Regards,

Thushan Ganegedara
School of IT
University of Sydney, Australia


Re: [Dev] [ML] Issue while loading the leaf dataset (misreading classes)

2015-08-11 Thread Thushan Ganegedara
I still get the same result

1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
1.0 1.0 1.0 12.0 12.0 12.0 12.0 12.0 12.0
12.0 12.0 12.0 12.0 13.0 13.0 13.0 13.0 13.0 13.0
13.0 13.0 13.0 13.0 14.0 14.0 14.0 14.0 14.0
14.0 14.0 14.0 15.0 15.0 15.0 15.0 15.0 15.0
15.0 15.0 15.0 15.0 15.0 15.0 16.0 16.0 16.0 16.0
16.0 16.0 16.0 16.0 17.0 17.0 17.0 17.0 17.0
17.0 17.0 17.0 17.0 17.0 18.0 18.0 18.0 18.0
18.0 18.0 18.0 18.0 18.0 18.0 18.0 19.0 19.0 19.0
19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0
19.0 19.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
2.0 2.0 2.0 2.0 2.0 2.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 4.0 4.0 4.0 4.0 4.0 4.0
4.0 4.0 4.0 4.0 4.0 4.0 5.0 5.0 5.0 5.0
5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0
6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0
6.0 6.0 6.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0
7.0 7.0 7.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
3.0 3.0 3.0 3.0

On Tue, Aug 11, 2015 at 7:05 PM, Nirmal Fernando nir...@wso2.com wrote:

 Can you try the following code?

 List<LabeledPoint> points = labeledPoints.collect();
 for (int i = 0; i < points.size(); i++) {
     System.out.print(points.get(i).label() + "\t");
 }

 On Tue, Aug 11, 2015 at 2:30 PM, Thushan Ganegedara thu...@gmail.com
 wrote:

 I used the following snippet

 for (int i = 0; i < labeledPoints.collect().size(); i++) {
     System.out.print(labeledPoints.collect().get(i).label() + "\t");
 }

 in the public MLModel build() throws MLModelBuilderException in
 DeeplearningModelBuilder.java


 On Tue, Aug 11, 2015 at 6:17 PM, Nirmal Fernando nir...@wso2.com wrote:

 Hi Thushan,

 We need more info. What exactly did you print, and where?

 On Tue, Aug 11, 2015 at 12:47 PM, Thushan Ganegedara thu...@gmail.com
 wrote:

 Hi,

 I found the potential cause of the poor accuracy for the leaf dataset.
 It seems the data read into ML is wrong.

 I have attached the data file as a CSV (classes are in the last column)

 However, when I print out the labels of the read data (classes), it
 looks something like below. Clearly there aren't this many 3.0 classes
 and there should be classes up to 36.0.

 Is this caused by a bug?

 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
 1.0 1.0 1.0 12.0 12.0 12.0 12.0 12.0 12.0
 12.0 12.0 12.0 12.0 13.0 13.0 13.0 13.0 13.0
 13.0
 13.0 13.0 13.0 13.0 14.0 14.0 14.0 14.0 14.0
 14.0 14.0 14.0 15.0 15.0 15.0 15.0 15.0 15.0
 15.0 15.0 15.0 15.0 15.0 15.0 16.0 16.0 16.0
 16.0
 16.0 16.0 16.0 16.0 17.0 17.0 17.0 17.0 17.0
 17.0 17.0 17.0 17.0 17.0 18.0 18.0 18.0 18.0
 18.0 18.0 18.0 18.0 18.0 18.0 18.0 19.0 19.0
 19.0
 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0
 19.0 19.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
 2.0 2.0 2.0 2.0 2.0 2.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 4.0 4.0 4.0 4.0 4.0 4.0
 4.0 4.0 4.0 4.0 4.0 4.0 5.0 5.0 5.0 5.0
 5.0 5.0 5.0 5.0 5.0 

Re: [Dev] [ML] Issue while loading the leaf dataset (misreading classes)

2015-08-11 Thread Thushan Ganegedara
This issue occurs only when I treat the response variable as a categorical
variable. If I read it as a numerical variable, the values are read
correctly.

So I presume there is a fault in the categorical conversion of the variable.
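
The follow-up messages in this thread trace the problem to categorical detection and a categorical threshold. As an illustration of that general idea only, here is a minimal sketch of a distinct-value heuristic. It is an assumption about how such a check could look, not WSO2 ML's actual code; the class name, the method name and the threshold of 20 are hypothetical, while 40 is the value reported to work in this thread.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CategoricalCheck {
    // Hypothetical heuristic: a column is considered categorical when the
    // number of distinct values it holds does not exceed the threshold.
    public static boolean looksCategorical(List<String> values, int threshold) {
        Set<String> distinct = new HashSet<>(values);
        return distinct.size() <= threshold;
    }

    public static void main(String[] args) {
        // Illustrative column with 36 distinct class labels ("1" .. "36").
        List<String> column = new ArrayList<>();
        for (int i = 1; i <= 36; i++) {
            column.add(String.valueOf(i));
        }
        System.out.println(looksCategorical(column, 20));
        System.out.println(looksCategorical(column, 40));
    }
}
```

Under this kind of rule, a class column with more distinct values than the threshold would not be handled as categorical, which is one way labels could end up misread.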

On Tue, Aug 11, 2015 at 7:11 PM, Thushan Ganegedara thu...@gmail.com
wrote:

 I still get the same result

 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
 1.0 1.0 1.0 12.0 12.0 12.0 12.0 12.0 12.0
 12.0 12.0 12.0 12.0 13.0 13.0 13.0 13.0 13.0 13.0
 13.0 13.0 13.0 13.0 14.0 14.0 14.0 14.0 14.0
 14.0 14.0 14.0 15.0 15.0 15.0 15.0 15.0 15.0
 15.0 15.0 15.0 15.0 15.0 15.0 16.0 16.0 16.0 16.0
 16.0 16.0 16.0 16.0 17.0 17.0 17.0 17.0 17.0
 17.0 17.0 17.0 17.0 17.0 18.0 18.0 18.0 18.0
 18.0 18.0 18.0 18.0 18.0 18.0 18.0 19.0 19.0 19.0
 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0
 19.0 19.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
 2.0 2.0 2.0 2.0 2.0 2.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 4.0 4.0 4.0 4.0 4.0 4.0
 4.0 4.0 4.0 4.0 4.0 4.0 5.0 5.0 5.0 5.0
 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0
 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0 6.0
 6.0 6.0 6.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0
 7.0 7.0 7.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0
 3.0 3.0 3.0 3.0

 On Tue, Aug 11, 2015 at 7:05 PM, Nirmal Fernando nir...@wso2.com wrote:

 Can you try the following code?

 List<LabeledPoint> points = labeledPoints.collect();
 for (int i = 0; i < points.size(); i++) {
     System.out.print(points.get(i).label() + "\t");
 }

 On Tue, Aug 11, 2015 at 2:30 PM, Thushan Ganegedara thu...@gmail.com
 wrote:

 I used the following snippet

 for (int i = 0; i < labeledPoints.collect().size(); i++) {
     System.out.print(labeledPoints.collect().get(i).label() + "\t");
 }

 in the public MLModel build() throws MLModelBuilderException in
 DeeplearningModelBuilder.java


 On Tue, Aug 11, 2015 at 6:17 PM, Nirmal Fernando nir...@wso2.com
 wrote:

 Hi Thushan,

 We need more info. What exactly did you print, and where?

 On Tue, Aug 11, 2015 at 12:47 PM, Thushan Ganegedara thu...@gmail.com
 wrote:

 Hi,

 I found the potential cause of the poor accuracy for the leaf dataset.
 It seems the data read into ML is wrong.

 I have attached the data file as a CSV (classes are in the last column)

 However, when I print out the labels of the read data (classes), it
 looks something like below. Clearly there aren't this many 3.0 classes
 and there should be classes up to 36.0.

 Is this caused by a bug?

 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
 1.0 1.0 1.0 1.0 12.0 12.0 12.0 12.0 12.0
 12.0 12.0 12.0 12.0 12.0 13.0 13.0 13.0 13.0
 13.0 13.0
 13.0 13.0 13.0 13.0 14.0 14.0 14.0 14.0
 14.0 14.0 14.0 14.0 15.0 15.0 15.0 15.0 15.0
 15.0 15.0 15.0 15.0 15.0 15.0 15.0 16.0 16.0
 16.0 16.0
 16.0 16.0 16.0 16.0 17.0 17.0 17.0 17.0
 17.0 17.0 17.0 17.0 17.0 17.0 18.0 18.0 18.0
 18.0 18.0 18.0 18.0 18.0 18.0 18.0 18.0 19.0
 19.0 19.0
 19.0 19.0 19.0 19.0 19.0 19.0 19.0 19.0
 19.0 19.0 19.0 2.0 2.0 2.0 2.0 2.0 2.0
 2.0