Re: [Dev] [ML] Progress with Deeplearning Component

2015-08-11 Thread Nirmal Fernando
Great! Please create a Jira.

On Tue, Aug 11, 2015 at 12:43 PM, Thushan Ganegedara thu...@gmail.com
wrote:

 Hi CD,

 No worries.


-- 

Thanks & regards,
Nirmal

Team Lead - WSO2 Machine Learner
Associate Technical Lead - Data Technologies Team, WSO2 Inc.
Mobile: +94715779733
Blog: http://nirmalfdo.blogspot.com/
___
Dev mailing list
Dev@wso2.org
http://wso2.org/cgi-bin/mailman/listinfo/dev


Re: [Dev] [ML] Progress with Deeplearning Component

2015-08-11 Thread Thushan Ganegedara
Hi,

I noticed that, in certain cases, the features don't follow the correct
ordering. Any idea why this is happening?

For example, in the attached image, V10 appears right after V1 instead of
after V9.
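V10 landing right after V1 is the order a plain string sort produces, because strings are compared character by character and "V10" sorts before "V2". Below is a minimal sketch of ordering by the numeric index instead. It assumes the feature names follow a V<index> pattern, and feature_index is a hypothetical helper, not an existing function in the ML code:

```python
import re


def feature_index(name: str) -> int:
    """Extract the trailing number from a name like 'V10' (assumed naming scheme)."""
    match = re.search(r"(\d+)$", name)
    if match is None:
        raise ValueError(f"no numeric index in feature name: {name!r}")
    return int(match.group(1))


features = ["V1", "V2", "V3", "V10", "V11", "V20"]

# Plain string sort: compares character by character, so "V10" < "V2".
print(sorted(features))  # prints: ['V1', 'V10', 'V11', 'V2', 'V20', 'V3']

# Sorting by the numeric index gives the expected V1, V2, V3, V10, ...
print(sorted(features, key=feature_index))  # prints: ['V1', 'V2', 'V3', 'V10', 'V11', 'V20']
```

If some names do not follow that pattern, sorting by each column's position in the original dataset header would avoid relying on the naming scheme at all.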

On Tue, Aug 11, 2015 at 12:10 PM, Thushan Ganegedara thu...@gmail.com
wrote:

 Hi all,

 After a long struggle, I was able to narrow down the cause of the poor
 accuracy on the leaf dataset. The dataset has classes numbered 1 to 36, but
 classes 16 to 22 are missing, i.e. the classes go 1, 2, ..., 14, 15, 23,
 24, ..., 35, 36.

 Converting these class labels to enums in H2O, combined with the fact that
 there is very little data for each class, confuses H2O and causes it to
 *assign different enum values to the same class in different datasets*,
 which shows up as poor accuracy.

 I also suspect a mismatch between the labels provided by the JavaRDD and
 the enums produced by H2O. I'm looking into this right now.

 Thank you


-- 
Regards,

Thushan Ganegedara
School of IT
University of Sydney, Australia


Re: [Dev] [ML] Progress with Deeplearning Component

2015-08-11 Thread CD Athuraliya
Hi Nirmal,

We will be able to fix this issue.

Thanks, Thushan, for pointing this out! :)



On Tue, Aug 11, 2015 at 12:32 PM, Nirmal Fernando nir...@wso2.com wrote:

 @CD, is this something we could fix? Can we list the features in the order
 of their indices?


-- 
*CD Athuraliya*
Software Engineer
WSO2, Inc.
lean . enterprise . middleware
Mobile: +94 716288847
LinkedIn: http://lk.linkedin.com/in/cdathuraliya | Twitter:
https://twitter.com/cdathuraliya | Blog: http://cdathuraliya.tumblr.com/


Re: [Dev] [ML] Progress with Deeplearning Component

2015-08-11 Thread Nirmal Fernando
@CD, is this something we could fix? Can we list the features in the order
of their indices?

On Tue, Aug 11, 2015 at 12:25 PM, Thushan Ganegedara thu...@gmail.com
wrote:

 Hi,

 I noticed that, in certain cases, the features don't follow the correct
 ordering. Any idea why this is happening?

 For example, in the attached image, V10 appears right after V1 instead of
 after V9.




Re: [Dev] [ML] Progress with Deeplearning Component

2015-08-11 Thread Thushan Ganegedara
Hi CD,

No worries.

On Tue, Aug 11, 2015 at 5:11 PM, CD Athuraliya chathur...@wso2.com wrote:

 Hi Nirmal,

 We will be able to fix this issue.

 Thanks, Thushan, for pointing this out! :)






Re: [Dev] [ML] Progress with Deeplearning Component

2015-08-10 Thread Thushan Ganegedara
Hi all,

After a long struggle, I was able to narrow down the cause of the poor
accuracy on the leaf dataset. The dataset has classes numbered 1 to 36, but
classes 16 to 22 are missing, i.e. the classes go 1, 2, ..., 14, 15, 23,
24, ..., 35, 36.

Converting these class labels to enums in H2O, combined with the fact that
there is very little data for each class, confuses H2O and causes it to
*assign different enum values to the same class in different datasets*,
which shows up as poor accuracy.

I also suspect a mismatch between the labels provided by the JavaRDD and
the enums produced by H2O. I'm looking into this right now.
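Below is a minimal sketch of how the same class can end up with different enum values. For illustration, it assumes that each dataset's enum domain is built independently from the distinct labels that dataset contains, sorted as strings. This models the symptom described above; it is not H2O's actual code. With very little data per class, a split can easily miss a class, which shifts the index of every level after it. build_domain and build_shared_domain are hypothetical helpers:

```python
def build_domain(labels):
    """Map each distinct label in one dataset to an enum index.

    Illustrative assumption: levels are the distinct labels sorted as strings.
    """
    levels = sorted(set(labels))
    return {level: index for index, level in enumerate(levels)}


def build_shared_domain(*label_sets):
    """Build one label -> index mapping from every split, ordered numerically."""
    levels = sorted(set().union(*label_sets), key=int)
    return {level: index for index, level in enumerate(levels)}


# Two splits of the same data; the small training split happens to lack class "2".
train_labels = ["1", "3", "15", "23", "36"]
test_labels = ["1", "2", "3", "15", "23", "36"]

train_domain = build_domain(train_labels)
test_domain = build_domain(test_labels)

# The same class gets a different index in each split.
print(train_domain["3"], test_domain["3"])  # prints: 3 4

# With one shared mapping, every class has the same index in both splits.
shared = build_shared_domain(train_labels, test_labels)
train_encoded = [shared[label] for label in train_labels]
test_encoded = [shared[label] for label in test_labels]
print(train_encoded)  # prints: [0, 2, 3, 4, 5]
print(test_encoded)   # prints: [0, 1, 2, 3, 4, 5]
```

Under the same assumption, one option is to build the mapping once from all labels and reuse it for every split. The JavaRDD side and the H2O side would then agree on the indices.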

Thank you

On Mon, Aug 10, 2015 at 11:16 AM, Thushan Ganegedara thu...@gmail.com
wrote:

 Hi all,

 I've been testing the new Deeplearning component with a few different
 datasets (mainly the leaf dataset), and for some reason the leaf dataset is
 not working as expected.

 The cause is still unknown, but after testing the Deeplearning component
 extensively with the leaf dataset, I have identified several potential
 problems that might be causing the poor accuracy:

 1. It needs a higher number of epochs (compared to other datasets) to
 produce reasonable accuracy.

 2. Too many neurons cause overfitting, which in turn lowers the accuracy.

 3. Some classes have very similar features (the latter classes in
 particular are often misclassified).

 I was able to get an accuracy of 86% with Logistic Regression (L-BFGS),
 which is quite reasonable, but I'm having trouble reaching that accuracy
 with Deeplearning (which should be quite easy). The highest accuracy I have
 reached so far is 71.xx%.

 So I'm still looking for a definite cause of the poor accuracy.

 Thank you.





[Dev] [ML] Progress with Deeplearning Component

2015-08-09 Thread Thushan Ganegedara
Hi all,

I've been testing the new Deeplearning component with a few different
datasets (mainly the leaf dataset), and for some reason the leaf dataset is
not working as expected.

The cause is still unknown, but after testing the Deeplearning component
extensively with the leaf dataset, I have identified several potential
problems that might be causing the poor accuracy:

1. It needs a higher number of epochs (compared to other datasets) to
produce reasonable accuracy.

2. Too many neurons cause overfitting, which in turn lowers the accuracy.

3. Some classes have very similar features (the latter classes in
particular are often misclassified).
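One way to check point 3 is to break the overall accuracy down per class. If the latter classes really are the problem, their accuracy should sit well below the rest. Below is a minimal sketch that assumes the true and predicted labels are available as two equal-length lists. per_class_accuracy is a hypothetical helper, and the sample labels are made up:

```python
from collections import Counter


def per_class_accuracy(actual, predicted):
    """Return {class: fraction of that class's rows that were predicted correctly}."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    totals = Counter(actual)
    correct = Counter(a for a, p in zip(actual, predicted) if a == p)
    # Counter returns 0 for classes that were never predicted correctly.
    return {label: correct[label] / totals[label] for label in sorted(totals)}


# Made-up labels, only to show the shape of the output (not leaf-dataset results).
actual = [1, 1, 2, 2, 35, 35, 36, 36]
predicted = [1, 1, 2, 2, 36, 35, 35, 35]

for label, accuracy in per_class_accuracy(actual, predicted).items():
    print(f"class {label}: {accuracy:.2f}")
```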

I was able to get an accuracy of 86% with Logistic Regression (L-BFGS),
which is quite reasonable, but I'm having trouble reaching that accuracy
with Deeplearning (which should be quite easy). The highest accuracy I have
reached so far is 71.xx%.

So I'm still looking for a definite cause of the poor accuracy.

Thank you.

