[jira] [Commented] (MADLIB-1222) Support already encoded arrays for dependent var in MLP classification

2018-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426151#comment-16426151
 ] 

ASF GitHub Bot commented on MADLIB-1222:


Github user asfgit closed the pull request at:

https://github.com/apache/madlib/pull/250


> Support already encoded arrays for dependent var in MLP classification
> --
>
> Key: MADLIB-1222
> URL: https://issues.apache.org/jira/browse/MADLIB-1222
> Project: Apache MADlib
>  Issue Type: New Feature
>  Components: Module: Neural Networks
>Reporter: Nandish Jayaram
>Priority: Major
> Fix For: v1.14
>
>
> MLP classification currently supports only scalar dependent variables. If a 
> user has already one-hot encoded a categorical variable, the dependent 
> variable is an array and hence unusable with mlp_classification. This feature 
> request is to allow one-hot encoded arrays as dependent variables in MLP 
> classification.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (MADLIB-1222) Support already encoded arrays for dependent var in MLP classification

2018-04-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426125#comment-16426125
 ] 

ASF GitHub Bot commented on MADLIB-1222:


Github user fmcquillan99 commented on the issue:

https://github.com/apache/madlib/pull/250
  
See JIRA https://issues.apache.org/jira/browse/MADLIB-1222 for examples 
showing this works for IGD and mini-batch

LGTM

I think you can go ahead and merge this PR to master




[jira] [Commented] (MADLIB-1222) Support already encoded arrays for dependent var in MLP classification

2018-04-04 Thread Frank McQuillan (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426112#comment-16426112
 ] 

Frank McQuillan commented on MADLIB-1222:
-

For mini-batch this seems to work OK, e.g., continuing the modified version of 
the user docs example from above:

{code:sql}
DROP TABLE IF EXISTS iris_data_packed, iris_data_packed_standardization, iris_data_packed_summary;
SELECT madlib.minibatch_preprocessor(
'iris_data',
'iris_data_packed',
'class_integer',
'attributes',
10
);
 {code}

{code:sql}
DROP TABLE IF EXISTS mlp_model, mlp_model_summary, mlp_model_standardization;
-- Set seed so results are reproducible
SELECT setseed(0);
SELECT madlib.mlp_classification(
    'iris_data_packed',      -- Source table
    'mlp_model',             -- Destination table
    'independent_varname',   -- Input features
    'dependent_varname',     -- Label
    ARRAY[5],                -- Number of units per layer
    'learning_rate_init=0.003,
    n_iterations=500,
    tolerance=0',            -- Optimizer params
    'tanh',                  -- Activation function
    NULL,                    -- Default weight (1)
    FALSE,                   -- No warm start
    FALSE                    -- Not verbose
);
{code}

{code:sql}
DROP TABLE IF EXISTS mlp_prediction;
SELECT madlib.mlp_predict(
    'mlp_model',       -- Model table
    'iris_data',       -- Test data table
    'id',              -- Id column in test table
    'mlp_prediction',  -- Output table for predictions
    'response'         -- Output classes, not probabilities
);
SELECT * FROM mlp_prediction JOIN iris_data USING (id) ORDER BY id;
{code}

produces

{code}
 id | estimated_class_integer |    attributes     | class_integer | class |   state
----+-------------------------+-------------------+---------------+-------+-----------
  1 | {1,0}                   | {5.0,3.2,1.2,0.2} | {1,0}         |     1 | Alaska
  2 | {1,0}                   | {5.5,3.5,1.3,0.2} | {1,0}         |     1 | Alaska
  3 | {1,0}                   | {4.9,3.1,1.5,0.1} | {1,0}         |     1 | Alaska
  4 | {1,0}                   | {4.4,3.0,1.3,0.2} | {1,0}         |     1 | Alaska
  5 | {1,0}                   | {5.1,3.4,1.5,0.2} | {1,0}         |     1 | Alaska
  6 | {1,0}                   | {5.0,3.5,1.3,0.3} | {1,0}         |     1 | Alaska
  7 | {1,0}                   | {4.5,2.3,1.3,0.3} | {1,0}         |     1 | Alaska
  8 | {1,0}                   | {4.4,3.2,1.3,0.2} | {1,0}         |     1 | Alaska
  9 | {1,0}                   | {5.0,3.5,1.6,0.6} | {1,0}         |     1 | Alaska
 10 | {1,0}                   | {5.1,3.8,1.9,0.4} | {1,0}         |     1 | Alaska
 11 | {1,0}                   | {4.8,3.0,1.4,0.3} | {1,0}         |     1 | Alaska
 12 | {1,0}                   | {5.1,3.8,1.6,0.2} | {1,0}         |     1 | Alaska
 13 | {0,1}                   | {5.7,2.8,4.5,1.3} | {0,1}         |     2 | Alaska
 14 | {0,1}                   | {6.3,3.3,4.7,1.6} | {0,1}         |     2 | Alaska
 15 | {0,1}                   | {4.9,2.4,3.3,1.0} | {0,1}         |     2 | Alaska
 16 | {0,1}                   | {6.6,2.9,4.6,1.3} | {0,1}         |     2 | Alaska
 17 | {0,1}                   | {5.2,2.7,3.9,1.4} | {0,1}         |     2 | Alaska
 18 | {0,1}                   | {5.0,2.0,3.5,1.0} | {0,1}         |     2 | Alaska
 19 | {0,1}                   | {5.9,3.0,4.2,1.5} | {0,1}         |     2 | Alaska
 20 | {0,1}                   | {6.0,2.2,4.0,1.0} | {0,1}         |     2 | Alaska
 21 | {0,1}                   | {6.1,2.9,4.7,1.4} | {0,1}         |     2 | Alaska
 22 | {0,1}                   | {5.6,2.9,3.6,1.3} | {0,1}         |     2 | Alaska
 23 | {0,1}                   | {6.7,3.1,4.4,1.4} | {0,1}         |     2 | Alaska
 24 | {0,1}                   | {5.6,3.0,4.5,1.5} | {0,1}         |     2 | Alaska
 25 | {0,1}                   | {5.8,2.7,4.1,1.0} | {0,1}         |     2 | Alaska
 26 | {0,1}                   | {6.2,2.2,4.5,1.5} | {0,1}         |     2 | Alaska
 27 | {0,1}                   | {5.6,2.5,3.9,1.1} | {0,1}         |     2 | Alaska
 28 | {1,0}                   | {5.0,3.4,1.5,0.2} | {1,0}         |     1 | Tennessee
 29 | {1,0}                   | {4.4,2.9,1.4,0.2} | {1,0}         |     1 | Tennessee
 30 | {1,0}                   | {4.9,3.1,1.5,0.1} | {1,0}         |     1 | Tennessee
 31 | {1,0}                   | {5.4,3.7,1.5,0.2} | {1,0}         |     1 | Tennessee
 32 | {1,0}                   | {4.8,3.4,1.6,0.2} | {1,0}         |     1 | Tennessee
 33 | {1,0}                   | {4.8,3.0,1.4,0.1} | {1,0}         |     1 | Tennessee
 34 | {1,0}                   | {4.3,3.0,1.1,0.1} | {1,0}         |     1 | Tennessee
 35 | {1,0}                   | {5.8,4.0,1.2,0.2} | {1,0}         |     1 | Tennessee
 36 | {1,0}                   | {5.7,4.4,1.5,0.4} | {1,0}         |     1 | Tennes

[jira] [Commented] (MADLIB-1222) Support already encoded arrays for dependent var in MLP classification

2018-04-04 Thread Frank McQuillan (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426074#comment-16426074
 ] 

Frank McQuillan commented on MADLIB-1222:
-

For IGD (without mini-batch) this seems to work OK, e.g., using a modified 
version of the user docs example:

[http://madlib.apache.org/docs/latest/group__grp__nn.html#example]
{code:sql}
DROP TABLE IF EXISTS iris_data;
CREATE TABLE iris_data(
id serial,
attributes numeric[],
class_integer integer[],
class integer,
state varchar
);
INSERT INTO iris_data(id, attributes, class_integer, class, state) VALUES
(1,ARRAY[5.0,3.2,1.2,0.2], ARRAY[1,0],1,'Alaska'),
(2,ARRAY[5.5,3.5,1.3,0.2], ARRAY[1,0],1,'Alaska'),
(3,ARRAY[4.9,3.1,1.5,0.1], ARRAY[1,0],1,'Alaska'),
(4,ARRAY[4.4,3.0,1.3,0.2], ARRAY[1,0],1,'Alaska'),
(5,ARRAY[5.1,3.4,1.5,0.2], ARRAY[1,0],1,'Alaska'),
(6,ARRAY[5.0,3.5,1.3,0.3], ARRAY[1,0],1,'Alaska'),
(7,ARRAY[4.5,2.3,1.3,0.3], ARRAY[1,0],1,'Alaska'),
(8,ARRAY[4.4,3.2,1.3,0.2], ARRAY[1,0],1,'Alaska'),
(9,ARRAY[5.0,3.5,1.6,0.6], ARRAY[1,0],1,'Alaska'),
(10,ARRAY[5.1,3.8,1.9,0.4], ARRAY[1,0],1,'Alaska'),
(11,ARRAY[4.8,3.0,1.4,0.3], ARRAY[1,0],1,'Alaska'),
(12,ARRAY[5.1,3.8,1.6,0.2], ARRAY[1,0],1,'Alaska'),
(13,ARRAY[5.7,2.8,4.5,1.3], ARRAY[0,1],2,'Alaska'),
(14,ARRAY[6.3,3.3,4.7,1.6], ARRAY[0,1],2,'Alaska'),
(15,ARRAY[4.9,2.4,3.3,1.0], ARRAY[0,1],2,'Alaska'),
(16,ARRAY[6.6,2.9,4.6,1.3], ARRAY[0,1],2,'Alaska'),
(17,ARRAY[5.2,2.7,3.9,1.4], ARRAY[0,1],2,'Alaska'),
(18,ARRAY[5.0,2.0,3.5,1.0], ARRAY[0,1],2,'Alaska'),
(19,ARRAY[5.9,3.0,4.2,1.5], ARRAY[0,1],2,'Alaska'),
(20,ARRAY[6.0,2.2,4.0,1.0], ARRAY[0,1],2,'Alaska'),
(21,ARRAY[6.1,2.9,4.7,1.4], ARRAY[0,1],2,'Alaska'),
(22,ARRAY[5.6,2.9,3.6,1.3], ARRAY[0,1],2,'Alaska'),
(23,ARRAY[6.7,3.1,4.4,1.4], ARRAY[0,1],2,'Alaska'),
(24,ARRAY[5.6,3.0,4.5,1.5], ARRAY[0,1],2,'Alaska'),
(25,ARRAY[5.8,2.7,4.1,1.0], ARRAY[0,1],2,'Alaska'),
(26,ARRAY[6.2,2.2,4.5,1.5], ARRAY[0,1],2,'Alaska'),
(27,ARRAY[5.6,2.5,3.9,1.1], ARRAY[0,1],2,'Alaska'),
(28,ARRAY[5.0,3.4,1.5,0.2], ARRAY[1,0],1,'Tennessee'),
(29,ARRAY[4.4,2.9,1.4,0.2], ARRAY[1,0],1,'Tennessee'),
(30,ARRAY[4.9,3.1,1.5,0.1], ARRAY[1,0],1,'Tennessee'),
(31,ARRAY[5.4,3.7,1.5,0.2], ARRAY[1,0],1,'Tennessee'),
(32,ARRAY[4.8,3.4,1.6,0.2], ARRAY[1,0],1,'Tennessee'),
(33,ARRAY[4.8,3.0,1.4,0.1], ARRAY[1,0],1,'Tennessee'),
(34,ARRAY[4.3,3.0,1.1,0.1], ARRAY[1,0],1,'Tennessee'),
(35,ARRAY[5.8,4.0,1.2,0.2], ARRAY[1,0],1,'Tennessee'),
(36,ARRAY[5.7,4.4,1.5,0.4], ARRAY[1,0],1,'Tennessee'),
(37,ARRAY[5.4,3.9,1.3,0.4], ARRAY[1,0],1,'Tennessee'),
(38,ARRAY[6.0,2.9,4.5,1.5], ARRAY[0,1],2,'Tennessee'),
(39,ARRAY[5.7,2.6,3.5,1.0], ARRAY[0,1],2,'Tennessee'),
(40,ARRAY[5.5,2.4,3.8,1.1], ARRAY[0,1],2,'Tennessee'),
(41,ARRAY[5.5,2.4,3.7,1.0], ARRAY[0,1],2,'Tennessee'),
(42,ARRAY[5.8,2.7,3.9,1.2], ARRAY[0,1],2,'Tennessee'),
(43,ARRAY[6.0,2.7,5.1,1.6], ARRAY[0,1],2,'Tennessee'),
(44,ARRAY[5.4,3.0,4.5,1.5], ARRAY[0,1],2,'Tennessee'),
(45,ARRAY[6.0,3.4,4.5,1.6], ARRAY[0,1],2,'Tennessee'),
(46,ARRAY[6.7,3.1,4.7,1.5], ARRAY[0,1],2,'Tennessee'),
(47,ARRAY[6.3,2.3,4.4,1.3], ARRAY[0,1],2,'Tennessee'),
(48,ARRAY[5.6,3.0,4.1,1.3], ARRAY[0,1],2,'Tennessee'),
(49,ARRAY[5.5,2.5,4.0,1.3], ARRAY[0,1],2,'Tennessee'),
(50,ARRAY[5.5,2.6,4.4,1.2], ARRAY[0,1],2,'Tennessee'),
(51,ARRAY[6.1,3.0,4.6,1.4], ARRAY[0,1],2,'Tennessee'),
(52,ARRAY[5.8,2.6,4.0,1.2], ARRAY[0,1],2,'Tennessee');
 {code}
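
Before training, it can help to confirm that every label array is a valid one-hot vector and agrees with the scalar `class` column. The query below is only a minimal sketch in plain PostgreSQL, assuming the two-class `iris_data` table created above; it is not part of MADlib. It returns the offending rows, so an empty result means the data is consistent:

{code:sql}
-- Sanity check (illustrative, not a MADlib function):
-- rebuild the expected one-hot array from the scalar class and
-- list every row whose stored encoding is missing or different.
SELECT id, class, class_integer
FROM iris_data
WHERE class_integer IS NULL
   OR class_integer <> ARRAY[(class = 1)::integer, (class = 2)::integer]
ORDER BY id;
{code}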
 
{code:sql}
DROP TABLE IF EXISTS mlp_model, mlp_model_summary, mlp_model_standardization;
-- Set seed so results are reproducible
SELECT setseed(0);
SELECT madlib.mlp_classification(
    'iris_data',         -- Source table
    'mlp_model',         -- Destination table
    'attributes',        -- Input features
    'class_integer',     -- Label
    ARRAY[5],            -- Number of units per layer
    'learning_rate_init=0.003,
    n_iterations=500,
    tolerance=0',        -- Optimizer params
    'tanh',              -- Activation function
    NULL,                -- Default weight (1)
    FALSE,               -- No warm start
    FALSE                -- Not verbose
);
{code}
 
{code:sql}
DROP TABLE IF EXISTS mlp_prediction;
SELECT madlib.mlp_predict(
    'mlp_model',       -- Model table
    'iris_data',       -- Test data table
    'id',              -- Id column in test table
    'mlp_prediction',  -- Output table for predictions
    'response'         -- Output classes, not probabilities
);
SELECT * FROM mlp_prediction JOIN iris_data USING (id) ORDER BY id;
{code}
produces
{code}
 id | estimated_class_integer |    attributes     | class_integer | class |   state
----+-------------------------+-------------------+---------------+-------+-----------
  1 | {1,0}                   | {5.0,3.2,1.2,0.2} | {1,0}         |     1 | Alaska
  2 | {1,0}                   | {5.5,3.5,1.3,0.2} | {1,0}         |     1 | Alaska
  3 | {1,0}                   | {4.9,3.1,1.5,0.1} | {1,0}         |     1 | Alaska

[jira] [Commented] (MADLIB-1222) Support already encoded arrays for dependent var in MLP classification

2018-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414347#comment-16414347
 ] 

ASF GitHub Bot commented on MADLIB-1222:


GitHub user njayaram2 opened a pull request:

https://github.com/apache/madlib/pull/250

MLP: Allow one-hot encoded dependent var for classification

JIRA:MADLIB-1222

MLP currently automatically encodes categorical variables for
classification but does not allow already encoded arrays for dependent
variables in mlp_classification. This commit lets users have an already
encoded array for the dependent variable and train a model.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/madlib/madlib feature/mlp/support-encoded-dep-var

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/madlib/pull/250.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #250


commit f5a87dee6bc8f27c1a13a4921ea726b391b1813d
Author: Nandish Jayaram 
Date:   2018-03-20T22:43:25Z

MLP: Allow one-hot encoded dependent var for classification

JIRA:MADLIB-1222

MLP currently automatically encodes categorical variables for
classification but does not allow already encoded arrays for dependent
variables in mlp_classification. This commit lets users have an already
encoded array for the dependent variable and train a model.






[jira] [Commented] (MADLIB-1222) Support already encoded arrays for dependent var in MLP classification

2018-03-23 Thread Nandish Jayaram (JIRA)

[ 
https://issues.apache.org/jira/browse/MADLIB-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16412215#comment-16412215
 ] 

Nandish Jayaram commented on MADLIB-1222:
-

Example use case, and handling it:

1) The user encodes the dependent variable with whatever tool they want and 
puts it in the column `color`. Maybe they do this to anonymize, or maybe the 
data is just in that format already:
{code}
blue [1,0,0]
red [0,1,0]
green [0,0,1]
{code}

---

start MADlib

2) runs mini-batch preprocess (if planning to use mini-batch)

3) runs MLP classification train (IGD or mini-batch)

4a) runs MLP predict (response):
{code}
actual predicted
[0,1,0] [1,0,0]
[0,0,1] [0,0,1] 
[1,0,0] [1,0,0]
etc.
{code}

4b) runs MLP predict (prob):
{code}
actual estimated_prob
[0,1,0] [0.85, 0.10, 0.05]
[0,0,1] [0.0 , 0.1 , 0.9]
[1,0,0] [0.75, 0.20, 0.05]
etc.
{code}

end MADlib

--

5) User maps back to red, blue, green since they know the mapping but MADlib 
doesn't.
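
Step 5 could look something like the following in plain PostgreSQL. This is only a sketch: the tables `color_map` and `predictions` and their columns are hypothetical names used for illustration, not anything MADlib creates. The user keeps the label-to-encoding mapping in a lookup table and joins on array equality to recover the labels from response-type predictions:

{code:sql}
-- Hypothetical lookup table kept by the user; MADlib never sees the labels.
DROP TABLE IF EXISTS color_map;
CREATE TABLE color_map (label text, encoding integer[]);
INSERT INTO color_map (label, encoding) VALUES
('blue',  ARRAY[1,0,0]),
('red',   ARRAY[0,1,0]),
('green', ARRAY[0,0,1]);

-- Hypothetical stand-in for the predicted one-hot arrays from step 4a.
DROP TABLE IF EXISTS predictions;
CREATE TABLE predictions (id integer, predicted integer[]);
INSERT INTO predictions (id, predicted) VALUES
(1, ARRAY[1,0,0]),
(2, ARRAY[0,0,1]),
(3, ARRAY[1,0,0]);

-- Map each predicted array back to its label via array equality.
SELECT p.id, p.predicted, m.label
FROM predictions p
JOIN color_map m ON m.encoding = p.predicted
ORDER BY p.id;
{code}

Because the join uses exact array equality, a prediction that matches no row in `color_map` would simply drop out of the result; a LEFT JOIN would keep such rows with a NULL label.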
