[jira] [Commented] (MADLIB-1222) Support already encoded arrays for dependent var in MLP classification
[ https://issues.apache.org/jira/browse/MADLIB-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426151#comment-16426151 ]

ASF GitHub Bot commented on MADLIB-1222:

Github user asfgit closed the pull request at:

    https://github.com/apache/madlib/pull/250

> Support already encoded arrays for dependent var in MLP classification
> --
>
> Key: MADLIB-1222
> URL: https://issues.apache.org/jira/browse/MADLIB-1222
> Project: Apache MADlib
> Issue Type: New Feature
> Components: Module: Neural Networks
> Reporter: Nandish Jayaram
> Priority: Major
> Fix For: v1.14
>
>
> MLP currently only supports scalar dependent variables for MLP
> classification. If a user has already one-hot encoded categorical variables
> the dependent variable will be an array, and hence unusable with
> mlp_classification. This feature request is to allow the use of one-hot
> encoded array for dependent vars in MLP classification.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (MADLIB-1222) Support already encoded arrays for dependent var in MLP classification
[ https://issues.apache.org/jira/browse/MADLIB-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426125#comment-16426125 ]

ASF GitHub Bot commented on MADLIB-1222:

Github user fmcquillan99 commented on the issue:

    https://github.com/apache/madlib/pull/250

    See JIRA https://issues.apache.org/jira/browse/MADLIB-1222 for examples showing that this works for IGD and mini-batch.

    LGTM. I think you can go ahead and merge this PR to master.
[jira] [Commented] (MADLIB-1222) Support already encoded arrays for dependent var in MLP classification
[ https://issues.apache.org/jira/browse/MADLIB-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426112#comment-16426112 ]

Frank McQuillan commented on MADLIB-1222:

For minibatch this seems to work OK, e.g., continuing the modified version of the user docs example from above:

{code:sql}
DROP TABLE IF EXISTS iris_data_packed, iris_data_packed_standardization, iris_data_packed_summary;
SELECT madlib.minibatch_preprocessor(
    'iris_data',
    'iris_data_packed',
    'class_integer',
    'attributes',
    10
);
{code}

{code:sql}
DROP TABLE IF EXISTS mlp_model, mlp_model_summary, mlp_model_standardization;
-- Set seed so results are reproducible
SELECT setseed(0);
SELECT madlib.mlp_classification(
    'iris_data_packed',     -- Source table
    'mlp_model',            -- Destination table
    'independent_varname',  -- Input features
    'dependent_varname',    -- Label
    ARRAY[5],               -- Number of units per layer
    'learning_rate_init=0.003, n_iterations=500, tolerance=0',  -- Optimizer params
    'tanh',                 -- Activation function
    NULL,                   -- Default weight (1)
    FALSE,                  -- No warm start
    FALSE                   -- Not verbose
);
{code}

{code:sql}
DROP TABLE IF EXISTS mlp_prediction;
SELECT madlib.mlp_predict(
    'mlp_model',       -- Model table
    'iris_data',       -- Test data table
    'id',              -- Id column in test table
    'mlp_prediction',  -- Output table for predictions
    'response'         -- Output classes, not probabilities
);
SELECT * FROM mlp_prediction JOIN iris_data USING (id) ORDER BY id;
{code}

produces

{code}
 id | estimated_class_integer |    attributes     | class_integer | class | state
----+-------------------------+-------------------+---------------+-------+-----------
  1 | {1,0}                   | {5.0,3.2,1.2,0.2} | {1,0}         |     1 | Alaska
  2 | {1,0}                   | {5.5,3.5,1.3,0.2} | {1,0}         |     1 | Alaska
  3 | {1,0}                   | {4.9,3.1,1.5,0.1} | {1,0}         |     1 | Alaska
  4 | {1,0}                   | {4.4,3.0,1.3,0.2} | {1,0}         |     1 | Alaska
  5 | {1,0}                   | {5.1,3.4,1.5,0.2} | {1,0}         |     1 | Alaska
  6 | {1,0}                   | {5.0,3.5,1.3,0.3} | {1,0}         |     1 | Alaska
  7 | {1,0}                   | {4.5,2.3,1.3,0.3} | {1,0}         |     1 | Alaska
  8 | {1,0}                   | {4.4,3.2,1.3,0.2} | {1,0}         |     1 | Alaska
  9 | {1,0}                   | {5.0,3.5,1.6,0.6} | {1,0}         |     1 | Alaska
 10 | {1,0}                   | {5.1,3.8,1.9,0.4} | {1,0}         |     1 | Alaska
 11 | {1,0}                   | {4.8,3.0,1.4,0.3} | {1,0}         |     1 | Alaska
 12 | {1,0}                   | {5.1,3.8,1.6,0.2} | {1,0}         |     1 | Alaska
 13 | {0,1}                   | {5.7,2.8,4.5,1.3} | {0,1}         |     2 | Alaska
 14 | {0,1}                   | {6.3,3.3,4.7,1.6} | {0,1}         |     2 | Alaska
 15 | {0,1}                   | {4.9,2.4,3.3,1.0} | {0,1}         |     2 | Alaska
 16 | {0,1}                   | {6.6,2.9,4.6,1.3} | {0,1}         |     2 | Alaska
 17 | {0,1}                   | {5.2,2.7,3.9,1.4} | {0,1}         |     2 | Alaska
 18 | {0,1}                   | {5.0,2.0,3.5,1.0} | {0,1}         |     2 | Alaska
 19 | {0,1}                   | {5.9,3.0,4.2,1.5} | {0,1}         |     2 | Alaska
 20 | {0,1}                   | {6.0,2.2,4.0,1.0} | {0,1}         |     2 | Alaska
 21 | {0,1}                   | {6.1,2.9,4.7,1.4} | {0,1}         |     2 | Alaska
 22 | {0,1}                   | {5.6,2.9,3.6,1.3} | {0,1}         |     2 | Alaska
 23 | {0,1}                   | {6.7,3.1,4.4,1.4} | {0,1}         |     2 | Alaska
 24 | {0,1}                   | {5.6,3.0,4.5,1.5} | {0,1}         |     2 | Alaska
 25 | {0,1}                   | {5.8,2.7,4.1,1.0} | {0,1}         |     2 | Alaska
 26 | {0,1}                   | {6.2,2.2,4.5,1.5} | {0,1}         |     2 | Alaska
 27 | {0,1}                   | {5.6,2.5,3.9,1.1} | {0,1}         |     2 | Alaska
 28 | {1,0}                   | {5.0,3.4,1.5,0.2} | {1,0}         |     1 | Tennessee
 29 | {1,0}                   | {4.4,2.9,1.4,0.2} | {1,0}         |     1 | Tennessee
 30 | {1,0}                   | {4.9,3.1,1.5,0.1} | {1,0}         |     1 | Tennessee
 31 | {1,0}                   | {5.4,3.7,1.5,0.2} | {1,0}         |     1 | Tennessee
 32 | {1,0}                   | {4.8,3.4,1.6,0.2} | {1,0}         |     1 | Tennessee
 33 | {1,0}                   | {4.8,3.0,1.4,0.1} | {1,0}         |     1 | Tennessee
 34 | {1,0}                   | {4.3,3.0,1.1,0.1} | {1,0}         |     1 | Tennessee
 35 | {1,0}                   | {5.8,4.0,1.2,0.2} | {1,0}         |     1 | Tennessee
 36 | {1,0}                   | {5.7,4.4,1.5,0.4} | {1,0}         |     1 | Tennes
[jira] [Commented] (MADLIB-1222) Support already encoded arrays for dependent var in MLP classification
[ https://issues.apache.org/jira/browse/MADLIB-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426074#comment-16426074 ]

Frank McQuillan commented on MADLIB-1222:

For IGD (without minibatch) this seems to work OK, e.g., using a modified version of the user docs example:
[http://madlib.apache.org/docs/latest/group__grp__nn.html#example]

{code:sql}
DROP TABLE IF EXISTS iris_data;
CREATE TABLE iris_data(
    id serial,
    attributes numeric[],
    class_integer integer[],
    class integer,
    state varchar
);
INSERT INTO iris_data(id, attributes, class_integer, class, state) VALUES
(1,ARRAY[5.0,3.2,1.2,0.2], ARRAY[1,0],1,'Alaska'),
(2,ARRAY[5.5,3.5,1.3,0.2], ARRAY[1,0],1,'Alaska'),
(3,ARRAY[4.9,3.1,1.5,0.1], ARRAY[1,0],1,'Alaska'),
(4,ARRAY[4.4,3.0,1.3,0.2], ARRAY[1,0],1,'Alaska'),
(5,ARRAY[5.1,3.4,1.5,0.2], ARRAY[1,0],1,'Alaska'),
(6,ARRAY[5.0,3.5,1.3,0.3], ARRAY[1,0],1,'Alaska'),
(7,ARRAY[4.5,2.3,1.3,0.3], ARRAY[1,0],1,'Alaska'),
(8,ARRAY[4.4,3.2,1.3,0.2], ARRAY[1,0],1,'Alaska'),
(9,ARRAY[5.0,3.5,1.6,0.6], ARRAY[1,0],1,'Alaska'),
(10,ARRAY[5.1,3.8,1.9,0.4], ARRAY[1,0],1,'Alaska'),
(11,ARRAY[4.8,3.0,1.4,0.3], ARRAY[1,0],1,'Alaska'),
(12,ARRAY[5.1,3.8,1.6,0.2], ARRAY[1,0],1,'Alaska'),
(13,ARRAY[5.7,2.8,4.5,1.3], ARRAY[0,1],2,'Alaska'),
(14,ARRAY[6.3,3.3,4.7,1.6], ARRAY[0,1],2,'Alaska'),
(15,ARRAY[4.9,2.4,3.3,1.0], ARRAY[0,1],2,'Alaska'),
(16,ARRAY[6.6,2.9,4.6,1.3], ARRAY[0,1],2,'Alaska'),
(17,ARRAY[5.2,2.7,3.9,1.4], ARRAY[0,1],2,'Alaska'),
(18,ARRAY[5.0,2.0,3.5,1.0], ARRAY[0,1],2,'Alaska'),
(19,ARRAY[5.9,3.0,4.2,1.5], ARRAY[0,1],2,'Alaska'),
(20,ARRAY[6.0,2.2,4.0,1.0], ARRAY[0,1],2,'Alaska'),
(21,ARRAY[6.1,2.9,4.7,1.4], ARRAY[0,1],2,'Alaska'),
(22,ARRAY[5.6,2.9,3.6,1.3], ARRAY[0,1],2,'Alaska'),
(23,ARRAY[6.7,3.1,4.4,1.4], ARRAY[0,1],2,'Alaska'),
(24,ARRAY[5.6,3.0,4.5,1.5], ARRAY[0,1],2,'Alaska'),
(25,ARRAY[5.8,2.7,4.1,1.0], ARRAY[0,1],2,'Alaska'),
(26,ARRAY[6.2,2.2,4.5,1.5], ARRAY[0,1],2,'Alaska'),
(27,ARRAY[5.6,2.5,3.9,1.1], ARRAY[0,1],2,'Alaska'),
(28,ARRAY[5.0,3.4,1.5,0.2], ARRAY[1,0],1,'Tennessee'),
(29,ARRAY[4.4,2.9,1.4,0.2], ARRAY[1,0],1,'Tennessee'),
(30,ARRAY[4.9,3.1,1.5,0.1], ARRAY[1,0],1,'Tennessee'),
(31,ARRAY[5.4,3.7,1.5,0.2], ARRAY[1,0],1,'Tennessee'),
(32,ARRAY[4.8,3.4,1.6,0.2], ARRAY[1,0],1,'Tennessee'),
(33,ARRAY[4.8,3.0,1.4,0.1], ARRAY[1,0],1,'Tennessee'),
(34,ARRAY[4.3,3.0,1.1,0.1], ARRAY[1,0],1,'Tennessee'),
(35,ARRAY[5.8,4.0,1.2,0.2], ARRAY[1,0],1,'Tennessee'),
(36,ARRAY[5.7,4.4,1.5,0.4], ARRAY[1,0],1,'Tennessee'),
(37,ARRAY[5.4,3.9,1.3,0.4], ARRAY[1,0],1,'Tennessee'),
(38,ARRAY[6.0,2.9,4.5,1.5], ARRAY[0,1],2,'Tennessee'),
(39,ARRAY[5.7,2.6,3.5,1.0], ARRAY[0,1],2,'Tennessee'),
(40,ARRAY[5.5,2.4,3.8,1.1], ARRAY[0,1],2,'Tennessee'),
(41,ARRAY[5.5,2.4,3.7,1.0], ARRAY[0,1],2,'Tennessee'),
(42,ARRAY[5.8,2.7,3.9,1.2], ARRAY[0,1],2,'Tennessee'),
(43,ARRAY[6.0,2.7,5.1,1.6], ARRAY[0,1],2,'Tennessee'),
(44,ARRAY[5.4,3.0,4.5,1.5], ARRAY[0,1],2,'Tennessee'),
(45,ARRAY[6.0,3.4,4.5,1.6], ARRAY[0,1],2,'Tennessee'),
(46,ARRAY[6.7,3.1,4.7,1.5], ARRAY[0,1],2,'Tennessee'),
(47,ARRAY[6.3,2.3,4.4,1.3], ARRAY[0,1],2,'Tennessee'),
(48,ARRAY[5.6,3.0,4.1,1.3], ARRAY[0,1],2,'Tennessee'),
(49,ARRAY[5.5,2.5,4.0,1.3], ARRAY[0,1],2,'Tennessee'),
(50,ARRAY[5.5,2.6,4.4,1.2], ARRAY[0,1],2,'Tennessee'),
(51,ARRAY[6.1,3.0,4.6,1.4], ARRAY[0,1],2,'Tennessee'),
(52,ARRAY[5.8,2.6,4.0,1.2], ARRAY[0,1],2,'Tennessee');
{code}

{code:sql}
DROP TABLE IF EXISTS mlp_model, mlp_model_summary, mlp_model_standardization;
-- Set seed so results are reproducible
SELECT setseed(0);
SELECT madlib.mlp_classification(
    'iris_data',      -- Source table
    'mlp_model',      -- Destination table
    'attributes',     -- Input features
    'class_integer',  -- Label
    ARRAY[5],         -- Number of units per layer
    'learning_rate_init=0.003, n_iterations=500, tolerance=0',  -- Optimizer params
    'tanh',           -- Activation function
    NULL,             -- Default weight (1)
    FALSE,            -- No warm start
    FALSE             -- Not verbose
);
{code}

{code:sql}
DROP TABLE IF EXISTS mlp_prediction;
SELECT
madlib.mlp_predict(
    'mlp_model',       -- Model table
    'iris_data',       -- Test data table
    'id',              -- Id column in test table
    'mlp_prediction',  -- Output table for predictions
    'response'         -- Output classes, not probabilities
);
SELECT * FROM mlp_prediction JOIN iris_data USING (id) ORDER BY id;
{code}

produces

{code}
 id | estimated_class_integer |    attributes     | class_integer | class | state
----+-------------------------+-------------------+---------------+-------+-----------
  1 | {1,0}                   | {5.0,3.2,1.2,0.2} | {1,0}         |     1 | Alaska
  2 | {1,0}                   | {5.5,3.5,1.3,0.2} | {1,0}         |     1 | Alaska
  3 | {1,0}                   | {4.9,3.1,1.5,0.1} | {1,0}         |     1 | Alaska
[jira] [Commented] (MADLIB-1222) Support already encoded arrays for dependent var in MLP classification
[ https://issues.apache.org/jira/browse/MADLIB-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414347#comment-16414347 ]

ASF GitHub Bot commented on MADLIB-1222:

GitHub user njayaram2 opened a pull request:

    https://github.com/apache/madlib/pull/250

    MLP: Allow one-hot encoded dependent var for classification

    JIRA:MADLIB-1222

    MLP currently automatically encodes categorical variables for
    classification but does not allow already encoded arrays for
    dependent variables in mlp_classification. This commit lets users
    have an already encoded array for the dependent variable and train
    a model.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/madlib/madlib feature/mlp/support-encoded-dep-var

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/madlib/pull/250.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #250

commit f5a87dee6bc8f27c1a13a4921ea726b391b1813d
Author: Nandish Jayaram
Date:   2018-03-20T22:43:25Z

    MLP: Allow one-hot encoded dependent var for classification

    JIRA:MADLIB-1222

    MLP currently automatically encodes categorical variables for
    classification but does not allow already encoded arrays for
    dependent variables in mlp_classification. This commit lets users
    have an already encoded array for the dependent variable and train
    a model.
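Since the change lets users supply their own encoded arrays, it may help to sanity-check the column on the client side before training. The following is only a rough sketch in Python: `is_one_hot` and `check_dependent_var` are hypothetical helpers, not part of MADlib, and nothing here describes what validation (if any) mlp_classification itself performs.

```python
def is_one_hot(row):
    """Return True if row holds only 0s and 1s and exactly one 1."""
    return len(row) > 0 and all(v in (0, 1) for v in row) and sum(row) == 1


def check_dependent_var(rows):
    """Return indices of rows that are not one-hot or whose length
    differs from the first row (hypothetical client-side check)."""
    if not rows:
        return []
    width = len(rows[0])
    return [i for i, row in enumerate(rows)
            if len(row) != width or not is_one_hot(row)]


# Two valid rows, one row with two 1s, and one row of the wrong length.
rows = [[1, 0], [0, 1], [1, 1], [0, 1, 0]]
print(check_dependent_var(rows))  # [2, 3]
```

An empty result would suggest the arrays are at least structurally consistent; it says nothing about whether the encoding matches the intended classes.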
[jira] [Commented] (MADLIB-1222) Support already encoded arrays for dependent var in MLP classification
[ https://issues.apache.org/jira/browse/MADLIB-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16412215#comment-16412215 ]

Nandish Jayaram commented on MADLIB-1222:

Example use case, and handling it:

1) User encodes dep vars with whatever tool they want and puts them in the column `color`. Maybe they do this to anonymize, maybe the data is just in that format already:
{code}
blue   [1,0,0]
red    [0,1,0]
green  [0,0,1]
{code}

--- start MADlib ---

2) runs mini-batch preprocess (if planning to use mini-batch)

3) runs MLP classification train (IGD or mini-batch)

4a) runs MLP predict (response):
{code}
actual    predicted
[0,1,0]   [1,0,0]
[0,0,1]   [0,0,1]
[1,0,0]   [1,0,0]
etc.
{code}

4b) runs MLP predict (prob):
{code}
actual    estimated_prob
[0,1,0]   [0.85, 0.10, 0.05]
[0,0,1]   [0.0 , 0.1 , 0.9]
[1,0,0]   [0.75, 0.20, 0.05]
etc.
{code}

--- end MADlib ---

5) User maps back to red, blue, green since they know the mapping but MADlib doesn't.
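Steps 1 and 5 of the use case above happen outside MADlib, so they could be sketched on the client side roughly as follows. This is a minimal illustration in Python, assuming the blue/red/green ordering from the example; `LABELS`, `one_hot`, and `decode` are hypothetical names, not MADlib functions.

```python
# Mapping known only to the user; MADlib never sees these labels.
# The order follows the example: blue=[1,0,0], red=[0,1,0], green=[0,0,1].
LABELS = ["blue", "red", "green"]


def one_hot(label, labels=LABELS):
    """Step 1: encode a label as a one-hot array."""
    return [1 if label == candidate else 0 for candidate in labels]


def decode(vector, labels=LABELS):
    """Step 5: map a one-hot or probability array back to the label
    at the position of its largest entry."""
    best = max(range(len(vector)), key=lambda i: vector[i])
    return labels[best]


print(one_hot("red"))               # [0, 1, 0]
print(decode([0, 0, 1]))            # green
print(decode([0.85, 0.10, 0.05]))   # blue
```

The same `decode` helper handles both the response output (4a) and the prob output (4b), since in either case the predicted class is the position of the largest entry.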