[GitHub] madlib issue #342: Minibatch Preprocessor for Deep learning

2018-12-19 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/342 https://issues.apache.org/jira/browse/MADLIB-1290 associated JIRA ---

[GitHub] madlib issue #329: Release/prep 1.15.1

2018-10-03 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/329 Comments on release notes: (1) MADLIB-1171 does not apply to AO tables, maybe a typo in this release note: "Build: Disable AppendOnly if available (MADLIB-1171, MADLIB

[GitHub] madlib issue #321: RF: Increase the dataset size of dev-check test

2018-09-21 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/321 in that case LGTM ---

[GitHub] madlib issue #321: RF: Increase the dataset size of dev-check test

2018-09-21 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/321 Does this fix the sporadic IC/DC issues that we have been seeing with RF? ---

[GitHub] madlib issue #319: Allocator: Remove 16-byte alignment for pointers in GP6

2018-09-12 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/319 So this PR only affects GP 6+ ? It means that GP 4.3.x, GP 5 and all supported PG versions will continue to work as is, and use Eigen vectorization, if the underlying infra supports it? ---

[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-09-07 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/315 re-running the failed test, seems to pass now: ``` SELECT * FROM knn_result_list_neighbors ORDER BY id; ``` produces ``` id | data | k_nearest_neighbours

[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-09-07 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/315 Actually the earlier issue above ^^^ is OK, where I said `I'm not sure what this is doing` because forcing all training data to be a single point means that the distance to all test points

[GitHub] madlib issue #317: Fixed trailing whitespace in many sql_in files

2018-09-07 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/317 then let's merge it ---

[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-09-07 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/315 (1) expression for test data array: ``` DROP TABLE IF EXISTS knn_result_classification; SELECT * FROM madlib.knn( 'knn_train_data', -- Table

[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-09-07 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/315 I'm not sure what this is doing: ``` %%sql DROP TABLE IF EXISTS knn_result_classification; SELECT * FROM madlib.knn( 'knn_train_data', -- Table

[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-09-07 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/315 load data: ``` DROP TABLE IF EXISTS knn_train_data; CREATE TABLE knn_train_data ( id integer, data integer

[GitHub] madlib pull request #315: JIRA:1060 - Modified KNN to accept expressions in ...

2018-09-05 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/315#discussion_r215462116 --- Diff: src/ports/postgres/modules/knn/knn.py_in --- @@ -53,22 +55,12 @@ def knn_validate_src(schema_madlib, point_source, point_column_name

[GitHub] madlib issue #313: MLP: Simplify momentum and Nesterov updates

2018-09-04 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/313 is this ready to merge? ---

[GitHub] madlib issue #314: Ubuntu support: Enable creation of gppkg on Ubuntu

2018-08-27 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/314 Thanks @njayaram2 for the clarification. ---

[GitHub] madlib issue #314: Ubuntu support: Enable creation of gppkg on Ubuntu

2018-08-27 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/314 So this requires Alien, but we do not automatically download or bundle Alien, correct? ---

[GitHub] madlib pull request #:

2018-08-17 Thread fmcquillan99
Github user fmcquillan99 commented on the pull request: https://github.com/apache/madlib/commit/5e707f745c50343dd7395a3e8f86c04428210977#commitcomment-30142753 Also fixed some spacing issues ---

[GitHub] madlib issue #308: Release: Release Notes for v1.15

2018-08-06 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/308 LGTM ---

[GitHub] madlib issue #295: Recursive Partitioning: Add function to report importance...

2018-08-01 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/295 LGTM, here is an RF example: ``` SELECT * FROM mt_imp_output ORDER BY am, oob_var_importance DESC; am | feature | oob_var_importance | impurity_var_importance

[GitHub] madlib issue #291: Feature: Vector-Column Transformations

2018-07-31 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/291 thanks, that makes sense. I added a type casting example to the user docs. LGTM ---

[GitHub] madlib issue #291: Feature: Vector-Column Transformations

2018-07-31 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/291 Where did we land on the boolean casting issue? Testing on Greenplum 5, I see: ``` (psycopg2.ProgrammingError) plpy.SPIError: ARRAY types boolean and text cannot be matched

[GitHub] madlib pull request #298: misc 1.15 user doc updates

2018-07-25 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/298#discussion_r205310071 --- Diff: doc/mainpage.dox.in --- @@ -100,13 +86,14 @@ complete matrix stored as a distributed table. @defgroup grp_matrix Matrix

[GitHub] madlib issue #298: misc 1.15 user doc updates

2018-07-25 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/298 This should be ready to merge if it looks OK. I don't have any other 1.15 doc related items to deliver. ---

[GitHub] madlib pull request #298: misc 1.15 user doc updates

2018-07-24 Thread fmcquillan99
GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib/pull/298 misc 1.15 user doc updates Added descriptions to left panel for modules that were missing. Fixed types and formatting in various places. Cleaned up main use doc page and removed links

[GitHub] madlib issue #291: Feature: Vector-Column Transformations

2018-07-23 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/291 In cols2vec and vec2cols, ordering has been fixed so new columns are always on the right of the source table columns in the output (if any). In cols2vec, casting seems OK now. I tested

[GitHub] madlib issue #295: Recursive Partitioning: Add function to report importance...

2018-07-20 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/295 I like this last suggestion from @iyerr3, that we report raw values for oob and impurity VI in the model output file. (OK to keep the shifted oob > 0 as we do now.) For the hel

[GitHub] madlib issue #291: Feature: Vector-Column Transformations

2018-07-20 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/291 (1) Now I think it is casting all numeric to DOUBLE and all non-numeric to TEXT? But if all the columns are INT, should not cast them to DOUBLE, rather should create an array of INTs

[GitHub] madlib issue #295: Recursive Partitioning: Add function to report importance...

2018-07-19 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/295 Another run I got ``` grp 0 grp1 31.01364943 31.6576 22.85881741

[GitHub] madlib issue #295: Recursive Partitioning: Add function to report importance...

2018-07-19 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/295 Should impurity_var_importance always add up to 100? From the regression example in the user docs: ``` DROP TABLE IF EXISTS mt_imp_output; SELECT madlib.get_var_importance

[GitHub] madlib issue #291: Feature: Vector to Columns

2018-07-19 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/291 After the above 2 issues I mentioned are fixed, I will have 1 more commit on user docs to this PR ---

[GitHub] madlib issue #291: Feature: Vector to Columns

2018-07-19 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/291 In cols2vec, For this table: ``` CREATE TABLE golf ( id integer NOT NULL, "OUTLOOK" text, temperature double precision, humidity double

[GitHub] madlib issue #289: RF: Add impurity variable importance

2018-07-17 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/289 ``` The model table produced by the training function contains the following columns: gid INTEGER. Group id that uniquely identifies a set of grouping column values. sample_id

[GitHub] madlib issue #291: Feature: Vector to Columns

2018-07-12 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/291 user docs seem incomplete ---

[GitHub] madlib issue #282: Utilites: Add CTAS while dropping some columns

2018-07-12 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/282 ah, i see. I think it is fine as you have put it. LGTM ---

[GitHub] madlib issue #282: Utilites: Add CTAS while dropping some columns

2018-07-11 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/282 looks like user docs lost the params description for dropcols() ---

[GitHub] madlib issue #282: Utilites: Add CTAS while dropping some columns

2018-07-11 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/282 There is a bit of inconsistency related to the last param `cols_to_drop` ``` SELECT madlib.dropcols( 'houses', 'houses_out

[GitHub] madlib issue #287: Fix incorrect dict expansion in table header

2018-07-11 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/287 This latest commit makes the following changes to use docs: 1) clarify cv for SVM and add user examples 2) clarify cv for elastic net and fix user examples 3) correct rmse calc

[GitHub] madlib issue #288: Jira:1239: Converts features from multiple columns into a...

2018-07-06 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/288 update my comment above to remove the rows processed and skipped. ---

[GitHub] madlib issue #288: Jira:1239: Converts features from multiple columns into a...

2018-07-06 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/288 Since we are writing out a summary table, may as well add more info in it. {code} A summary table named _summary is also created at the same time, which has the following columns

[GitHub] madlib pull request #288: Jira:1239: Converts features from multiple columns...

2018-07-05 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/288#discussion_r200510366 --- Diff: src/ports/postgres/modules/cols_vec/cols2vec.py_in --- @@ -0,0 +1,110 @@ +""" +@file cols2vec.py_in +

[GitHub] madlib issue #269: Statistics: Add grouping support for correlation function...

2018-05-16 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/269 Thanks for the explanation. I pushed one additional small commit that changes the name of the module from "Pearson's Correlation" to "Covariance and Correlation"

[GitHub] madlib issue #269: Statistics: Add grouping support for correlation function...

2018-05-11 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/269 (1) ``` DROP TABLE IF EXISTS example_data_output, example_data_output_summary; SELECT madlib.correlation( 'example_data', 'example_data_output

[GitHub] madlib issue #267: Multiple: Remove support for HAWQ from all modules

2018-05-04 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/267 There is some reference to HAWQ in https://github.com/apache/madlib/blob/master/ReadMe_Build.txt which I don’t see removed in the PR. Otherwise seems OK though I did not do

[GitHub] madlib pull request #264: updated pagerank docs for PPR, minor formating and...

2018-04-17 Thread fmcquillan99
GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib/pull/264 updated pagerank docs for PPR, minor formating and such 1) minor formatting improvements 2) added reference for PPR and changed PR reference to paper and not wikipedia You can merge

[GitHub] madlib issue #257: mini-batch user docs

2018-04-17 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/257 OK done now ---

[GitHub] madlib issue #257: mini-batch user docs

2018-04-17 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/257 Main changes: 4. Clarified grouping as per https://github.com/apache/madlib/pull/263 This is final change so you can review and merge if it looks good. ---

[GitHub] madlib issue #263: Bugfix/mlp minibatch grouping

2018-04-17 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/263 I tested this quite a bit and it seems to work nicely for me. LGTM ---

[GitHub] madlib issue #257: mini-batch user docs

2018-04-15 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/257 Main changes: 1) Updated minibatch docs to show use of encoding scalar integer dep var 2) Added minibatch examples and explanations to MLP 3) Reduced the number of redundant

[GitHub] madlib issue #256: Minibatch Preprocessing: change default buffer size formu...

2018-04-10 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/256 LGTM Default selection looks reasonable: (0) data DROP TABLE IF EXISTS iris_data; CREATE TABLE iris_data( id serial, attributes numeric

[GitHub] madlib issue #255: MLP: Remove source table dependency for predicting regres...

2018-04-10 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/255 LGTM, see https://issues.apache.org/jira/browse/MADLIB-1223 for tests i ran ---

[GitHub] madlib pull request #257: mini-batch user docs

2018-04-06 Thread fmcquillan99
GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib/pull/257 mini-batch user docs This commit is for the preprocessor user docs. MLP user doc updates to follow in subsequent commit. Can someone please review this content? thx You can

[GitHub] madlib issue #256: Minibatch Preprocessing: change default buffer size formu...

2018-04-06 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/256 Oh I see, with the averaging approach: buffer_size = avg_num_rows_per_segment / num_segments = 21.5 / 2 = 10.75 and rounding up
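
The arithmetic quoted in this comment can be checked with a small sketch. It assumes only what the comment states (divide the average rows per segment by the number of segments, then round up); the function name `default_buffer_size` is hypothetical and this is not MADlib's actual implementation.

```python
import math


def default_buffer_size(avg_num_rows_per_segment, num_segments):
    """Hypothetical helper: reproduces the averaging arithmetic from the comment.

    Assumption: buffer size = ceil(avg rows per segment / number of segments).
    This is not taken from the MADlib source.
    """
    return math.ceil(avg_num_rows_per_segment / num_segments)


# Values from the comment: 21.5 / 2 = 10.75, which rounds up to 11.
print(default_buffer_size(21.5, 2))
```

With the figures in the comment, the sketch gives a buffer size of 11.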

[GitHub] madlib issue #256: Minibatch Preprocessing: change default buffer size formu...

2018-04-06 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/256 Is this expected behavior? last group for NJ gets only 1 observation ``` DROP TABLE IF EXISTS iris_data; CREATE TABLE iris_data( id serial, attributes numeric

[GitHub] madlib issue #256: Minibatch Preprocessing: change default buffer size formu...

2018-04-05 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/256 We seem to be computing batch size using master but prob should just consider num segments. ---

[GitHub] madlib issue #251: MLP: Simplify initialization of model coefficients

2018-04-04 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/251 Using the data set from http://madlib.apache.org/docs/latest/group__grp__nn.html#example the warm start seems to be functioning OK in the sense that it is picking up where it left off

[GitHub] madlib issue #250: MLP: Allow one-hot encoded dependent var for classificati...

2018-04-04 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/250 See JIRA https://issues.apache.org/jira/browse/MADLIB-1222 for examples showing this works for IGD and mini-batch LGTM I think u can go ahead and merge this PR to master ---

[GitHub] madlib pull request #252: leftover minor RF user doc update

2018-03-28 Thread fmcquillan99
GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib/pull/252 leftover minor RF user doc update A few remaining RF user doc changes I missed in https://github.com/apache/madlib/commit/7f3aae92f2d84bf7e4501ac5efec1ebfc7a80834 Also added

[GitHub] madlib issue #246: DT and RF user doc updates

2018-03-26 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/246 https://issues.apache.org/jira/browse/MADLIB-1217 https://issues.apache.org/jira/browse/MADLIB-1218 https://issues.apache.org/jira/browse/MADLIB-1219 have all been fixed so I made

[GitHub] madlib issue #249: RF: Use NULL::integer[] when no continuous features

2018-03-26 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/249 See https://issues.apache.org/jira/browse/MADLIB-1219 for results from my tests. LGTM ---

[GitHub] madlib issue #248: DT: Ensure proper quoting in grouping coalesce

2018-03-23 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/248 I checked against the examples in JIRA: MADLIB-1217 JIRA: MADLIB-1218 and both work OK for me. So from the fix to the functionality perspective, LGTM. Other

[GitHub] madlib issue #246: DT user doc updates

2018-03-22 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/246 @rahiyer RF docs ready for review too. ---

[GitHub] madlib issue #242: PCA: Fix issue with text grouping col input

2018-03-21 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/242 LGTM, this can be merged ---

[GitHub] madlib pull request #246: DT user doc updates

2018-03-21 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/246#discussion_r176152043 --- Diff: src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in --- @@ -418,7 +468,10 @@ tree_predict(tree_model

[GitHub] madlib pull request #246: DT user doc updates

2018-03-21 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/246#discussion_r176150844 --- Diff: src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in --- @@ -127,7 +132,11 @@ tree_train( weights (optional

[GitHub] madlib pull request #246: DT user doc updates

2018-03-20 Thread fmcquillan99
GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib/pull/246 DT user doc updates @rahiyer please review DT user doc updates Will start working on RF in parallel. You can merge this pull request into a Git repository by running: $ git pull

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-08 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r173254469 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -543,6 +545,95 @@ SELECT * FROM output_table ORDER BY mainhue, name; (25

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-08 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r173239594 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -543,6 +545,95 @@ SELECT * FROM output_table ORDER BY mainhue, name; (25

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-08 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r173238804 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -543,6 +545,95 @@ SELECT * FROM output_table ORDER BY mainhue, name; (25

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-07 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r172922334 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -543,6 +544,90 @@ SELECT * FROM output_table ORDER BY mainhue, name; (25

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-07 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r172921714 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -543,6 +544,90 @@ SELECT * FROM output_table ORDER BY mainhue, name; (25

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-07 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r172921328 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -543,6 +544,90 @@ SELECT * FROM output_table ORDER BY mainhue, name; (25

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-07 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r172920935 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -543,6 +544,90 @@ SELECT * FROM output_table ORDER BY mainhue, name; (25

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-07 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r172920825 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -543,6 +544,90 @@ SELECT * FROM output_table ORDER BY mainhue, name; (25

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-07 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r172920581 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -149,8 +149,10 @@ non-stratified, that is, the whole table is treated

[GitHub] madlib issue #238: MLP: Use array_upper to get the last array element

2018-02-22 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/238 LGTM ---

[GitHub] madlib issue #234: Create lower case column name in encode_categorical_varia...

2018-02-21 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/234 ``` DROP TABLE IF EXISTS abalone_out, abalone_out_dictionary; SELECT madlib.encode_categorical_variables ( 'abalone', -- Source table

[GitHub] madlib issue #234: Create lower case column name in encode_categorical_varia...

2018-02-20 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/234 was just testing 1.13 on postgres 9.6 and found this error ``` DROP TABLE IF EXISTS abalone_out, abalone_out_dictionary; SELECT madlib.encode_categorical_variables

[GitHub] madlib issue #234: Create lower case column name in encode_categorical_varia...

2018-02-16 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/234 Similarly ``` DROP TABLE IF EXISTS abalone_out, abalone_out_dictionary; SELECT madlib.encode_categorical_variables ( 'abalone', -- Source table

[GitHub] madlib pull request #235: update KNN, DT and RF docs to match recent commits

2018-02-15 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/235#discussion_r168557191 --- Diff: src/ports/postgres/modules/recursive_partitioning/random_forest.sql_in --- @@ -208,13 +208,26 @@ forest_train(training_table_name

[GitHub] madlib pull request #235: update KNN, DT and RF docs to match recent commits

2018-02-13 Thread fmcquillan99
GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib/pull/235 update KNN, DT and RF docs to match recent commits KNN * describe weighted average in more detail DT & RF * correct some doc errors and omissions * update example to

[GitHub] madlib issue #231: RF: Output non-negative importance values

2018-02-06 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/231 Does this mean, then, that all var importance values are >= 0 now, and that the largest positive value corresponds to the most "important" variable? Also, what is the rang

[GitHub] madlib issue #223: Balance datasets : re-sampling technique

2018-01-16 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/223 Regarding (2) and (3) above, looks like it does not fail with `'red:7, blue:7'` but the MADlib convention is 'red=7, blue=7' so need to change to use `=`. (4) Seems to take only
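
For illustration only, here is a minimal sketch of a parser that accepts the `'red=7, blue=7'` form and rejects `'red:7, blue:7'`. The function name `parse_class_sizes` is hypothetical; how balance sample actually parses this parameter is not shown in the thread.

```python
def parse_class_sizes(spec):
    """Hypothetical parser for a 'class=count, class=count' string.

    Assumption: items are comma separated and use '=' between the class
    name and its count, per the convention mentioned in the comment.
    """
    sizes = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        name, sep, count = item.partition("=")
        if not sep:
            # No '=' found, e.g. the 'red:7' form the comment asks to reject.
            raise ValueError("expected 'class=count', got %r" % item)
        sizes[name.strip()] = int(count)
    return sizes


print(parse_class_sizes("red=7, blue=7"))
```

Passing `'red:7, blue:7'` to this sketch raises a `ValueError` instead of silently accepting the input.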

[GitHub] madlib issue #223: Balance datasets : re-sampling technique

2018-01-12 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/223 Can you please double check that install checks are robust with respect to different Python rounding on different hardware? ---

[GitHub] madlib issue #223: Balance datasets : re-sampling technique

2018-01-12 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/223 Started testing, some early observations: (1) class_size default should be ‘uniform’, it seems to be set to ‘undersample’ currently (2) ` SELECT

[GitHub] madlib pull request #222: minor update to summary() user docs

2018-01-02 Thread fmcquillan99
GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib/pull/222 minor update to summary() user docs to finish off https://issues.apache.org/jira/browse/MADLIB-1167 You can merge this pull request into a Git repository by running: $ git pull https

[GitHub] madlib-site pull request #10: 1dot13 website updates

2017-12-18 Thread fmcquillan99
GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib-site/pull/10 1dot13 website updates You can merge this pull request into a Git repository by running: $ git pull https://github.com/fmcquillan99/incubator-madlib-site website-1dot13

[GitHub] madlib issue #217: Release: Update RELEASE_NOTES for v1.13

2017-12-18 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/217 LGTM Until we fix https://issues.apache.org/jira/browse/MADLIB-1185 we cannot claim full postgres 10 support, so I think these release notes are accurate ---

[GitHub] madlib-site pull request #3: Asf site

2017-12-08 Thread fmcquillan99
Github user fmcquillan99 closed the pull request at: https://github.com/apache/madlib-site/pull/3 ---

[GitHub] madlib pull request #212: update PyXB version in README.md

2017-12-07 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/212#discussion_r155685205 --- Diff: README.md --- @@ -11,9 +11,11 @@ Installation and Contribution == See the project website [`MADlib Home

[GitHub] madlib pull request #212: update PyXB version in README.md

2017-12-07 Thread fmcquillan99
GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib/pull/212 update PyXB version in README.md minor version update for PyXB from 1.2.4 to 1.2.6 in README.md You can merge this pull request into a Git repository by running: $ git pull https

[GitHub] madlib issue #211: Change madlib gppkg version string

2017-12-07 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/211 naming convention looks good. I assume for prod releases there is no "_dev" in the name. ---

[GitHub] madlib pull request #209: add grouping predict e.g. for lin reg

2017-12-06 Thread fmcquillan99
GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib/pull/209 add grouping predict e.g. for lin reg suggestion from user to add this example You can merge this pull request into a Git repository by running: $ git pull https://github.com/fmcquillan99

[GitHub] madlib pull request #208: correct knn user docs

2017-12-04 Thread fmcquillan99
GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib/pull/208 correct knn user docs mostly corrections related to distance function explanations You can merge this pull request into a Git repository by running: $ git pull https://github.com

[GitHub] madlib issue #206: Feature: Allow NULL in rows for computing correlations an...

2017-12-01 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/206 The user doc explanation/caution looks good on mean imputation, thanks for adding LGTM ---

[GitHub] madlib issue #204: Added additional distance metrics for k-NN: Jira-1059

2017-11-29 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/204 @hpandeycodeit we would like to get this PR in the upcoming 1.13 release. Are you planning to do additional work as per the comments above? ---

[GitHub] madlib pull request #205: hits and lin regr doc updates

2017-11-27 Thread fmcquillan99
Github user fmcquillan99 closed the pull request at: https://github.com/apache/madlib/pull/205 ---

[GitHub] madlib pull request #205: hits and lin regr doc updates

2017-11-22 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/205#discussion_r152645105 --- Diff: src/ports/postgres/modules/graph/hits.sql_in --- @@ -103,18 +102,18 @@ It is named by adding the suffix '_summary' to the 'out_table

[GitHub] madlib pull request #205: hits and lin regr doc updates

2017-11-22 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/205#discussion_r152636816 --- Diff: src/ports/postgres/modules/regress/linear.sql_in --- @@ -183,16 +183,15 @@ FROM ( @par Prediction Function The prediction function

[GitHub] madlib pull request #205: hits and lin regr doc updates

2017-11-21 Thread fmcquillan99
GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib/pull/205 hits and lin regr doc updates minor updates to graph HITS algo docs fixed order of params in lin regr prediction docs You can merge this pull request into a Git repository by running

[GitHub] madlib issue #199: Bugfix: Hard coded schema name in WCC install check

2017-11-13 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/199 Did this pass previous IC and functional tests with madlib as the schema? Also, is it possible to do a global search to see if we have done this in other modules too? ---

[GitHub] madlib issue #191: KNN: Fix optional parameters and ordering

2017-10-23 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/191 LGTM based on some testing and docs review. I would ask other community folks to pls review code in more detail however. ---
