[GitHub] madlib issue #186: Add error message for checking postgres install configura...

2017-09-20 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/186 Please replace "portid (your platform)" with actual values in the message. ---

[GitHub] madlib issue #75: SVM: Implement c++ functions for training multi-class svm ...

2017-09-20 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/75 Update on this PR since it has been open for a while. This was some good work by mktal to build a multi-class svm module. The issue with the PR is that the mini-batching is embedded

[GitHub] madlib issue #189: Pivot: Reference "current" PostgreSQL docs instead of 8.2...

2017-10-06 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/189 Ed, do you mind making the same changes in these 2 modules: http://madlib.apache.org/docs/latest/group__grp__path.html#literature http://madlib.apache.org/docs/latest

[GitHub] madlib issue #189: Docs: Reference "current" PostgreSQL docs instead of EOL ...

2017-10-06 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/189 LGTM ---

[GitHub] incubator-madlib-site issue #9: Update website for MADlib 1.12 release.

2017-08-29 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/incubator-madlib-site/pull/9 design.pdf only displays the top 150 pages in the github viewer, can you please confirm the whole document is there? --- If your project is set up for it, you can reply

[GitHub] incubator-madlib-site issue #6: clarify dmg for pg on download page

2017-08-29 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/incubator-madlib-site/pull/6 see Ed's comment above ---

[GitHub] incubator-madlib-site pull request #6: clarify dmg for pg on download page

2017-08-29 Thread fmcquillan99
Github user fmcquillan99 closed the pull request at: https://github.com/apache/incubator-madlib-site/pull/6 ---

[GitHub] madlib issue #191: KNN: Fix optional parameters and ordering

2017-10-23 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/191 LGTM based on some testing and docs review. I would ask other community folks to pls review code in more detail however. ---

[GitHub] madlib pull request #205: hits and lin regr doc updates

2017-11-27 Thread fmcquillan99
Github user fmcquillan99 closed the pull request at: https://github.com/apache/madlib/pull/205 ---

[GitHub] madlib pull request #205: hits and lin regr doc updates

2017-11-22 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/205#discussion_r152636816 --- Diff: src/ports/postgres/modules/regress/linear.sql_in --- @@ -183,16 +183,15 @@ FROM ( @par Prediction Function The prediction function

[GitHub] madlib issue #204: Added additional distance metrics for k-NN: Jira-1059

2017-11-29 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/204 @hpandeycodeit we would like to get this PR in the upcoming 1.13 release. Are you planning to do additional work as per the comments above? ---

[GitHub] madlib pull request #205: hits and lin regr doc updates

2017-11-22 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/205#discussion_r152645105 --- Diff: src/ports/postgres/modules/graph/hits.sql_in --- @@ -103,18 +102,18 @@ It is named by adding the suffix '_summary' to the 'out_table

[GitHub] madlib pull request #208: correct knn user docs

2017-12-04 Thread fmcquillan99
GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib/pull/208 correct knn user docs mostly corrections related to distance function explanations You can merge this pull request into a Git repository by running: $ git pull https://github.com

[GitHub] madlib issue #206: Feature: Allow NULL in rows for computing correlations an...

2017-12-01 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/206 The user doc explanation/caution looks good on mean imputation, thanks for adding LGTM ---

[GitHub] madlib issue #217: Release: Update RELEASE_NOTES for v1.13

2017-12-18 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/217 LGTM Until we fix https://issues.apache.org/jira/browse/MADLIB-1185 we cannot claim full postgres 10 support, so I think these release notes are accurate ---

[GitHub] madlib-site pull request #10: 1dot13 website updates

2017-12-18 Thread fmcquillan99
GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib-site/pull/10 1dot13 website updates You can merge this pull request into a Git repository by running: $ git pull https://github.com/fmcquillan99/incubator-madlib-site website-1dot13

[GitHub] madlib issue #199: Bugfix: Hard coded schema name in WCC install check

2017-11-13 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/199 Did this pass previous IC and functional tests with madlib as the schema? Also, is it possible to do a global search to see if we have done this in other modules too? ---

[GitHub] madlib pull request #205: hits and lin regr doc updates

2017-11-21 Thread fmcquillan99
GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib/pull/205 hits and lin regr doc updates minor updates to graph HITS algo docs fixed order of params in lin regr prediction docs You can merge this pull request into a Git repository by running

[GitHub] madlib pull request #212: update PyXB version in README.md

2017-12-07 Thread fmcquillan99
GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib/pull/212 update PyXB version in README.md minor version update for PyXB from 1.2.4 to 1.2.6 in README.md You can merge this pull request into a Git repository by running: $ git pull https

[GitHub] madlib pull request #212: update PyXB version in README.md

2017-12-07 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/212#discussion_r155685205 --- Diff: README.md --- @@ -11,9 +11,11 @@ Installation and Contribution == See the project website [`MADlib Home

[GitHub] madlib issue #211: Change madlib gppkg version string

2017-12-07 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/211 naming convention looks good. I assume for prod releases there is no "_dev" in the name. ---

[GitHub] madlib pull request #209: add grouping predict e.g. for lin reg

2017-12-06 Thread fmcquillan99
GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib/pull/209 add grouping predict e.g. for lin reg suggestion from user to add this example You can merge this pull request into a Git repository by running: $ git pull https://github.com/fmcquillan99

[GitHub] madlib-site pull request #3: Asf site

2017-12-08 Thread fmcquillan99
Github user fmcquillan99 closed the pull request at: https://github.com/apache/madlib-site/pull/3 ---

[GitHub] madlib issue #267: Multiple: Remove support for HAWQ from all modules

2018-05-04 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/267 There is some reference to HAWQ in https://github.com/apache/madlib/blob/master/ReadMe_Build.txt which I don’t see removed in the PR. Otherwise seems OK though I did not do

[GitHub] madlib issue #269: Statistics: Add grouping support for correlation function...

2018-05-11 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/269 (1) ``` DROP TABLE IF EXISTS example_data_output, example_data_output_summary; SELECT madlib.correlation( 'example_data', 'example_data_output

[GitHub] madlib issue #269: Statistics: Add grouping support for correlation function...

2018-05-16 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/269 Thanks for the explanation. I pushed one additional small commit that changes the name of the module from "Pearson's Correlation" to "Covariance and Correlation"

[GitHub] madlib issue #223: Balance datasets : re-sampling technique

2018-01-12 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/223 Started testing, some early observations: (1) class_size default should be ‘uniform’, it seems to be set to ‘undersample’ currently (2) ` SELECT

[GitHub] madlib issue #223: Balance datasets : re-sampling technique

2018-01-12 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/223 Can you please double check that install checks are robust with respect to different Python rounding on different hardware? ---

[GitHub] madlib issue #231: RF: Output non-negative importance values

2018-02-06 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/231 Does this mean, then, that all var importance values are >= 0 now, and that the largest positive value corresponds to the most "important" variable? Also, what is the rang

[GitHub] madlib pull request #235: update KNN, DT and RF docs to match recent commits

2018-02-13 Thread fmcquillan99
GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib/pull/235 update KNN, DT and RF docs to match recent commits KNN * describe weighted average in more detail DT & RF * correct some doc errors and omissions * update example to

[GitHub] madlib issue #234: Create lower case column name in encode_categorical_varia...

2018-02-21 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/234 ``` DROP TABLE IF EXISTS abalone_out, abalone_out_dictionary; SELECT madlib.encode_categorical_variables ( 'abalone', -- Source table

[GitHub] madlib issue #234: Create lower case column name in encode_categorical_varia...

2018-02-20 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/234 was just testing 1.13 on postgres 9.6 and found this error ``` DROP TABLE IF EXISTS abalone_out, abalone_out_dictionary; SELECT madlib.encode_categorical_variables

[GitHub] madlib issue #238: MLP: Use array_upper to get the last array element

2018-02-22 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/238 LGTM ---

[GitHub] madlib pull request #235: update KNN, DT and RF docs to match recent commits

2018-02-15 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/235#discussion_r168557191 --- Diff: src/ports/postgres/modules/recursive_partitioning/random_forest.sql_in --- @@ -208,13 +208,26 @@ forest_train(training_table_name

[GitHub] madlib issue #234: Create lower case column name in encode_categorical_varia...

2018-02-16 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/234 Similarly ``` DROP TABLE IF EXISTS abalone_out, abalone_out_dictionary; SELECT madlib.encode_categorical_variables ( 'abalone', -- Source table

[GitHub] madlib pull request #222: minor update to summary() user docs

2018-01-02 Thread fmcquillan99
GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib/pull/222 minor update to summary() user docs to finish off https://issues.apache.org/jira/browse/MADLIB-1167 You can merge this pull request into a Git repository by running: $ git pull https

[GitHub] madlib pull request #298: misc 1.15 user doc updates

2018-07-25 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/298#discussion_r205310071 --- Diff: doc/mainpage.dox.in --- @@ -100,13 +86,14 @@ complete matrix stored as a distributed table. @defgroup grp_matrix Matrix

[GitHub] madlib issue #298: misc 1.15 user doc updates

2018-07-25 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/298 This should be ready to merge if it looks OK. I don't have any other 1.15 doc related items to deliver. ---

[GitHub] madlib issue #291: Feature: Vector-Column Transformations

2018-07-31 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/291 Where did we land on the boolean casting issue? Testing on Greenplum 5, I see: ``` (psycopg2.ProgrammingError) plpy.SPIError: ARRAY types boolean and text cannot be matched

[GitHub] madlib issue #291: Feature: Vector-Column Transformations

2018-07-31 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/291 thanks, that makes sense. I added a type casting example to the user docs. LGTM ---

[GitHub] madlib issue #295: Recursive Partitioning: Add function to report importance...

2018-08-01 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/295 LGTM, here is an RF example: ``` SELECT * FROM mt_imp_output ORDER BY am, oob_var_importance DESC; am | feature | oob_var_importance | impurity_var_importance

[GitHub] madlib issue #308: Release: Release Notes for v1.15

2018-08-06 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/308 LGTM ---

[GitHub] madlib pull request #:

2018-08-17 Thread fmcquillan99
Github user fmcquillan99 commented on the pull request: https://github.com/apache/madlib/commit/5e707f745c50343dd7395a3e8f86c04428210977#commitcomment-30142753 Also fixed some spacing issues ---

[GitHub] madlib issue #314: Ubuntu support: Enable creation of gppkg on Ubuntu

2018-08-27 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/314 Thanks @njayaram2 for the clarification. ---

[GitHub] madlib issue #314: Ubuntu support: Enable creation of gppkg on Ubuntu

2018-08-27 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/314 So this requires Alien, but we do not automatically download or bundle Alien, correct? ---

[GitHub] madlib issue #291: Feature: Vector-Column Transformations

2018-07-20 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/291 (1) Now I think it is casting all numeric to DOUBLE and all non-numeric to TEXT? But if all the columns are INT, should not cast them to DOUBLE, rather should create an array of INTs

[GitHub] madlib issue #295: Recursive Partitioning: Add function to report importance...

2018-07-20 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/295 I like this last suggestion from @iyerr3, that we report raw values for oob and impurity VI in the model output file. (OK to keep the shifted oob > 0 as we do now.) For the hel

[GitHub] madlib issue #295: Recursive Partitioning: Add function to report importance...

2018-07-19 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/295 Should impurity_var_importance always add up to 100? From the regression example in the user docs: ``` DROP TABLE IF EXISTS mt_imp_output; SELECT madlib.get_var_importance

[GitHub] madlib issue #295: Recursive Partitioning: Add function to report importance...

2018-07-19 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/295 Another run I got ``` grp 0 grp1 31.01364943 31.6576 22.85881741

[GitHub] madlib issue #289: RF: Add impurity variable importance

2018-07-17 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/289 ``` The model table produced by the training function contains the following columns: gid INTEGER. Group id that uniquely identifies a set of grouping column values. sample_id

[GitHub] madlib issue #291: Feature: Vector-Column Transformations

2018-07-23 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/291 In cols2vec and vec2cols, ordering has been fixed so new columns are always on the right of the source table columns in the output (if any). In cols2vec, casting seems OK now. I tested

[GitHub] madlib pull request #298: misc 1.15 user doc updates

2018-07-24 Thread fmcquillan99
GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib/pull/298 misc 1.15 user doc updates Added descriptions to left panel for modules that were missing. Fixed types and formatting in various places. Cleaned up main user doc page and removed links

[GitHub] madlib issue #291: Feature: Vector to Columns

2018-07-19 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/291 In cols2vec, For this table: ``` CREATE TABLE golf ( id integer NOT NULL, "OUTLOOK" text, temperature double precision, humidity double

[GitHub] madlib issue #291: Feature: Vector to Columns

2018-07-19 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/291 After the above 2 issues I mentioned are fixed, I will have 1 more commit on user docs to this PR ---

[GitHub] madlib issue #313: MLP: Simplify momentum and Nesterov updates

2018-09-04 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/313 is this ready to merge? ---

[GitHub] madlib pull request #315: JIRA:1060 - Modified KNN to accept expressions in ...

2018-09-05 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/315#discussion_r215462116 --- Diff: src/ports/postgres/modules/knn/knn.py_in --- @@ -53,22 +55,12 @@ def knn_validate_src(schema_madlib, point_source, point_column_name

[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-09-07 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/315 I'm not sure what this is doing: ``` %%sql DROP TABLE IF EXISTS knn_result_classification; SELECT * FROM madlib.knn( 'knn_train_data', -- Table

[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-09-07 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/315 load data: ``` DROP TABLE IF EXISTS knn_train_data; CREATE TABLE knn_train_data ( id integer, data integer

[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-09-07 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/315 (1) expression for test data array: ``` DROP TABLE IF EXISTS knn_result_classification; SELECT * FROM madlib.knn( 'knn_train_data', -- Table

[GitHub] madlib issue #317: Fixed trailing whitespace in many sql_in files

2018-09-07 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/317 then let's merge it ---

[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-09-07 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/315 Actually the earlier issue above ^^^ is OK, where I said `I'm not sure what this is doing` because forcing all training data to be a single point means that the distance to all test points

[GitHub] madlib issue #315: JIRA:1060 - Modified KNN to accept expressions in point_c...

2018-09-07 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/315 re-running the failed test, seems to pass now: ``` SELECT * FROM knn_result_list_neighbors ORDER BY id; ``` produces ``` id | data | k_nearest_neighbours

[GitHub] madlib pull request #288: Jira:1239: Converts features from multiple columns...

2018-07-05 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/288#discussion_r200510366 --- Diff: src/ports/postgres/modules/cols_vec/cols2vec.py_in --- @@ -0,0 +1,110 @@ +""" +@file cols2vec.py_in +

[GitHub] madlib issue #287: Fix incorrect dict expansion in table header

2018-07-11 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/287 This latest commit makes the following changes to user docs: 1) clarify cv for SVM and add user examples 2) clarify cv for elastic net and fix user examples 3) correct rmse calc

[GitHub] madlib issue #282: Utilites: Add CTAS while dropping some columns

2018-07-11 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/282 There is a bit of inconsistency related to the last param `cols_to_drop` ``` SELECT madlib.dropcols( 'houses', 'houses_out

[GitHub] madlib issue #288: Jira:1239: Converts features from multiple columns into a...

2018-07-06 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/288 Since we are writing out a summary table, may as well add more info in it. {code} A summary table named _summary is also created at the same time, which has the following columns

[GitHub] madlib issue #288: Jira:1239: Converts features from multiple columns into a...

2018-07-06 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/288 update my comment above to remove the rows processed and skipped. ---

[GitHub] madlib issue #282: Utilites: Add CTAS while dropping some columns

2018-07-11 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/282 looks like user docs lost the params description for dropcols() ---

[GitHub] madlib issue #282: Utilites: Add CTAS while dropping some columns

2018-07-12 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/282 ah, i see. I think it is fine as you have put it. LGTM ---

[GitHub] madlib issue #291: Feature: Vector to Columns

2018-07-12 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/291 user docs seem incomplete ---

[GitHub] madlib issue #223: Balance datasets : re-sampling technique

2018-01-16 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/223 Regarding (2) and (3) above, looks like it does not fail with `'red:7, blue:7'` but the MADlib convention is 'red=7, blue=7' so need to change to use `=`. (4) Seems to take only

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-08 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r173254469 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -543,6 +545,95 @@ SELECT * FROM output_table ORDER BY mainhue, name; (25

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-08 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r173239594 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -543,6 +545,95 @@ SELECT * FROM output_table ORDER BY mainhue, name; (25

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-08 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r173238804 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -543,6 +545,95 @@ SELECT * FROM output_table ORDER BY mainhue, name; (25

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-07 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r172920581 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -149,8 +149,10 @@ non-stratified, that is, the whole table is treated

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-07 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r172920825 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -543,6 +544,90 @@ SELECT * FROM output_table ORDER BY mainhue, name; (25

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-07 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r172920935 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -543,6 +544,90 @@ SELECT * FROM output_table ORDER BY mainhue, name; (25

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-07 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r172921328 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -543,6 +544,90 @@ SELECT * FROM output_table ORDER BY mainhue, name; (25

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-07 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r172921714 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -543,6 +544,90 @@ SELECT * FROM output_table ORDER BY mainhue, name; (25

[GitHub] madlib pull request #239: Balance Sample: Add support for grouping

2018-03-07 Thread fmcquillan99
Github user fmcquillan99 commented on a diff in the pull request: https://github.com/apache/madlib/pull/239#discussion_r172922334 --- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in --- @@ -543,6 +544,90 @@ SELECT * FROM output_table ORDER BY mainhue, name; (25

[GitHub] madlib issue #256: Minibatch Preprocessing: change default buffer size formu...

2018-04-05 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/256 We seem to be computing batch size using master but prob should just consider num segments. ---

[GitHub] madlib pull request #257: mini-batch user docs

2018-04-06 Thread fmcquillan99
GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib/pull/257 mini-batch user docs This commit is for the preprocessor user docs. MLP user doc updates to follow in subsequent commit. Can someone please review this content? thx You can

[GitHub] madlib issue #255: MLP: Remove source table dependency for predicting regres...

2018-04-10 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/255 LGTM, see https://issues.apache.org/jira/browse/MADLIB-1223 for tests i ran ---

[GitHub] madlib issue #256: Minibatch Preprocessing: change default buffer size formu...

2018-04-10 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/256 LGTM Default selection looks reasonable: (0) data DROP TABLE IF EXISTS iris_data; CREATE TABLE iris_data( id serial, attributes numeric

[GitHub] madlib issue #256: Minibatch Preprocessing: change default buffer size formu...

2018-04-06 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/256 Oh I see, with the averaging approach: buffer_size = avg_num_rows_per_segment / num_segments = 21.5 / 2 = 10.75 and rounding up

[GitHub] madlib issue #250: MLP: Allow one-hot encoded dependent var for classificati...

2018-04-04 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/250 See JIRA https://issues.apache.org/jira/browse/MADLIB-1222 for examples showing this works for IGD and mini-batch LGTM I think u can go ahead and merge this PR to master ---

[GitHub] madlib issue #256: Minibatch Preprocessing: change default buffer size formu...

2018-04-06 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/256 Is this expected behavior? last group for NJ gets only 1 observation ``` DROP TABLE IF EXISTS iris_data; CREATE TABLE iris_data( id serial, attributes numeric

[GitHub] madlib issue #251: MLP: Simplify initialization of model coefficients

2018-04-04 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/251 Using the data set from http://madlib.apache.org/docs/latest/group__grp__nn.html#example the warm start seems to be functioning OK in the sense that it is picking up where it left off

[GitHub] madlib issue #257: mini-batch user docs

2018-04-15 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/257 Main changes: 1) Updated minibatch docs to show use of encoding scalar integer dep var 2) Added minibatch examples and explanations to MLP 3) Reduced the number of redundant

[GitHub] madlib pull request #264: updated pagerank docs for PPR, minor formating and...

2018-04-17 Thread fmcquillan99
GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib/pull/264 updated pagerank docs for PPR, minor formating and such 1) minor formatting improvements 2) added reference for PPR and changed PR reference to paper and not wikipedia You can merge

[GitHub] madlib issue #257: mini-batch user docs

2018-04-17 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/257 OK done now ---

[GitHub] madlib issue #263: Bugfix/mlp minibatch grouping

2018-04-17 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/263 I tested this quite a bit and it seems to work nicely for me. LGTM ---

[GitHub] madlib issue #257: mini-batch user docs

2018-04-17 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/257 Main changes: 4. Clarified grouping as per https://github.com/apache/madlib/pull/263 This is final change so you can review and merge if it looks good. ---

[GitHub] madlib pull request #252: leftover minor RF user doc update

2018-03-28 Thread fmcquillan99
GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib/pull/252 leftover minor RF user doc update A few remaining RF user doc changes I missed in https://github.com/apache/madlib/commit/7f3aae92f2d84bf7e4501ac5efec1ebfc7a80834 Also added

[GitHub] madlib issue #249: RF: Use NULL::integer[] when no continuous features

2018-03-26 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/249 See https://issues.apache.org/jira/browse/MADLIB-1219 for results from my tests. LGTM ---

[GitHub] madlib issue #246: DT and RF user doc updates

2018-03-26 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/246 https://issues.apache.org/jira/browse/MADLIB-1217 https://issues.apache.org/jira/browse/MADLIB-1218 https://issues.apache.org/jira/browse/MADLIB-1219 have all been fixed so I made

[GitHub] madlib pull request #246: DT user doc updates

2018-03-20 Thread fmcquillan99
GitHub user fmcquillan99 opened a pull request: https://github.com/apache/madlib/pull/246 DT user doc updates @rahiyer please review DT user doc updates Will start working on RF in parallel. You can merge this pull request into a Git repository by running: $ git pull

[GitHub] madlib issue #242: PCA: Fix issue with text grouping col input

2018-03-21 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/242 LGTM, this can be merged ---

[GitHub] madlib issue #246: DT user doc updates

2018-03-22 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/246 @rahiyer RF docs ready for review too. ---

[GitHub] madlib issue #248: DT: Ensure proper quoting in grouping coalesce

2018-03-23 Thread fmcquillan99
Github user fmcquillan99 commented on the issue: https://github.com/apache/madlib/pull/248 I checked against the examples in JIRA: MADLIB-1217 JIRA: MADLIB-1218 and both work OK for me. So from the fix to the functionality perspective, LGTM. Other
