Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/186
Please replace "portid (your platform)" with actual values in the message.
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/75
Update on this PR since it has been open for a while. This was some good
work by mktal to build a multi-class svm module.
The issue with the PR is that the mini-batching is embedded
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/189
Ed, do you mind making the same changes in these 2 modules:
http://madlib.apache.org/docs/latest/group__grp__path.html#literature
http://madlib.apache.org/docs/latest
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/189
LGTM
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/incubator-madlib-site/pull/9
design.pdf only displays the top 150 pages in the github viewer, can you
please confirm the whole document is there?
---
If your project is set up for it, you can reply
Github user fmcquillan99 commented on the issue:
https://github.com/apache/incubator-madlib-site/pull/6
see Ed's comment above
---
Github user fmcquillan99 closed the pull request at:
https://github.com/apache/incubator-madlib-site/pull/6
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/191
LGTM based on some testing and docs review.
I would ask other community folks to pls review code in more detail however.
---
Github user fmcquillan99 closed the pull request at:
https://github.com/apache/madlib/pull/205
---
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/205#discussion_r152636816
--- Diff: src/ports/postgres/modules/regress/linear.sql_in ---
@@ -183,16 +183,15 @@ FROM (
@par Prediction Function
The prediction function
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/204
@hpandeycodeit we would like to get this PR in the upcoming 1.13 release.
Are you planning to do additional work as per the comments above?
---
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/205#discussion_r152645105
--- Diff: src/ports/postgres/modules/graph/hits.sql_in ---
@@ -103,18 +102,18 @@ It is named by adding the suffix '_summary' to the
'out_table
GitHub user fmcquillan99 opened a pull request:
https://github.com/apache/madlib/pull/208
correct knn user docs
mostly corrections related to distance function explanations
You can merge this pull request into a Git repository by running:
$ git pull https://github.com
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/206
The user doc explanation/caution looks good on mean imputation, thanks for
adding
LGTM
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/217
LGTM
Until we fix https://issues.apache.org/jira/browse/MADLIB-1185 we cannot
claim full postgres 10 support, so I think these release notes are accurate
---
GitHub user fmcquillan99 opened a pull request:
https://github.com/apache/madlib-site/pull/10
1dot13 website updates
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/fmcquillan99/incubator-madlib-site
website-1dot13
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/199
Did this pass previous IC and functional tests with madlib as the schema?
Also, is it possible to do a global search to see if we have done this in
other modules too?
---
GitHub user fmcquillan99 opened a pull request:
https://github.com/apache/madlib/pull/205
hits and lin regr doc updates
minor updates to graph HITS algo docs
fixed order of params in lin regr prediction docs
You can merge this pull request into a Git repository by running
GitHub user fmcquillan99 opened a pull request:
https://github.com/apache/madlib/pull/212
update PyXB version in README.md
minor version update for PyXB from 1.2.4 to 1.2.6 in README.md
You can merge this pull request into a Git repository by running:
$ git pull https
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/212#discussion_r155685205
--- Diff: README.md ---
@@ -11,9 +11,11 @@ Installation and Contribution
==
See the project website [`MADlib Home
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/211
naming convention looks good.
I assume for prod releases there is no "_dev" in the name.
---
GitHub user fmcquillan99 opened a pull request:
https://github.com/apache/madlib/pull/209
add grouping predict e.g. for lin reg
suggestion from user to add this example
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/fmcquillan99
Github user fmcquillan99 closed the pull request at:
https://github.com/apache/madlib-site/pull/3
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/267
There is some reference to HAWQ in
https://github.com/apache/madlib/blob/master/ReadMe_Build.txt
which I don't see removed in the PR.
Otherwise seems OK though I did not do
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/269
(1)
```
DROP TABLE IF EXISTS example_data_output, example_data_output_summary;
SELECT madlib.correlation( 'example_data',
'example_data_output
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/269
Thanks for the explanation.
I pushed one additional small commit that changes the name of the module
from "Pearson's Correlation" to "Covariance and Correlation"
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/223
Started testing, some early observations:
(1)
class_size default should be 'uniform', it seems to be set to
'undersample' currently
(2)
`
SELECT
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/223
Can you please double check that install checks are robust with respect to
different Python rounding on different hardware?
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/231
Does this mean, then, that all var importance values are >= 0 now, and that
the largest positive value corresponds to the most "important" variable?
Also, what is the rang
GitHub user fmcquillan99 opened a pull request:
https://github.com/apache/madlib/pull/235
update KNN, DT and RF docs to match recent commits
KNN
* describe weighted average in more detail
DT & RF
* correct some doc errors and omissions
* update example to
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/234
```
DROP TABLE IF EXISTS abalone_out, abalone_out_dictionary;
SELECT madlib.encode_categorical_variables (
'abalone', -- Source table
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/234
was just testing 1.13 on postgres 9.6 and found this error
```
DROP TABLE IF EXISTS abalone_out, abalone_out_dictionary;
SELECT madlib.encode_categorical_variables
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/238
LGTM
---
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/235#discussion_r168557191
--- Diff:
src/ports/postgres/modules/recursive_partitioning/random_forest.sql_in ---
@@ -208,13 +208,26 @@ forest_train(training_table_name
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/234
Similarly
```
DROP TABLE IF EXISTS abalone_out, abalone_out_dictionary;
SELECT madlib.encode_categorical_variables (
'abalone', -- Source table
GitHub user fmcquillan99 opened a pull request:
https://github.com/apache/madlib/pull/222
minor update to summary() user docs
to finish off
https://issues.apache.org/jira/browse/MADLIB-1167
You can merge this pull request into a Git repository by running:
$ git pull https
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/298#discussion_r205310071
--- Diff: doc/mainpage.dox.in ---
@@ -100,13 +86,14 @@ complete matrix stored as a distributed table.
@defgroup grp_matrix Matrix
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/298
This should be ready to merge if it looks OK. I don't have any other 1.15
doc related items to deliver.
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/291
Where did we land on the boolean casting issue? Testing on Greenplum 5, I
see:
```
(psycopg2.ProgrammingError) plpy.SPIError: ARRAY types boolean and text
cannot be matched
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/291
thanks, that makes sense.
I added a type casting example to the user docs.
LGTM
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/295
LGTM, here is an RF example:
```
SELECT * FROM mt_imp_output ORDER BY am, oob_var_importance DESC;
am | feature | oob_var_importance | impurity_var_importance
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/308
LGTM
---
Github user fmcquillan99 commented on the pull request:
https://github.com/apache/madlib/commit/5e707f745c50343dd7395a3e8f86c04428210977#commitcomment-30142753
Also fixed some spacing issues
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/314
Thanks @njayaram2 for the clarification.
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/314
So this requires Alien, but we do not automatically download or bundle
Alien, correct?
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/291
(1)
Now I think it is casting all numeric to DOUBLE and all non-numeric to TEXT?
But if all the columns are INT, should not cast them to DOUBLE, rather
should create an array of INTs
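A minimal sketch of the casting rule suggested in this comment, in Python. It is not taken from the cols2vec code: the function name `array_element_type` and the type-name sets are hypothetical, and the choice of TEXT whenever any non-numeric column is present is an assumption.

```python
# Hypothetical helper, not MADlib code: choose a single element type for an
# array built from several columns, given their PostgreSQL type names.
INTEGER_TYPES = {"int", "int4", "integer"}
NUMERIC_TYPES = INTEGER_TYPES | {
    "smallint", "bigint", "real", "double precision", "numeric",
}


def array_element_type(column_types):
    """Return the element type the combined array would need."""
    types = {t.lower() for t in column_types}
    if not types:
        raise ValueError("at least one column type is required")
    if types <= INTEGER_TYPES:
        return "INTEGER"           # all INT columns stay INT
    if types <= NUMERIC_TYPES:
        return "DOUBLE PRECISION"  # other all-numeric mixes are cast to DOUBLE
    return "TEXT"                  # assumption: any non-numeric column forces TEXT


print(array_element_type(["integer", "integer"]))
print(array_element_type(["integer", "double precision"]))
print(array_element_type(["integer", "text"]))
```

The integer check runs first so that an all-INT input is never widened to DOUBLE, which is the behavior the comment asks for.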
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/295
I like this last suggestion from @iyerr3, that we report raw values for oob
and impurity VI in the model output file. (OK to keep the shifted oob > 0 as
we do now.)
For the hel
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/295
Should impurity_var_importance always add up to 100?
From the regression example in the user docs:
```
DROP TABLE IF EXISTS mt_imp_output;
SELECT madlib.get_var_importance
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/295
Another run I got
```
grp 0 grp1
31.01364943 31.6576
22.85881741
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/289
```
The model table produced by the training function contains the following
columns:
gid INTEGER. Group id that uniquely identifies a set of grouping column
values.
sample_id
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/291
In cols2vec and vec2cols, ordering has been fixed so new columns are always
on the right of the source table columns in the output (if any).
In cols2vec, casting seems OK now. I tested
GitHub user fmcquillan99 opened a pull request:
https://github.com/apache/madlib/pull/298
misc 1.15 user doc updates
Added descriptions to left panel for modules that were missing.
Fixed types and formatting in various places.
Cleaned up main user doc page and removed links
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/291
In cols2vec,
For this table:
```
CREATE TABLE golf (
id integer NOT NULL,
"OUTLOOK" text,
temperature double precision,
humidity double
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/291
After the above 2 issues I mentioned are fixed, I will have 1 more commit
on user docs to this PR
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/313
is this ready to merge?
---
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/315#discussion_r215462116
--- Diff: src/ports/postgres/modules/knn/knn.py_in ---
@@ -53,22 +55,12 @@ def knn_validate_src(schema_madlib, point_source,
point_column_name
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/315
I'm not sure what this is doing:
```
%%sql
DROP TABLE IF EXISTS knn_result_classification;
SELECT * FROM madlib.knn(
'knn_train_data', -- Table
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/315
load data:
```
DROP TABLE IF EXISTS knn_train_data;
CREATE TABLE knn_train_data (
id integer,
data integer
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/315
(1)
expression for test data array:
```
DROP TABLE IF EXISTS knn_result_classification;
SELECT * FROM madlib.knn(
'knn_train_data', -- Table
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/317
then let's merge it
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/315
Actually the earlier issue above ^^^ is OK, where I said `I'm not sure what
this is doing` because forcing all training data to be a single point means
that the distance to all test points
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/315
re-running the failed test, seems to pass now:
```
SELECT * FROM knn_result_list_neighbors ORDER BY id;
```
produces
```
id | data | k_nearest_neighbours
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/288#discussion_r200510366
--- Diff: src/ports/postgres/modules/cols_vec/cols2vec.py_in ---
@@ -0,0 +1,110 @@
+"""
+@file cols2vec.py_in
+
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/287
This latest commit makes the following changes to user docs:
1) clarify cv for SVM and add user examples
2) clarify cv for elastic net and fix user examples
3) correct rmse calc
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/282
There is a bit of inconsistency related to the last param `cols_to_drop`
```
SELECT madlib.dropcols(
'houses',
'houses_out
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/288
Since we are writing out a summary table, may as well add more info in it.
{code}
A summary table named _summary is also created at the same time,
which has the following columns
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/288
update my comment above to remove the rows processed and skipped.
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/282
looks like user docs lost the params description for dropcols()
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/282
ah, i see. I think it is fine as you have put it.
LGTM
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/291
user docs seem incomplete
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/223
Regarding (2) and (3) above, looks like it does not fail with `'red:7,
blue:7'` but the MADlib convention is 'red=7, blue=7' so need to change to use
`=`.
(4)
Seems to take only
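The `=` convention mentioned for (2) and (3) can be illustrated with a small parser sketch. This is not the balance_sample implementation: `parse_class_sizes` is a hypothetical name, and rejecting the `:` form outright is an assumption about the desired behavior.

```python
# Hypothetical parser, not MADlib code: accept 'red=7, blue=7' and
# reject entries that do not use '=' (for example 'red:7').
def parse_class_sizes(spec):
    """Turn a 'class=count, class=count' string into a dict of ints."""
    sizes = {}
    for item in spec.split(","):
        name, sep, count = item.partition("=")
        if not sep:
            raise ValueError(f"expected 'class=count', got {item.strip()!r}")
        sizes[name.strip()] = int(count)
    return sizes


print(parse_class_sizes("red=7, blue=7"))
```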
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/239#discussion_r173254469
--- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in ---
@@ -543,6 +545,95 @@ SELECT * FROM output_table ORDER BY mainhue, name;
(25
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/239#discussion_r173239594
--- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in ---
@@ -543,6 +545,95 @@ SELECT * FROM output_table ORDER BY mainhue, name;
(25
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/239#discussion_r173238804
--- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in ---
@@ -543,6 +545,95 @@ SELECT * FROM output_table ORDER BY mainhue, name;
(25
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/239#discussion_r172920581
--- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in ---
@@ -149,8 +149,10 @@ non-stratified, that is, the whole table is treated
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/239#discussion_r172920825
--- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in ---
@@ -543,6 +544,90 @@ SELECT * FROM output_table ORDER BY mainhue, name;
(25
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/239#discussion_r172920935
--- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in ---
@@ -543,6 +544,90 @@ SELECT * FROM output_table ORDER BY mainhue, name;
(25
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/239#discussion_r172921328
--- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in ---
@@ -543,6 +544,90 @@ SELECT * FROM output_table ORDER BY mainhue, name;
(25
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/239#discussion_r172921714
--- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in ---
@@ -543,6 +544,90 @@ SELECT * FROM output_table ORDER BY mainhue, name;
(25
Github user fmcquillan99 commented on a diff in the pull request:
https://github.com/apache/madlib/pull/239#discussion_r172922334
--- Diff: src/ports/postgres/modules/sample/balance_sample.sql_in ---
@@ -543,6 +544,90 @@ SELECT * FROM output_table ORDER BY mainhue, name;
(25
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/256
We seem to be computing batch size using master but prob should just
consider num segments.
---
GitHub user fmcquillan99 opened a pull request:
https://github.com/apache/madlib/pull/257
mini-batch user docs
This commit is for the preprocessor user docs.
MLP user doc updates to follow in subsequent commit.
Can someone please review this content? thx
You can
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/255
LGTM, see https://issues.apache.org/jira/browse/MADLIB-1223 for tests i ran
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/256
LGTM
Default selection looks reasonable:
(0) data
DROP TABLE IF EXISTS iris_data;
CREATE TABLE iris_data(
id serial,
attributes numeric
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/256
Oh I see, with the averaging approach:
buffer_size = avg_num_rows_per_segment / num_segments
= 21.5 / 2
= 10.75
and rounding up
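The arithmetic quoted above can be checked with a few lines of Python. This is only a sketch of the formula as written in the comment; the function name is hypothetical, and reading "rounding up" as a ceiling is an assumption, since the message is cut off at that point.

```python
import math


# Hypothetical helper mirroring the formula quoted in the comment.
def buffer_size(avg_num_rows_per_segment, num_segments):
    """Divide the average rows per segment by the segment count, rounded up."""
    return math.ceil(avg_num_rows_per_segment / num_segments)


# 21.5 / 2 = 10.75, which a ceiling rounds up to 11.
print(buffer_size(21.5, 2))
```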
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/250
See JIRA https://issues.apache.org/jira/browse/MADLIB-1222 for examples
showing this works for IGD and mini-batch
LGTM
I think u can go ahead and merge this PR to master
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/256
Is this expected behavior? last group for NJ gets only 1 observation
```
DROP TABLE IF EXISTS iris_data;
CREATE TABLE iris_data(
id serial,
attributes numeric
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/251
Using the data set from
http://madlib.apache.org/docs/latest/group__grp__nn.html#example
the warm start seems to be functioning OK in the sense that it is picking
up where it left off
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/257
Main changes:
1) Updated minibatch docs to show use of encoding scalar integer dep var
2) Added minibatch examples and explanations to MLP
3) Reduced the number of redundant
GitHub user fmcquillan99 opened a pull request:
https://github.com/apache/madlib/pull/264
updated pagerank docs for PPR, minor formatting and such
1) minor formatting improvements
2) added reference for PPR and changed PR reference to paper and not
wikipedia
You can merge
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/257
OK done now
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/263
I tested this quite a bit and it seems to work nicely for me.
LGTM
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/257
Main changes:
4. Clarified grouping as per
https://github.com/apache/madlib/pull/263
This is final change so you can review and merge if it looks good.
---
GitHub user fmcquillan99 opened a pull request:
https://github.com/apache/madlib/pull/252
leftover minor RF user doc update
A few remaining RF user doc changes I missed in
https://github.com/apache/madlib/commit/7f3aae92f2d84bf7e4501ac5efec1ebfc7a80834
Also added
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/249
See https://issues.apache.org/jira/browse/MADLIB-1219 for results from my
tests.
LGTM
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/246
https://issues.apache.org/jira/browse/MADLIB-1217
https://issues.apache.org/jira/browse/MADLIB-1218
https://issues.apache.org/jira/browse/MADLIB-1219
have all been fixed so I made
GitHub user fmcquillan99 opened a pull request:
https://github.com/apache/madlib/pull/246
DT user doc updates
@rahiyer please review DT user doc updates
Will start working on RF in parallel.
You can merge this pull request into a Git repository by running:
$ git pull
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/242
LGTM, this can be merged
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/246
@rahiyer RF docs ready for review too.
---
Github user fmcquillan99 commented on the issue:
https://github.com/apache/madlib/pull/248
I checked against the examples in
JIRA: MADLIB-1217
JIRA: MADLIB-1218
and both work OK for me.
So from the fix to the functionality perspective, LGTM.
Other