[madlib] branch master updated: K-NN: Add kd-tree method for approximate knn
This is an automated email from the ASF dual-hosted git repository. riyer pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/madlib.git The following commit(s) were added to refs/heads/master by this push: new 5e601fb K-NN: Add kd-tree method for approximate knn 5e601fb is described below commit 5e601fbdb4c6423c148f8bdfead0a9988f31800d Author: Orhan Kislal AuthorDate: Wed Feb 20 16:33:46 2019 -0800 K-NN: Add kd-tree method for approximate knn JIRA: MADLIB-1061 This commit adds a kd-tree option to the 'knn' function. A kd-tree is used to reduce the search space when finding nearest neighbors. The method implemented here does not produce the complete kd-tree; instead, it allows the user to specify a maximum depth for the binary tree. Additional changes: - Add function to clean madlib views - Move k-nn out of 'Early Stage Development' Closes #352 Co-authored-by: Rahul Iyer Co-authored-by: Frank McQuillan --- doc/design/design.tex | 1 + doc/design/figures/2d_kdtree.pdf | Bin 0 -> 10652 bytes doc/design/modules/knn.tex | 146 +++ doc/literature.bib | 11 + doc/mainpage.dox.in | 2 +- src/ports/postgres/modules/knn/knn.py_in | 480 + src/ports/postgres/modules/knn/knn.sql_in | 249 +-- src/ports/postgres/modules/knn/test/knn.sql_in | 287 +--- src/ports/postgres/modules/utilities/admin.py_in | 22 + .../postgres/modules/utilities/utilities.py_in | 1 - .../postgres/modules/utilities/utilities.sql_in | 8 + 11 files changed, 1033 insertions(+), 174 deletions(-) diff --git a/doc/design/design.tex b/doc/design/design.tex index e9ed7b8..6772f89 100644 --- a/doc/design/design.tex +++ b/doc/design/design.tex @@ -231,6 +231,7 @@ \input{modules/SVM} \input{modules/graph} \input{modules/neural-network} +\input{modules/knn} \printbibliography \end{document} diff --git a/doc/design/figures/2d_kdtree.pdf b/doc/design/figures/2d_kdtree.pdf new file mode 100644 index 000..062ae23 Binary files /dev/null and b/doc/design/figures/2d_kdtree.pdf differ diff --git
a/doc/design/modules/knn.tex b/doc/design/modules/knn.tex new file mode 100644 index 000..71af411 --- /dev/null +++ b/doc/design/modules/knn.tex @@ -0,0 +1,146 @@ +% Licensed to the Apache Software Foundation (ASF) under one +% or more contributor license agreements. See the NOTICE file +% distributed with this work for additional information +% regarding copyright ownership. The ASF licenses this file +% to you under the Apache License, Version 2.0 (the +% "License"); you may not use this file except in compliance +% with the License. You may obtain a copy of the License at + +% http://www.apache.org/licenses/LICENSE-2.0 + +% Unless required by applicable law or agreed to in writing, +% software distributed under the License is distributed on an +% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +% KIND, either express or implied. See the License for the +% specific language governing permissions and limitations +% under the License. + +!TEX root = ../design.tex + + +\chapter[k Nearest Neighbors]{k Nearest Neighbors} + +\begin{moduleinfo} +\item[Authors] \href{mailto:okis...@pivotal.io}{Orhan Kislal} + +\item[History] + \begin{modulehistory} + \item[v0.1] Initial version: knn and kd-tree. + \end{modulehistory} +\end{moduleinfo} + + +% Abstract. What is the problem we want to solve? +\section{Introduction} % (fold) +\label{sec:knn_introduction} + +\emph{Some notes and figures in this section are borrowed from \cite{medium_knn} and \cite{point_knn}}. + +K-nearest neighbors (KNN) is one of the most commonly used learning +algorithms. The goal of knn is to find a number (k) of training data points +closest to the test point. These neighbors can be used to predict labels via +classification or regression. + +Unlike most learning techniques, KNN does not have a training phase. It +does not create a model to generalize the data; instead, the algorithm uses the +whole training dataset (or a specific subset of it).
+ +KNN can be used for classification, where the output is a class membership (a +discrete value). An object is classified by a majority vote of its neighbors, +with the object being assigned to the class most common among its k nearest +neighbors. It can also be used for regression, where the output is the predicted value for the +object (a continuous value). This value is the average (or median) of +the values of its k nearest neighbors. + +\section{Implementation Details} + +The basic KNN implementation relies on a table join between the training dataset and the test dataset. + +\begin{sql} + (SELECT test_id, +train_id, +fn_dist(train_
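The join-then-rank idea behind the basic implementation, together with the classification and regression rules described in the design notes, can be sketched in plain Python. This is a hypothetical in-memory illustration, not MADlib's SQL. The names `knn_brute_force`, `predict_class` and `predict_value` are made up here, Euclidean distance is assumed, and regression uses the average rather than the median (Python 3.8+ for `math.dist`).

```python
import math
from collections import Counter


def knn_brute_force(train, test, k):
    """For each test point, rank every training point by distance and keep k.

    train: list of (train_id, coords, label); test: list of (test_id, coords).
    Returns {test_id: [(distance, train_id, label), ...]} in ascending distance.
    """
    neighbors = {}
    for test_id, test_coords in test:
        # Pair this test point with every training point (the "join").
        scored = [(math.dist(test_coords, coords), train_id, label)
                  for train_id, coords, label in train]
        # Rank by distance, breaking ties by train_id (labels are never compared).
        scored.sort(key=lambda row: (row[0], row[1]))
        neighbors[test_id] = scored[:k]
    return neighbors


def predict_class(rows):
    """Classification: majority vote over the neighbors' labels."""
    votes = Counter(label for _, _, label in rows)
    return votes.most_common(1)[0][0]


def predict_value(rows):
    """Regression: average of the neighbors' values."""
    return sum(label for _, _, label in rows) / len(rows)


train = [(1, (0.0, 0.0), 'a'), (2, (1.0, 0.0), 'a'), (3, (5.0, 5.0), 'b'),
         (4, (6.0, 5.0), 'b'), (5, (0.0, 1.0), 'a')]
test = [(100, (0.5, 0.2)), (200, (5.5, 5.0))]
nbrs = knn_brute_force(train, test, k=3)
print(predict_class(nbrs[100]), predict_class(nbrs[200]))  # a b
```

The full pairing is what makes the basic approach expensive: every test point is compared with every training point, which is the cost the kd-tree option tries to reduce.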
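The kd-tree option described in the commit message can be illustrated with a minimal sketch. It assumes median splits that cycle through the dimensions, a user-chosen maximum depth, and a search that looks only at the leaf region containing the test point. All names are hypothetical and the MADlib implementation in `knn.py_in` may differ. The point is that a depth-limited tree shrinks the search space at the cost of exactness, which is why the method is approximate (Python 3.8+ for `math.dist`).

```python
import math


def build_kdtree(points, max_depth, depth=0):
    """Build a kd-tree over (point_id, coords) pairs, stopping at max_depth.

    Each internal node splits on one axis (cycling through the dimensions) at
    the median value; leaves keep their points for a brute-force search.
    """
    if depth >= max_depth or len(points) <= 1:
        return {"leaf": points}
    axis = depth % len(points[0][1])
    values = sorted(coords[axis] for _, coords in points)
    cutoff = values[len(values) // 2]  # (upper) median along this axis
    left = [p for p in points if p[1][axis] < cutoff]
    right = [p for p in points if p[1][axis] >= cutoff]
    if not left or not right:  # cannot split further (e.g. duplicate values)
        return {"leaf": points}
    return {"axis": axis, "cutoff": cutoff,
            "left": build_kdtree(left, max_depth, depth + 1),
            "right": build_kdtree(right, max_depth, depth + 1)}


def find_leaf(tree, query):
    """Follow the splits down to the leaf region that contains `query`."""
    while "leaf" not in tree:
        side = "left" if query[tree["axis"]] < tree["cutoff"] else "right"
        tree = tree[side]
    return tree["leaf"]


def approx_knn(tree, query, k):
    """Approximate kNN: rank only the points in the query's leaf region."""
    ranked = sorted(find_leaf(tree, query), key=lambda p: math.dist(query, p[1]))
    return [point_id for point_id, _ in ranked[:k]]


def exact_knn(points, query, k):
    """Brute-force kNN over all points, for comparison."""
    ranked = sorted(points, key=lambda p: math.dist(query, p[1]))
    return [point_id for point_id, _ in ranked[:k]]


points = [(i, (float(i % 4), float(i // 4))) for i in range(8)]  # 4x2 grid
tree = build_kdtree(points, max_depth=2)
print(approx_knn(tree, (1.9, 0.2), k=2))   # [1, 0]: only leaf {0, 1} is searched
print(exact_knn(points, (1.9, 0.2), k=2))  # [2, 6]
```

In the example the true nearest neighbors (2 and 6) lie just across the x = 2 split, so the leaf-only search misses them. A larger `max_depth` gives smaller leaves and cheaper searches, but tends to make such misses more likely.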
[madlib] branch mini-batch-dl-v1 deleted (was 4671a0a)
This is an automated email from the ASF dual-hosted git repository. riyer pushed a change to branch mini-batch-dl-v1 in repository https://gitbox.apache.org/repos/asf/madlib.git. was 4671a0a mini-batch preprocessor for image user doc improvements This change permanently discards the following revisions: discard 4671a0a mini-batch preprocessor for image user doc improvements
[madlib] branch master updated: Encode categorical: Add BIGINT as valid categorical type
This is an automated email from the ASF dual-hosted git repository. riyer pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/madlib.git The following commit(s) were added to refs/heads/master by this push: new 7c3c1a3 Encode categorical: Add BIGINT as valid categorical type 7c3c1a3 is described below commit 7c3c1a35ab921f2401df4684ab6d48a14fa51b2d Author: Rahul Iyer AuthorDate: Fri Jan 18 14:52:28 2019 -0800 Encode categorical: Add BIGINT as valid categorical type JIRA: MADLIB-1295 --- src/ports/postgres/modules/utilities/encode_categorical.py_in | 2 +- src/ports/postgres/modules/utilities/test/encode_categorical.sql_in | 5 - 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/src/ports/postgres/modules/utilities/encode_categorical.py_in b/src/ports/postgres/modules/utilities/encode_categorical.py_in index cd08012..8695a73 100644 --- a/src/ports/postgres/modules/utilities/encode_categorical.py_in +++ b/src/ports/postgres/modules/utilities/encode_categorical.py_in @@ -396,7 +396,7 @@ class CategoricalEncoder(object): self._all_cols_types = get_cols_and_types(self.source_table) # any column belonging to the following types are considered categorical -int_types = ['integer', 'smallint'] +int_types = ['integer', 'smallint', 'bigint'] text_types = ['text', 'varchar', 'character varying', 'char', 'character'] boolean_types = ['boolean'] self._cat_types = set(int_types + text_types + boolean_types) diff --git a/src/ports/postgres/modules/utilities/test/encode_categorical.sql_in b/src/ports/postgres/modules/utilities/test/encode_categorical.sql_in index 7dc6169..f7addc8 100644 --- a/src/ports/postgres/modules/utilities/test/encode_categorical.sql_in +++ b/src/ports/postgres/modules/utilities/test/encode_categorical.sql_in @@ -109,7 +109,7 @@ CREATE TABLE abalone_special_char ( "len$$'%*()gth" double precision, diameter double precision, height double precision, -"ClaЖss" integer +"ClaЖss" bigint ); COPY abalone_special_char 
("se$$''x", "len$$'%*()gth", diameter, height, "ClaЖss") FROM stdin WITH DELIMITER '|' NULL as '@'; F"F|0.475|0.37|0.125|2 @@ -121,6 +121,9 @@ M,M|0.47|0.355|0.100|1 'F'F'|0.55|0.44|0.15|0 \. +select encode_categorical_variables('abalone_special_char', 'abalone_special_char_out0', '*'); +select * from abalone_special_char_out0; + select encode_categorical_variables('abalone_special_char', 'abalone_special_char_out1', '"se$$x", "len$$''%*()gth"'); select * from abalone_special_char_out1;
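The effect of this one-line change can be seen with a small, hypothetical version of the type check. It assumes column metadata arrives as `(name, type)` pairs (the exact shape returned by `get_cols_and_types` is an assumption here). With the old type list, a `bigint` column such as the test table's `"ClaЖss"` is not treated as categorical.

```python
# Type lists mirror the diff above; the helper function is hypothetical.
INT_TYPES = ['integer', 'smallint', 'bigint']
TEXT_TYPES = ['text', 'varchar', 'character varying', 'char', 'character']
BOOLEAN_TYPES = ['boolean']
CAT_TYPES = set(INT_TYPES + TEXT_TYPES + BOOLEAN_TYPES)

# The list before this commit, for comparison.
OLD_CAT_TYPES = set(['integer', 'smallint'] + TEXT_TYPES + BOOLEAN_TYPES)


def categorical_columns(cols_and_types, cat_types=CAT_TYPES):
    """Return the names of columns whose type counts as categorical."""
    return [name for name, col_type in cols_and_types if col_type in cat_types]


cols = [("diameter", "double precision"), ("height", "double precision"),
        ("Class", "bigint")]
print(categorical_columns(cols))                 # ['Class']
print(categorical_columns(cols, OLD_CAT_TYPES))  # []
```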
madlib git commit: Allocator: Remove 16-byte alignment in GPDB 6
Repository: madlib Updated Branches: refs/heads/master 3540a5603 -> d62e5516b Allocator: Remove 16-byte alignment in GPDB 6 Findings: 1. MADlib aligns pointers returned by palloc to 16-byte boundaries. 2. Postgres prepends a small header (usually 16 bytes) before every pointer, which includes a. the memory context and b. the size of the memory allocation. 3. Greenplum 6+ tweaks that scheme a little: instead of the memory context, the header tracks a "shared header", which points to another struct with richer information (in addition to the memory context). 4. Postgres calls MemoryContextContains both for an aggregate's final function and for a windowed aggregate's final function. 5. Currently, Postgres always concludes that the datum from MADlib was allocated outside the context and makes an extra copy. In Greenplum, however, MemoryContextContains needs to dereference the shared header. This is a problem because the pointer has been shifted, so the function reads an invalid header. In this commit, we disable the pointer alignment for GPDB 6+ to avoid failing this check. Further, we also have to disable vectorization in Eigen, since it does not work when pointers are not 16-byte aligned.
Closes #319 Co-authored-by: Jesse Zhang Co-authored-by: Nandish Jayaram Project: http://git-wip-us.apache.org/repos/asf/madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/d62e5516 Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/d62e5516 Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/d62e5516 Branch: refs/heads/master Commit: d62e5516bc6741beee18678da1b9b3e6cc95cdcf Parents: 3540a56 Author: Rahul Iyer Authored: Wed Sep 12 16:59:59 2018 -0700 Committer: Rahul Iyer Committed: Tue Sep 18 11:46:45 2018 -0700 -- src/ports/greenplum/dbconnector/dbconnector.hpp | 17 + src/ports/postgres/dbconnector/Allocator_impl.hpp | 10 +- 2 files changed, 22 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/madlib/blob/d62e5516/src/ports/greenplum/dbconnector/dbconnector.hpp -- diff --git a/src/ports/greenplum/dbconnector/dbconnector.hpp b/src/ports/greenplum/dbconnector/dbconnector.hpp index 9c38ef6..d06b154 100644 --- a/src/ports/greenplum/dbconnector/dbconnector.hpp +++ b/src/ports/greenplum/dbconnector/dbconnector.hpp @@ -32,6 +32,23 @@ extern "C" { #include "Compatibility.hpp" +#if GP_VERSION_NUM >= 6 +// MADlib aligns the pointers returned by palloc() to 16-byte boundaries +// (see Allocator_impl.hpp). This is done to allow Eigen vectorization (see +// http://eigen.tuxfamily.org/index.php?title=FAQ#Vectorization for more +// info). This vectorization has to be explicitly disabled if pointers are +// not 16-byte aligned. Further, the pointer realignment invalidates a +// header that palloc creates just prior to the pointer address. Greenplum +// after commit f62bd1c fails due to this invalid header. Hence, the +// pointer realignment and Eigen vectorization are disabled below for +// Greenplum 6 and above.
+ +// See http://eigen.tuxfamily.org/dox/group__TopicUnalignedArrayAssert.html +// for steps to disable vectorization +#define EIGEN_DONT_VECTORIZE +#define EIGEN_DISABLE_UNALIGNED_ARRAY_ASSERT +#endif + #include "../../postgres/dbconnector/dbconnector.hpp" #endif // defined(MADLIB_GREENPLUM_DBCONNECTOR_HPP) http://git-wip-us.apache.org/repos/asf/madlib/blob/d62e5516/src/ports/postgres/dbconnector/Allocator_impl.hpp -- diff --git a/src/ports/postgres/dbconnector/Allocator_impl.hpp b/src/ports/postgres/dbconnector/Allocator_impl.hpp index 4c44207..996117b 100644 --- a/src/ports/postgres/dbconnector/Allocator_impl.hpp +++ b/src/ports/postgres/dbconnector/Allocator_impl.hpp @@ -211,7 +211,7 @@ template inline void * Allocator::internalPalloc(size_t inSize) const { -#if MAXIMUM_ALIGNOF >= 16 +#if MAXIMUM_ALIGNOF >= 16 || defined EIGEN_DONT_VECTORIZE return (ZM == dbal::DoZero) ? palloc0(inSize) : palloc(inSize); #else if (inSize > std::numeric_limits::max() - 16) @@ -221,7 +221,7 @@ Allocator::internalPalloc(size_t inSize) const { const size_t size = inSize + 16; void *raw = (ZM == dbal::DoZero) ? palloc0(size) : palloc(size); return makeAligned(raw); -#endif +#endif // MAXIMUM_ALIGNOF >= 16 } /** @@ -243,7 +243,7 @@ template inline void * Allocator::internalRePalloc(void *inPtr, size_t inSize) const { -#if MAXIMUM_ALIGNOF >= 16 +#if MAXIMUM_ALIGNOF >= 16 || defined EIGEN_DONT_VECTORIZE return repalloc(inPtr, inSize); #else if (inSize > std::numeric_limits::max() - 16) { @@ -262,7 +262,7 @@ Allocator::internalRePalloc(void *inPtr, size_t
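The alignment trick and the header problem described in the findings can be sketched with integer addresses. This is a hypothetical model, not MADlib's `makeAligned`: it assumes a 16-byte chunk header directly before each palloc'd pointer and an alignment step that always moves the pointer forward by 1 to 16 bytes, which is why the request is padded by 16 bytes (`inSize + 16` in the diff).

```python
ALIGNMENT = 16
HEADER_SIZE = 16  # assumed size of the chunk header palloc places before a pointer


def padded_request(size):
    """Bytes to request from palloc so an aligned block of `size` still fits."""
    return size + ALIGNMENT


def make_aligned(raw_addr):
    """Move to the next 16-byte boundary strictly after raw_addr (1..16 bytes)."""
    return (raw_addr & ~(ALIGNMENT - 1)) + ALIGNMENT


def header_addr(ptr_addr):
    """Where code that inspects a chunk expects its header to start."""
    return ptr_addr - HEADER_SIZE


raw = 0x1008                 # palloc result that is only 8-byte aligned
aligned = make_aligned(raw)  # 0x1010, the pointer handed to Eigen
print(hex(header_addr(raw)), hex(header_addr(aligned)))  # 0xff8 0x1000
```

Looking for a header just before `aligned` lands inside the padding rather than on the real header before `raw`, which is the invalid header the commit describes. Skipping the shift (and therefore Eigen's vectorization) avoids it.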
madlib git commit: Control: Add minor comments to context managers
Repository: madlib Updated Branches: refs/heads/master 85d09e675 -> 2cde01d1f Control: Add minor comments to context managers Project: http://git-wip-us.apache.org/repos/asf/madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/2cde01d1 Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/2cde01d1 Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/2cde01d1 Branch: refs/heads/master Commit: 2cde01d1ff011c47a1e6f03007e0ada5395617f4 Parents: 85d09e6 Author: Rahul Iyer Authored: Thu Sep 13 14:43:09 2018 -0700 Committer: Rahul Iyer Committed: Thu Sep 13 14:43:13 2018 -0700 -- src/ports/postgres/modules/utilities/control.py_in | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/madlib/blob/2cde01d1/src/ports/postgres/modules/utilities/control.py_in -- diff --git a/src/ports/postgres/modules/utilities/control.py_in b/src/ports/postgres/modules/utilities/control.py_in index 7900086..d147103 100644 --- a/src/ports/postgres/modules/utilities/control.py_in +++ b/src/ports/postgres/modules/utilities/control.py_in @@ -100,6 +100,10 @@ class HashaggControl(ContextDecorator): """ @brief: A wrapper that enables/disables the hashagg and then sets it back to the original value on exit + +This context manager should be used at the top-level and any exception +raised from this should be re-raised (if caught) to ensure the transaction +does not commit. """ def __init__(self, enable=True): @@ -134,6 +138,10 @@ class MinWarning(ContextDecorator): """ @brief A wrapper for setting the level of logs going into client + +This context manager should be used at the top-level and any exception +raised from this should be re-raised (if caught) to ensure the transaction +does not commit. 
""" def __init__(self, warningLevel='error'): @@ -163,6 +171,10 @@ class AOControl(ContextDecorator): """ @brief: A wrapper that enables/disables the AO storage option + +This context manager should be used at the top-level and any exception +raised from this should be re-raised (if caught) to ensure the transaction +does not commit. """ def __init__(self, enable=False): @@ -192,7 +204,7 @@ class AOControl(ContextDecorator): "show gp_default_storage_options")[0]["gp_default_storage_options"] self._parse_gp_default_storage_options(_storage_options_str) -# Set APPENDONLY=False after backing up existing value +# Set APPENDONLY= after backing up existing value self.was_ao_enabled = self.storage_options_dict['appendonly'] self.storage_options_dict['appendonly'] = self.to_enable plpy.execute("set gp_default_storage_options={0}".
[1/2] madlib git commit: Build: Disable AppendOnly if available
Repository: madlib Updated Branches: refs/heads/master b76a08344 -> 3db98babe http://git-wip-us.apache.org/repos/asf/madlib/blob/3db98bab/src/ports/postgres/modules/stats/pred_metrics.sql_in -- diff --git a/src/ports/postgres/modules/stats/pred_metrics.sql_in b/src/ports/postgres/modules/stats/pred_metrics.sql_in index 3f62746..32de9a9 100644 --- a/src/ports/postgres/modules/stats/pred_metrics.sql_in +++ b/src/ports/postgres/modules/stats/pred_metrics.sql_in @@ -411,8 +411,9 @@ CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.mean_abs_error( ) RETURNS VOID AS $$ PythonFunctionBodyOnly(`stats', `pred_metrics') -return pred_metrics.mean_abs_error( -table_in, table_out, prediction_col, observed_col, grouping_cols) +with AOControl(False): +return pred_metrics.mean_abs_error( +table_in, table_out, prediction_col, observed_col, grouping_cols) $$ LANGUAGE plpythonu m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `'); @@ -430,8 +431,9 @@ m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `'); CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.mean_abs_error(message TEXT) RETURNS TEXT AS $$ PythonFunctionBodyOnly(`stats', `pred_metrics') -return pred_metrics.metric_agg_help_msg(schema_madlib, message, -'mean_abs_error') +with AOControl(False): +return pred_metrics.metric_agg_help_msg(schema_madlib, message, +'mean_abs_error') $$ language plpythonu m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `CONTAINS SQL', `'); @@ -463,8 +465,9 @@ CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.mean_abs_perc_error( ) RETURNS VOID AS $$ PythonFunctionBodyOnly(`stats', `pred_metrics') -return pred_metrics.mean_abs_perc_error( -table_in, table_out, prediction_col, observed_col, grouping_cols) +with AOControl(False): +return pred_metrics.mean_abs_perc_error( +table_in, table_out, prediction_col, observed_col, grouping_cols) $$ LANGUAGE plpythonu m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `'); @@ -482,8 +485,9 @@ m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `'); 
CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.mean_abs_perc_error(message TEXT) RETURNS TEXT AS $$ PythonFunctionBodyOnly(`stats', `pred_metrics') -return pred_metrics.metric_agg_help_msg(schema_madlib, message, -'mean_abs_perc_error') +with AOControl(False): +return pred_metrics.metric_agg_help_msg(schema_madlib, message, +'mean_abs_perc_error') $$ language plpythonu m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `CONTAINS SQL', `'); @@ -515,8 +519,9 @@ CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.mean_perc_error( ) RETURNS VOID AS $$ PythonFunctionBodyOnly(`stats', `pred_metrics') -return pred_metrics.mean_perc_error( - table_in, table_out, prediction_col, observed_col, grouping_cols) +with AOControl(False): +return pred_metrics.mean_perc_error( +table_in, table_out, prediction_col, observed_col, grouping_cols) $$ LANGUAGE plpythonu m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `'); @@ -534,8 +539,9 @@ m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `'); CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.mean_perc_error(message TEXT) RETURNS TEXT AS $$ PythonFunctionBodyOnly(`stats', `pred_metrics') -return pred_metrics.metric_agg_help_msg(schema_madlib, message, -'mean_perc_error') +with AOControl(False): +return pred_metrics.metric_agg_help_msg(schema_madlib, message, +'mean_perc_error') $$ language plpythonu m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `CONTAINS SQL', `'); @@ -567,8 +573,9 @@ CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.mean_squared_error( ) RETURNS VOID AS $$ PythonFunctionBodyOnly(`stats', `pred_metrics') -return pred_metrics.mean_squared_error( - table_in, table_out, prediction_col, observed_col, grouping_cols) +with AOControl(False): +return pred_metrics.mean_squared_error( +table_in, table_out, prediction_col, observed_col, grouping_cols) $$ LANGUAGE plpythonu m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `'); @@ -586,8 +593,9 @@ m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `'); CREATE OR REPLACE FUNCTION 
MADLIB_SCHEMA.mean_squared_error(message TEXT) RETURNS TEXT AS $$ PythonFunctionBodyOnly(`stats', `pred_metrics') -return pred_metrics.metric_agg_help_msg(schema_madlib, message, -'mean_squared_error') +with AOControl(False): +return pred_metrics.metric_agg_help_msg(schema_madlib, message, +
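The `with AOControl(False):` wrappers above follow a set-then-restore pattern. A minimal sketch of that pattern shows why the original value comes back even when the body raises, and why `__exit__` returns `False`: the exception still propagates, as the docstrings added in the Control commit require. It assumes a generic `execute` callable in place of `plpy.execute`, and both `SettingControl` and the in-memory `FakeSession` are hypothetical.

```python
class SettingControl(object):
    """Hypothetical sketch: set a session setting on entry, restore it on exit.

    `execute` runs a SQL string and, for SHOW, returns the current value
    (a stand-in for plpy.execute plus result unpacking).
    """

    def __init__(self, execute, name, value):
        self.execute = execute
        self.name = name
        self.value = value
        self.old_value = None

    def __enter__(self):
        self.old_value = self.execute("show {0}".format(self.name))
        self.execute("set {0}={1}".format(self.name, self.value))
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Restore the original value whether or not the body raised.
        self.execute("set {0}={1}".format(self.name, self.old_value))
        return False  # never swallow the exception; let the caller see it


class FakeSession(object):
    """In-memory stand-in for a database session (illustration only)."""

    def __init__(self, **settings):
        self.settings = dict(settings)

    def execute(self, sql):
        if sql.startswith("show "):
            return self.settings[sql[len("show "):]]
        name, value = sql[len("set "):].split("=", 1)
        self.settings[name] = value
        return None


session = FakeSession(enable_hashagg="on")
with SettingControl(session.execute, "enable_hashagg", "off"):
    print(session.settings["enable_hashagg"])  # off
print(session.settings["enable_hashagg"])      # on
```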
[2/2] madlib git commit: Build: Disable AppendOnly if available
Build: Disable AppendOnly if available JIRA: MADLIB-1171 Greenplum provides append-optimized (AO) table storage, which does not allow UPDATE and DELETE. MADlib model tables are small enough that they would not benefit much from using AO instead of heap tables. This commit ensures that APPENDONLY=False for the duration of each MADlib function call (the GUC is reset to its original value on exit). For cases where we recreate the data table (standardization, redistribution, etc.), we have to explicitly add 'APPENDONLY=true' to retain the AO benefits. Closes #316 Project: http://git-wip-us.apache.org/repos/asf/madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/3db98bab Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/3db98bab Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/3db98bab Branch: refs/heads/master Commit: 3db98babe3326fb5e2cd16d0639a2bef264f4b04 Parents: b76a083 Author: Rahul Iyer Authored: Wed Aug 29 16:23:04 2018 -0700 Committer: Rahul Iyer Committed: Thu Sep 13 11:24:22 2018 -0700 -- src/ports/postgres/madpack/SQLCommon.m4_in | 15 +- .../modules/assoc_rules/assoc_rules.sql_in | 80 src/ports/postgres/modules/convex/mlp.sql_in| 72 --- .../modules/convex/utils_regularization.py_in | 129 ++-- .../modules/elastic_net/elastic_net.sql_in | 13 +- src/ports/postgres/modules/knn/knn.py_in| 2 +- src/ports/postgres/modules/knn/knn.sql_in | 36 +--- src/ports/postgres/modules/lda/lda.py_in| 10 +- src/ports/postgres/modules/lda/lda.sql_in | 44 ++-- .../postgres/modules/linalg/matrix_ops.sql_in | 201 +++ src/ports/postgres/modules/linalg/svd.sql_in| 47 +++-- src/ports/postgres/modules/pca/pca.py_in| 10 +- src/ports/postgres/modules/pca/pca.sql_in | 6 +- .../postgres/modules/pca/pca_project.py_in | 4 +- .../recursive_partitioning/decision_tree.sql_in | 50 ++--- .../recursive_partitioning/random_forest.sql_in | 41 ++-- .../postgres/modules/stats/correlation.sql_in | 27 ++- .../modules/stats/cox_prop_hazards.sql_in | 49 ++---
.../postgres/modules/stats/pred_metrics.sql_in | 82 +--- .../postgres/modules/summary/summary.sql_in | 15 +- src/ports/postgres/modules/tsa/arima.sql_in | 25 ++- .../postgres/modules/utilities/cols2vec.sql_in | 8 +- .../postgres/modules/utilities/control.py_in| 55 + .../utilities/minibatch_preprocessing.py_in | 20 +- .../utilities/minibatch_preprocessing.sql_in| 7 +- .../utilities/test/unit_tests/plpy_mock.py_in | 8 + .../test/unit_tests/test_control.py_in | 81 .../modules/utilities/test/utilities.sql_in | 5 +- .../modules/utilities/text_utilities.sql_in | 5 +- 29 files changed, 684 insertions(+), 463 deletions(-) -- http://git-wip-us.apache.org/repos/asf/madlib/blob/3db98bab/src/ports/postgres/madpack/SQLCommon.m4_in -- diff --git a/src/ports/postgres/madpack/SQLCommon.m4_in b/src/ports/postgres/madpack/SQLCommon.m4_in index afc82d2..ffc0c37 100644 --- a/src/ports/postgres/madpack/SQLCommon.m4_in +++ b/src/ports/postgres/madpack/SQLCommon.m4_in @@ -28,14 +28,14 @@ m4_changequote() * RETURNS DOUBLE PRECISION[] * AS $$PythonFunction(regress, logistic, compute_logregr_coef)$$ * LANGUAGE plpythonu VOLATILE; - */ + */ m4_define(, , ) /* @@ -59,14 +61,14 @@ m4_define(, , , ) /* http://git-wip-us.apache.org/repos/asf/madlib/blob/3db98bab/src/ports/postgres/modules/assoc_rules/assoc_rules.sql_in -- diff --git a/src/ports/postgres/modules/assoc_rules/assoc_rules.sql_in b/src/ports/postgres/modules/assoc_rules/assoc_rules.sql_in index 8ee9fcb..ec3c330 100644 --- a/src/ports/postgres/modules/assoc_rules/assoc_rules.sql_in +++ b/src/ports/postgres/modules/assoc_rules/assoc_rules.sql_in @@ -493,23 +493,19 @@ CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.assoc_rules ) RETURNS MADLIB_SCHEMA.assoc_rules_results AS $$ - PythonFunctionBodyOnly(`assoc_rules', `assoc_rules') - -plpy.execute("SET client_min_messages = error;") - -# schema_madlib comes from PythonFunctionBodyOnly -return assoc_rules.assoc_rules( -schema_madlib, -support, -confidence, -tid_col, -item_col, -input_table, 
-output_schema, -verbose, -max_itemset_size -); +with AOControl(False): +plpy.execute("SET client_min_messages = error;") +# schema_madlib comes from PythonFunctionBodyOnly +return assoc_rules.assoc_rules(schema_madlib, + support, + confidence, +
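Saving and restoring `gp_default_storage_options` requires splitting the GUC string into options and writing it back. The hypothetical sketch below assumes the value is a comma-separated list of `key=value` pairs such as `appendonly=true,orientation=row`. It records whether AO was on (compare `was_ao_enabled` in the Control diff) and builds the string to `SET` for the duration of the call. Python 3.7+ is assumed so that dicts keep insertion order.

```python
def parse_storage_options(options_str):
    """Parse 'key=value,key=value' into a dict with lower-case keys."""
    options = {}
    for item in options_str.split(","):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition("=")
        options[key.strip().lower()] = value.strip()
    return options


def serialize_storage_options(options):
    """Inverse of parse_storage_options (keeps the original key order)."""
    return ",".join("{0}={1}".format(k, v) for k, v in options.items())


def set_appendonly(options_str, enable):
    """Return (was_ao_enabled, new_options_str) with appendonly forced to `enable`."""
    options = parse_storage_options(options_str)
    was_enabled = options.get("appendonly", "false").lower() == "true"
    options["appendonly"] = "true" if enable else "false"
    return was_enabled, serialize_storage_options(options)


was_ao, during_call = set_appendonly("appendonly=true,orientation=row", False)
print(was_ao, during_call)  # True appendonly=false,orientation=row
```

On exit, calling `set_appendonly(during_call, was_ao)` yields the string that restores the original setting.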
[3/3] madlib git commit: Multiple: Remove trailing whitespace from all SQL
Multiple: Remove trailing whitespace from all SQL Markdown specifies that two trailing spaces should be interpreted as a line break, which Doxygen 1.8+ implements. This commit removes all such instances, since the trailing whitespace is inadvertent in most cases. If a line break is required, it should be added explicitly using the HTML line-break tag. Closes #317 Co-authored-by: Domino Valdano Project: http://git-wip-us.apache.org/repos/asf/madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/35818fa3 Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/35818fa3 Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/35818fa3 Branch: refs/heads/master Commit: 35818fa395f965191b59ddbfcd1469470f44271b Parents: 92bdf8c Author: Rahul Iyer Authored: Fri Sep 7 15:12:49 2018 -0700 Committer: Rahul Iyer Committed: Fri Sep 7 15:12:49 2018 -0700 -- methods/array_ops/src/pg_gp/array_ops.sql_in| 7 +- src/ports/postgres/modules/bayes/bayes.sql_in | 46 ++--- .../postgres/modules/bayes/test/bayes.sql_in| 114 ++-- .../conjugate_gradient/test/conj_grad.sql_in| 24 +-- src/ports/postgres/modules/convex/mlp.sql_in| 4 +- .../modules/crf/test/crf_test_large.sql_in | 6 +- .../modules/crf/test/crf_train_small.sql_in | 10 +- .../modules/elastic_net/elastic_net.sql_in | 12 +- src/ports/postgres/modules/glm/glm.sql_in | 4 +- src/ports/postgres/modules/glm/ordinal.sql_in | 38 ++-- .../postgres/modules/glm/test/ordinal.sql_in| 12 +- src/ports/postgres/modules/graph/bfs.sql_in | 88 +- src/ports/postgres/modules/graph/hits.sql_in| 36 ++-- .../postgres/modules/graph/pagerank.sql_in | 12 +- src/ports/postgres/modules/graph/wcc.sql_in | 10 +- src/ports/postgres/modules/knn/knn.sql_in | 14 +- src/ports/postgres/modules/lda/lda.sql_in | 172 +-- src/ports/postgres/modules/linalg/svd.sql_in| 62 +++ src/ports/postgres/modules/pca/pca.sql_in | 104 +-- .../postgres/modules/pca/pca_project.sql_in | 54 +++--- .../recursive_partitioning/decision_tree.sql_in | 4 +-
.../recursive_partitioning/random_forest.sql_in | 6 +- .../postgres/modules/regress/linear.sql_in | 24 +-- .../postgres/modules/regress/logistic.sql_in| 26 +-- .../modules/regress/test/clustered.sql_in | 8 +- .../postgres/modules/stats/correlation.sql_in | 54 +++--- .../modules/stats/hypothesis_tests.sql_in | 6 +- .../postgres/modules/stats/pred_metrics.sql_in | 8 +- .../postgres/modules/stats/test/f_test.sql_in | 2 +- .../postgres/modules/stats/test/ks_test.sql_in | 2 +- .../postgres/modules/stats/test/mw_test.sql_in | 2 +- .../postgres/modules/stats/test/t_test.sql_in | 4 +- .../postgres/modules/stats/test/wsr_test.sql_in | 2 +- .../postgres/modules/summary/summary.sql_in | 44 ++--- src/ports/postgres/modules/svm/svm.sql_in | 130 +++--- .../modules/tsa/test/arima_train.sql_in | 54 +++--- .../postgres/modules/utilities/cols2vec.sql_in | 12 +- .../postgres/modules/utilities/path.sql_in | 24 +-- .../postgres/modules/utilities/pivot.sql_in | 4 +- .../modules/utilities/sessionize.sql_in | 34 ++-- .../modules/utilities/text_utilities.sql_in | 48 +++--- .../postgres/modules/utilities/utilities.sql_in | 2 +- .../postgres/modules/utilities/vec2cols.sql_in | 42 ++--- 43 files changed, 685 insertions(+), 686 deletions(-) -- http://git-wip-us.apache.org/repos/asf/madlib/blob/35818fa3/methods/array_ops/src/pg_gp/array_ops.sql_in -- diff --git a/methods/array_ops/src/pg_gp/array_ops.sql_in b/methods/array_ops/src/pg_gp/array_ops.sql_in index 3c905ce..e1aa368 100644 --- a/methods/array_ops/src/pg_gp/array_ops.sql_in +++ b/methods/array_ops/src/pg_gp/array_ops.sql_in @@ -275,12 +275,11 @@ m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `NO SQL', `'); * @brief Aggregate, element-wise sum of arrays. It requires that all the values are NON-NULL. Return type is the same as the input type. * * @param x Array x - * @param y Array y - * @returns Sum of x and y. 
+ * @returns Sum of x * */ -DROP AGGREGATE IF EXISTS MADLIB_SCHEMA.sum(anyarray) CASCADE; -CREATE AGGREGATE MADLIB_SCHEMA.sum(anyarray) ( +DROP AGGREGATE IF EXISTS MADLIB_SCHEMA.sum(/* x */ anyarray) CASCADE; +CREATE AGGREGATE MADLIB_SCHEMA.sum(/* x */ anyarray) ( SFUNC = MADLIB_SCHEMA.array_add, STYPE = anyarray m4_ifdef( `__POSTGRESQL__', `', `, PREFUNC = MADLIB_SCHEMA.array_add') http://git-wip-us.apache.org/repos/asf/madlib/blob/35818fa3/src/ports/postgres/modules/bayes/bayes.sql_in -- diff --git
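The `sum(anyarray)` aggregate folds each row into its state with `array_add`. The fold can be sketched in plain Python under one assumption: the state starts as NULL and the first input array becomes the initial state. That is PostgreSQL's behavior for a strict transition function with no `INITCOND`; whether `array_add` is declared strict is not shown here. On Greenplum the same function is used as `PREFUNC` to merge partial states from segments, which the last line mimics.

```python
from functools import reduce


def array_add(state, x):
    """Element-wise addition; a None state means 'no rows seen yet'."""
    if state is None:
        return list(x)  # first row becomes the initial state
    if len(state) != len(x):
        raise ValueError("arrays must have the same length")
    return [a + b for a, b in zip(state, x)]


def array_sum(rows):
    """Fold array_add over the rows, like the SQL aggregate's SFUNC."""
    return reduce(array_add, rows, None)


rows = [[1, 2], [3, 4], [5, 6]]
print(array_sum(rows))  # [9, 12]
# Merging two partial sums (PREFUNC-style) gives the same result.
print(array_add(array_sum(rows[:1]), array_sum(rows[1:])))  # [9, 12]
```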
[1/3] madlib git commit: Multiple: Remove trailing whitespace from all SQL
Repository: madlib Updated Branches: refs/heads/master 92bdf8cab -> 35818fa39 http://git-wip-us.apache.org/repos/asf/madlib/blob/35818fa3/src/ports/postgres/modules/tsa/test/arima_train.sql_in -- diff --git a/src/ports/postgres/modules/tsa/test/arima_train.sql_in b/src/ports/postgres/modules/tsa/test/arima_train.sql_in index e1b2919..e6f5bd2 100644 --- a/src/ports/postgres/modules/tsa/test/arima_train.sql_in +++ b/src/ports/postgres/modules/tsa/test/arima_train.sql_in @@ -66,55 +66,55 @@ drop table if exists tsa_out; drop table if exists tsa_out_summary; drop table if exists tsa_out_residual; select arima_train('mini_ts', 'tsa_out', 'id', 'val', NULL, TRUE, ARRAY[1,0,1]); -select assert(relative_error(ar_params, ARRAY[0.685268276058]) < 1e-2, 'ARIMA: wrong ar_params') from tsa_out; -select assert(relative_error(ar_std_errors, ARRAY[0.103996616127]) < 1e-2, 'ARIMA: wrong ar_std_errors') from tsa_out; -select assert(relative_error(ma_params, ARRAY[0.730629026211]) < 1e-2, 'ARIMA: wrong ma_params') from tsa_out; -select assert(relative_error(ma_std_errors, ARRAY[0.0979481470864]) < 1e-2, 'ARIMA: wrong ma_std_errors') from tsa_out; -select assert(relative_error(mean, 38.6009250545) < 1e-2, 'ARIMA: wrong mean') from tsa_out; -select assert(relative_error(mean_std_error, 13.2499230619) < 1e-2, 'ARIMA: wrong mean_std_errors') from tsa_out; -select assert(relative_error(residual_variance, 281.669418496) < 1e-2, 'ARIMA: wrong residual_variance') from tsa_out_summary; -select assert(relative_error(log_likelihood, -207.725973784) < 1e-2, 'ARIMA: wrong log_likelihood') from tsa_out_summary; +select assert(relative_error(ar_params, ARRAY[0.685268276058]) < 1e-2, 'ARIMA: wrong ar_params') from tsa_out; +select assert(relative_error(ar_std_errors, ARRAY[0.103996616127]) < 1e-2, 'ARIMA: wrong ar_std_errors') from tsa_out; +select assert(relative_error(ma_params, ARRAY[0.730629026211]) < 1e-2, 'ARIMA: wrong ma_params') from tsa_out; +select assert(relative_error(ma_std_errors, 
ARRAY[0.0979481470864]) < 1e-2, 'ARIMA: wrong ma_std_errors') from tsa_out; +select assert(relative_error(mean, 38.6009250545) < 1e-2, 'ARIMA: wrong mean') from tsa_out; +select assert(relative_error(mean_std_error, 13.2499230619) < 1e-2, 'ARIMA: wrong mean_std_errors') from tsa_out; +select assert(relative_error(residual_variance, 281.669418496) < 1e-2, 'ARIMA: wrong residual_variance') from tsa_out_summary; +select assert(relative_error(log_likelihood, -207.725973784) < 1e-2, 'ARIMA: wrong log_likelihood') from tsa_out_summary; -- FALSE, ARRAY[1,0,1] drop table if exists tsa_out; drop table if exists tsa_out_summary; drop table if exists tsa_out_residual; select arima_train('mini_ts', 'tsa_out', 'id', 'val', NULL, FALSE, ARRAY[1,0,1]); -select assert(relative_error(ar_params, ARRAY[0.831752901064]) < 1e-2, 'ARIMA: wrong ar_params') from tsa_out; -select assert(relative_error(ar_std_errors, ARRAY[0.0695053543058]) < 1e-2, 'ARIMA: wrong ar_std_errors') from tsa_out; -select assert(relative_error(ma_params, ARRAY[0.701393608306]) < 1e-2, 'ARIMA: wrong ma_params') from tsa_out; -select assert(relative_error(ma_std_errors, ARRAY[0.0969171335486]) < 1e-2, 'ARIMA: wrong ma_std_errors') from tsa_out; -select assert(relative_error(residual_variance, 304.217719576) < 1e-2, 'ARIMA: wrong residual_variance') from tsa_out_summary; -select assert(relative_error(log_likelihood, -209.61270701) < 1e-2, 'ARIMA: wrong log_likelihood') from tsa_out_summary; +select assert(relative_error(ar_params, ARRAY[0.831752901064]) < 1e-2, 'ARIMA: wrong ar_params') from tsa_out; +select assert(relative_error(ar_std_errors, ARRAY[0.0695053543058]) < 1e-2, 'ARIMA: wrong ar_std_errors') from tsa_out; +select assert(relative_error(ma_params, ARRAY[0.701393608306]) < 1e-2, 'ARIMA: wrong ma_params') from tsa_out; +select assert(relative_error(ma_std_errors, ARRAY[0.0969171335486]) < 1e-2, 'ARIMA: wrong ma_std_errors') from tsa_out; +select assert(relative_error(residual_variance, 304.217719576) < 
1e-2, 'ARIMA: wrong residual_variance') from tsa_out_summary; +select assert(relative_error(log_likelihood, -209.61270701) < 1e-2, 'ARIMA: wrong log_likelihood') from tsa_out_summary; -- FALSE, ARRAY[1,1,1] drop table if exists tsa_out; drop table if exists tsa_out_summary; drop table if exists tsa_out_residual; select arima_train('mini_ts', 'tsa_out', 'id', 'val', NULL, FALSE, ARRAY[1,1,1]); -select assert(relative_error(ar_params, ARRAY[0.16327119476]) < 1e-2, 'ARIMA: wrong ar_params') from tsa_out; -select assert(relative_error(ar_std_errors, ARRAY[0.211608737666]) < 1e-2, 'ARIMA: wrong ar_std_errors') from tsa_out; -select assert(relative_error(ma_params, ARRAY[0.630297255402]) < 1e-2, 'ARIMA: wrong ma_params') from tsa_out; -select assert(relative_error(ma_std_errors, ARRAY[0.163395070851]) < 1e-2, 'ARIMA: wrong ma_std_errors') from tsa_out;
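Each ARIMA assertion above compares a fitted value with a reference value using `relative_error(...) < 1e-2`, so it tolerates a deviation of roughly 1%. Here is a minimal Python sketch of that check. It assumes that relative error means |computed - expected| / |expected| for scalars and a ratio of Euclidean norms for arrays. That definition is an assumption about the MADlib helper, not a copy of its source.

```python
import math


def relative_error(computed, expected):
    """Relative error of `computed` with respect to `expected`.

    Assumption: scalars use absolute values; sequences use the
    Euclidean (L2) norm of the difference over the norm of `expected`.
    """
    if isinstance(computed, (int, float)):
        return abs(computed - expected) / abs(expected)
    diff_norm = math.sqrt(sum((c - e) ** 2 for c, e in zip(computed, expected)))
    ref_norm = math.sqrt(sum(e ** 2 for e in expected))
    return diff_norm / ref_norm
```

For example, a fitted `ar_params` of [0.69] against the reference [0.685268276058] gives an error of about 0.0069 and passes. A value of [0.70] gives about 0.021 and would trip the assertion.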
[2/3] madlib git commit: Multiple: Remove trailing whitespace from all SQL
http://git-wip-us.apache.org/repos/asf/madlib/blob/35818fa3/src/ports/postgres/modules/linalg/svd.sql_in -- diff --git a/src/ports/postgres/modules/linalg/svd.sql_in b/src/ports/postgres/modules/linalg/svd.sql_in index b6d763b..070f5e4 100644 --- a/src/ports/postgres/modules/linalg/svd.sql_in +++ b/src/ports/postgres/modules/linalg/svd.sql_in @@ -79,17 +79,17 @@ row22 {0, 1} output_table_prefix -TEXT. Prefix for output tables. See -Output Tables below for a description +TEXT. Prefix for output tables. See +Output Tables below for a description of the convention used. row_id TEXT. ID for each row. k INTEGER. Number of singular values to compute. n_iterations (optional). -INTEGER. Number of iterations to run. -@note The number of iterations must be -in the range [k, column dimension], where +INTEGER. Number of iterations to run. +@note The number of iterations must be +in the range [k, column dimension], where k is number of singular values. result_summary_table (optional) TEXT. The name of the table to store the result summary. @@ -99,7 +99,7 @@ row22 {0, 1} SVD Function for Sparse Matrices Use this function for matrices that are represented in the sparse-matrix -format (example below). Note that the input matrix is converted to a +format (example below). Note that the input matrix is converted to a dense matrix before the SVD operation, for efficient computation reasons. @@ -142,8 +142,8 @@ matrix, indicating that the 4th row and 7th column contain all zeros. output_table_prefix -TEXT. Prefix for output tables. See -Output Tables below for a description +TEXT. Prefix for output tables. See +Output Tables below for a description of the convention used. row_id TEXT. Name of the column containing the row index for each entry in sparse matrix. @@ -158,9 +158,9 @@ matrix, indicating that the 4th row and 7th column contain all zeros. k INTEGER. Number of singular values to compute. n_iterations (optional) -INTEGER. Number of iterations to run. 
-@note The number of iterations must be -in the range [k, column dimension], where +INTEGER. Number of iterations to run. +@note The number of iterations must be +in the range [k, column dimension], where k is number of singular values. result_summary_table (optional) TEXT. The name of the table to store the result summary. @@ -171,10 +171,10 @@ matrix, indicating that the 4th row and 7th column contain all zeros. Native Implementation for Sparse Matrices Use this function for matrices that are represented in the sparse-matrix -format (see sparse matrix example above). This function uses the +format (see sparse matrix example above). This function uses the native sparse representation while computing the SVD. -@note Note that this function should be favored if the matrix is -highly sparse, since it computes very sparse matrices +@note Note that this function should be favored if the matrix is +highly sparse, since it computes very sparse matrices efficiently. @@ -195,8 +195,8 @@ svd_sparse_native( source_table, source_table TEXT. Source table name (sparse matrix - see example above). output_table_prefix -TEXT. Prefix for output tables. See -Output Tables below for a description +TEXT. Prefix for output tables. See +Output Tables below for a description of the convention used. row_id TEXT. ID for each row. @@ -211,9 +211,9 @@ svd_sparse_native( source_table, k INTEGER. Number of singular values to compute. n_iterations (optional) -INTEGER. Number of iterations to run. -@note The number of iterations must be -in the range [k, column dimension], where +INTEGER. Number of iterations to run. +@note The number of iterations must be +in the range [k, column dimension], where k is number of singular values. result_summary_table (optional) TEXT. Table name to store result summary. 
@@ -307,7 +307,7 @@ CREATE TABLE mat ( ); INSERT INTO mat VALUES (1,'{396,840,353,446,318,886,15,584,159,383}'), -(2,'{691,58,899,163,159,533,604,582,269,390}'), +(2,'{691,58,899,163,159,533,604,582,269,390}'), (3,'{293,742,298,75,404,857,941,662,846,2}'), (4,'{462,532,787,265,982,306,600,608,212,885}'), (5,'{304,151,337,387,643,753,603,531,459,652}'), @@ -328,7 +328,7 @@ INSERT INTO mat VALUES SELECT madlib.svd( 'mat', -- Input table 'svd', -- Output table prefix - 'row_id',-- Column name with row index + 'row_id',-- Column name with row index 10, -- Number of singular values to compute NULL,-- Use default number of iterations 'svd_summary_table'
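This commit only deletes trailing spaces and tabs. Because the archive loses that whitespace, every `-` line and its matching `+` line look identical here. The sketch below shows the per-line transformation, assuming that a plain `str.rstrip()` on each line is all such a cleanup amounts to. The commit does not say which tool was actually used, and both helper names are hypothetical.

```python
def strip_trailing_whitespace(text):
    """Remove trailing whitespace from every line, keeping the line breaks.

    str.rstrip() also removes a trailing carriage return, so CRLF line
    endings become LF as a side effect.
    """
    return "\n".join(line.rstrip() for line in text.split("\n"))


def changed_lines(text):
    """Return the 1-based numbers of lines the cleanup would modify."""
    return [i for i, line in enumerate(text.split("\n"), start=1)
            if line != line.rstrip()]
```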
madlib git commit: MLP: Simplify momentum and Nesterov updates
Repository: madlib Updated Branches: refs/heads/master 5ab573bec -> 92bdf8cab MLP: Simplify momentum and Nesterov updates JIRA: MADLIB-1272 Momentum updates are complicated due to Nesterov requiring an initial update before gradient calculations. There is, however, a different form of the Nesterov update that can be cleanly performed after the regular update, simplifying the code. This allows performing the gradient calculations before any update - with or without Nesterov. Closes #313 Project: http://git-wip-us.apache.org/repos/asf/madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/92bdf8ca Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/92bdf8ca Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/92bdf8ca Branch: refs/heads/master Commit: 92bdf8cab087472da1b2962f4ce51dc20255f6ba Parents: 5ab573b Author: Rahul Iyer Authored: Fri Aug 17 01:42:53 2018 -0700 Committer: Rahul Iyer Committed: Wed Aug 29 10:31:08 2018 -0700 -- src/modules/convex/task/mlp.hpp | 53 +- src/modules/convex/type/model.hpp | 44 ++-- 2 files changed, 42 insertions(+), 55 deletions(-) -- http://git-wip-us.apache.org/repos/asf/madlib/blob/92bdf8ca/src/modules/convex/task/mlp.hpp -- diff --git a/src/modules/convex/task/mlp.hpp b/src/modules/convex/task/mlp.hpp index 3915ab1..b772549 100644 --- a/src/modules/convex/task/mlp.hpp +++ b/src/modules/convex/task/mlp.hpp @@ -158,9 +158,6 @@ MLP::getLossAndUpdateModel( const double ) { double total_loss = 0.; -// model is updated with the momentum step (i.e. velocity vector) -// if Nesterov Accelerated Gradient is enabled -model.nesterovUpdatePosition(); // initialize gradient vector std::vector total_gradient_per_layer(model.num_layers); @@ -188,22 +185,37 @@ MLP::getLossAndUpdateModel( total_loss += getLoss(y_true, o.back(), model.is_classification); } -// convert gradient to a gradient update vector -// 1. normalize to per row update -// 2. discount by stepsize -// 3. add regularization -// 4. 
make negative for (Index k=0; k < model.num_layers; k++){ +// convert gradient to a gradient update vector +// 1. normalize to per row update +// 2. discount by stepsize +// 3. add regularization +// 4. make negative for descent Matrix regularization = MLP::lambda * model.u[k]; regularization.row(0).setZero(); // Do not update bias -total_gradient_per_layer[k] = -stepsize * (total_gradient_per_layer[k] / static_cast(num_rows_in_batch) + - regularization); -model.updateVelocity(total_gradient_per_layer[k], k); -model.updatePosition(total_gradient_per_layer[k], k); +total_gradient_per_layer[k] = -stepsize * +(total_gradient_per_layer[k] / static_cast(num_rows_in_batch) + + regularization); + +// total_gradient_per_layer is now the update vector +if (model.momentum > 0){ +model.velocity[k] = model.momentum * model.velocity[k] + total_gradient_per_layer[k]; +if (model.is_nesterov){ +// Below equation ensures that Nesterov updates are half step +// ahead of regular momentum updates i.e. next step's discounted +// velocity update is already added in the current step. 
+model.u[k] += model.momentum * model.velocity[k] + total_gradient_per_layer[k]; +} +else{ +model.u[k] += model.velocity[k]; +} +} else { +// no momentum +model.u[k] += total_gradient_per_layer[k]; +} } return total_loss; - } @@ -215,8 +227,6 @@ MLP::gradientInPlace( const dependent_variable_type _true, const double) { -model.nesterovUpdatePosition(); - std::vector net, o, delta; feedForward(model, x, net, o); @@ -225,15 +235,18 @@ MLP::gradientInPlace( for (Index k=0; k < model.num_layers; k++){ Matrix regularization = MLP::lambda*model.u[k]; regularization.row(0).setZero(); // Do not update bias + if (model.momentum > 0){ Matrix gradient = -stepsize * (o[k] * delta[k].transpose() + regularization); -model.updateVelocity(gradient, k); -model.updatePosition(gradient, k); +model.velocity[k] = model.momentum * model.velocity[k] + gradient; +if (model.is_nesterov) +model.u[k] += model.momentum * model.velocity[k] + gradient; +else +model.u[k] += model.velocity[k];
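After this change, `getLossAndUpdateModel` applies the momentum step once the gradient has been computed, so no Nesterov pre-update is needed. The Python sketch below mirrors that per-layer logic on flat lists of weights, whereas the C++ code works on one Eigen matrix per layer. Here `update` stands for the already scaled descent vector `-stepsize * (gradient / num_rows + regularization)`. The function name and the list representation are illustrative only.

```python
def apply_update(u, velocity, update, momentum=0.0, nesterov=False):
    """Return (new_weights, new_velocity) after one step.

    u, velocity and update are equal-length lists of floats; update is the
    negative, step-size-scaled gradient for the current batch.
    """
    if momentum > 0:
        # velocity = momentum * velocity + update
        velocity = [momentum * v + g for v, g in zip(velocity, update)]
        if nesterov:
            # Nesterov form: also add the next step's discounted velocity,
            # keeping the weights half a step ahead of classical momentum.
            step = [momentum * v + g for v, g in zip(velocity, update)]
        else:
            step = velocity
    else:
        # Plain gradient descent: the velocity is left untouched
        step = update
    return [w + s for w, s in zip(u, step)], velocity
```

Starting from zero velocity with `momentum=0.9` and an update of -0.1, classical momentum moves a weight of 1.0 to 0.9. The Nesterov form moves it to about 0.81, because it also adds 0.9 times the new velocity.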
[3/3] madlib git commit: Multiple: Re-enable tests in PCA, Pagerank
Multiple: Re-enable tests in PCA, Pagerank JIRA: MADLIB-1264 Some tests were commented out due to failures on GPDB 5.X. These tests are now working and have been enabled again. Closes #312 Co-authored-by: Arvind Sridhar Project: http://git-wip-us.apache.org/repos/asf/madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/a3b59356 Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/a3b59356 Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/a3b59356 Branch: refs/heads/master Commit: a3b59356f328fb949d63758a518aeca6d72220cf Parents: 5ccf12e Author: Jingyi Mei Authored: Thu Aug 16 20:12:25 2018 -0700 Committer: Rahul Iyer Committed: Thu Aug 16 20:18:21 2018 -0700 -- .../postgres/modules/graph/test/pagerank.sql_in | 23 +++- src/ports/postgres/modules/pca/test/pca.sql_in | 16 ++ 2 files changed, 20 insertions(+), 19 deletions(-) -- http://git-wip-us.apache.org/repos/asf/madlib/blob/a3b59356/src/ports/postgres/modules/graph/test/pagerank.sql_in -- diff --git a/src/ports/postgres/modules/graph/test/pagerank.sql_in b/src/ports/postgres/modules/graph/test/pagerank.sql_in index 14d3371..e797812 100644 --- a/src/ports/postgres/modules/graph/test/pagerank.sql_in +++ b/src/ports/postgres/modules/graph/test/pagerank.sql_in @@ -60,6 +60,7 @@ INSERT INTO "EDGE" VALUES (5, 6, 2), (6, 3, 2); +-- Test pagerank without group DROP TABLE IF EXISTS pagerank_out, pagerank_out_summary; SELECT pagerank( 'vertex',-- Vertex table @@ -73,6 +74,8 @@ SELECT assert(relative_error(SUM(pagerank), 1) < 0.1, 'PageRank: Scores do not sum up to 1.' ) FROM pagerank_out; + +-- Test pagerank with group DROP TABLE IF EXISTS pagerank_gr_out; DROP TABLE IF EXISTS pagerank_gr_out_summary; SELECT pagerank( @@ -84,7 +87,7 @@ SELECT pagerank( NULL, -- Default damping factor (0.85) NULL, -- Default max iters (100) NULL, -- Default Threshold - 'user_id'); -- Personlized Nodes + 'user_id');-- Grouping Column -- View the PageRank of all vertices, sorted by their scores. 
SELECT assert(relative_error(SUM(pagerank), 1) < 0.1, @@ -94,8 +97,16 @@ SELECT assert(relative_error(SUM(pagerank), 1) < 0.1, 'PageRank: Scores do not sum up to 1 for group 2.' ) FROM pagerank_gr_out WHERE user_id=2; --- Tests for Personalized Page Rank +-- Check the iteration numbers for convergency +SELECT assert(relative_error(__iterations__, 11) = 0, +'PageRank: Incorrect iterations for group 1.' +) FROM pagerank_gr_out_summary WHERE user_id=1; +SELECT assert(relative_error(__iterations__, 14) = 0, +'PageRank: Incorrect iterations for group 2.' +) FROM pagerank_gr_out_summary WHERE user_id=2; + +-- Tests for Personalized Page Rank -- Test without grouping DROP TABLE IF EXISTS pagerank_ppr_out; @@ -141,14 +152,6 @@ SELECT assert(relative_error(SUM(pagerank), 1) < 0.005, ) FROM pagerank_ppr_grp_out WHERE user_id=1; select assert(array_agg(user_id order by pagerank desc)= '{2, 2, 1, 1, 2, 2, 1, 1, 2, 1, 2, 1, 2, 1}','Unexpected Ranking with grouping ') from pagerank_ppr_grp_out ; --- These tests have been temporarily removed for GPDB5 alpha support - --- SELECT assert(relative_error(__iterations__, 27) = 0, --- 'PageRank: Incorrect iterations for group 1.' --- ) FROM pagerank_gr_out_summary WHERE user_id=1; --- SELECT assert(relative_error(__iterations__, 31) = 0, --- 'PageRank: Incorrect iterations for group 2.' --- ) FROM pagerank_gr_out_summary WHERE user_id=2; -- Test to capture corner case reported in https://issues.apache.org/jira/browse/MADLIB-1229 http://git-wip-us.apache.org/repos/asf/madlib/blob/a3b59356/src/ports/postgres/modules/pca/test/pca.sql_in -- diff --git a/src/ports/postgres/modules/pca/test/pca.sql_in b/src/ports/postgres/modules/pca/test/pca.sql_in index 5a97c94..1510254 100644 --- a/src/ports/postgres/modules/pca/test/pca.sql_in +++ b/src/ports/postgres/modules/pca/test/pca.sql_in @@ -145,16 +145,14 @@ COPY mat (id, row_vec, grp) FROM stdin delimiter '|'; 16|{739,651,678,577,273,935,661,47,373,618}|2 \. 
--- This test has been temporarily removed for GPDB5 alpha support - -- Learn individaul PCA models based on grouping column (grp) --- drop table if exists result_table_214712398172490837; --- drop table if exists result_table_214712398172490837_mean; --- drop table if exists result_table_214712398172490838; --- select pca_train('mat', 'result_table_214712398172490837', 'id', 0.8, --- 'grp', 5, FALSE, 'result_table_214712398172490838'); --- select *
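The re-enabled PageRank tests check two properties. First, the scores in each group sum to 1 within a tolerance. Second, each group converges in a fixed number of iterations. The Python sketch below is a textbook power-iteration PageRank using the damping factor 0.85 and the limit of 100 iterations named in the test. It shows why the first property holds. The convergence threshold, the treatment of vertices without out-edges and the toy graph used to exercise it are all assumptions, so the iteration counts will not match MADlib's.

```python
def pagerank(vertices, edges, damping=0.85, max_iter=100, threshold=1e-5):
    """Power-iteration PageRank; returns (scores, iterations_run)."""
    n = len(vertices)
    out_degree = {v: 0 for v in vertices}
    for src, _ in edges:
        out_degree[src] += 1
    rank = {v: 1.0 / n for v in vertices}
    iterations = 0
    for iterations in range(1, max_iter + 1):
        # Mass held by vertices with no out-edges is spread uniformly,
        # which keeps the total at 1 on every iteration.
        dangling = sum(rank[v] for v in vertices if out_degree[v] == 0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in vertices}
        for src, dst in edges:
            new_rank[dst] += damping * rank[src] / out_degree[src]
        delta = max(abs(new_rank[v] - rank[v]) for v in vertices)
        rank = new_rank
        if delta < threshold:
            break
    return rank, iterations
```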
[1/3] madlib git commit: Elastic Net: Allow grouping by non-numeric column
Repository: madlib Updated Branches: refs/heads/master 441f16bd5 -> a3b59356f Elastic Net: Allow grouping by non-numeric column JIRA: MADLIB-1262 - Grouping columns should be quoted if the type of the column is of type TEXT. - Grouping column names that require double quoting need special handling. Closes #309 Co-authored-by: Domino Valdano Co-authored-by: Rahul Iyer Project: http://git-wip-us.apache.org/repos/asf/madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/ec328dba Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/ec328dba Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/ec328dba Branch: refs/heads/master Commit: ec328dba6853d31df5b1bd6bbdcd35933596fe78 Parents: 441f16b Author: Arvind Sridhar Authored: Thu Aug 16 20:02:48 2018 -0700 Committer: Rahul Iyer Committed: Thu Aug 16 20:03:22 2018 -0700 -- .../elastic_net_generate_result.py_in | 63 ++ .../modules/elastic_net/test/elastic_net.sql_in | 122 +++ 2 files changed, 161 insertions(+), 24 deletions(-) -- http://git-wip-us.apache.org/repos/asf/madlib/blob/ec328dba/src/ports/postgres/modules/elastic_net/elastic_net_generate_result.py_in -- diff --git a/src/ports/postgres/modules/elastic_net/elastic_net_generate_result.py_in b/src/ports/postgres/modules/elastic_net/elastic_net_generate_result.py_in index 1dbd664..15881b4 100644 --- a/src/ports/postgres/modules/elastic_net/elastic_net_generate_result.py_in +++ b/src/ports/postgres/modules/elastic_net/elastic_net_generate_result.py_in @@ -2,7 +2,10 @@ import plpy from elastic_net_utils import _process_results from elastic_net_utils import _compute_log_likelihood from utilities.validate_args import get_cols_and_types +from utilities.validate_args import quote_ident from utilities.utilities import split_quoted_delimited_str +from internal.db_utils import quote_literal + def _elastic_net_generate_result(optimizer, iteration_run, **args): """ @@ -33,26 +36,30 @@ def _elastic_net_generate_result(optimizer, iteration_run, **args): 
col_grp_key = args['col_grp_key'] grouping_str = args['grouping_str'] cols_types = dict(get_cols_and_types(args["tbl_source"])) -grouping_str1 = grouping_column + "," +grouping_cols_list = split_quoted_delimited_str(grouping_column) +grouping_str1 = ','.join(['{0} AS {1}'.format(g, quote_ident(g)) + for g in grouping_cols_list]) select_mean_and_std = '' inner_join_x = '' inner_join_y = '' -grouping_cols_list = split_quoted_delimited_str(grouping_column) -select_grp = ','.join(['n_tuples_including_nulls_subq.'+str(grp) -for grp in grouping_cols_list]) + ',' -select_grouping_info = ','.join([grp_col+"\t"+cols_types[grp_col] +select_grp = ','.join(['n_tuples_including_nulls_subq.' + str(quote_ident(grp)) + for grp in grouping_cols_list]) + ',' +select_grouping_info = ','.join([grp_col + "\t" + cols_types[grp_col] for grp_col in grouping_cols_list]) + "," if data_scaled: x_grp_cols = ' AND '.join([ -'n_tuples_including_nulls_subq.{0}={1}.{2}'.format(grp, -args["x_mean_table"], grp) for grp in grouping_cols_list]) +'{0} = {1}.{2}'.format('n_tuples_including_nulls_subq.' + str(quote_ident(grp)), + args["x_mean_table"], grp) +for grp in grouping_cols_list]) y_grp_cols = ' AND '.join([ -'n_tuples_including_nulls_subq.{0}={1}.{2}'.format(grp, -args["y_mean_table"], grp) for grp in grouping_cols_list]) -select_mean_and_std = ' {0}.mean AS x_mean, '.format(args["x_mean_table"]) +\ -' {0}.mean AS y_mean, '.format(args["y_mean_table"]) +\ -' {0}.std AS x_std, '.format(args["x_mean_table"]) +'{0}={1}.{2}'.format('n_tuples_including_nulls_subq.' 
+ str(quote_ident(grp)), + args["y_mean_table"], grp) +for grp in grouping_cols_list]) +select_mean_and_std = ( +' {0}.mean AS x_mean, '.format(args["x_mean_table"]) + +' {0}.mean AS y_mean, '.format(args["y_mean_table"]) + +' {0}.std AS x_std, '.format(args["x_mean_table"])) inner_join_x = ' INNER JOIN {0} ON {1} '.format( args["x_mean_table"], x_grp_cols) inner_join_y = ' INNER JOIN {0} ON {1} '.format( @@ -66,7 +73,7 @@ def _elastic_net_generate_result(optimizer, iteration_run, **args): FROM
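The Elastic Net change aliases each grouping column through `quote_ident` and uses the quoted name when joining against the per-group mean tables. Below is a simplified Python sketch of how those SQL fragments are assembled. The helper names are hypothetical. The `quote_ident` here is a reduced stand-in: lowercase letters, digits and underscores stay bare, and anything else is double-quoted. Unlike the diff, the sketch quotes both sides of each join condition.

```python
import string

SAFE_CHARS = set(string.ascii_lowercase + string.digits + "_")


def quote_ident(name):
    """Reduced stand-in: quote unless the name uses only [a-z0-9_]."""
    name = name.strip()
    if all(ch in SAFE_CHARS for ch in name):
        return name
    return '"' + name.replace('"', '""') + '"'


def grouping_select_list(grouping_cols):
    """Build 'col AS quoted_col,...', like grouping_str1 in the diff."""
    return ",".join("{0} AS {1}".format(g, quote_ident(g)) for g in grouping_cols)


def grouping_join_condition(left_alias, right_table, grouping_cols):
    """Equality conditions matching every grouping column across two relations."""
    return " AND ".join(
        "{0}.{2} = {1}.{2}".format(left_alias, right_table, quote_ident(g))
        for g in grouping_cols)
```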
[2/3] madlib git commit: Vec2Cols: Allow arrays of different lengths
Vec2Cols: Allow arrays of different lengths JIRA: MADLIB-1270 Added support to split arrays of different lengths in the vector_col. If the user does not provide feature names, we pad each array to the maximum length and split across the maximum possible number of features. If the user does provide feature names, we truncate/pad the arrays according to the number of features the user desires. Closes #311 Co-authored-by: Arvind Sridhar Project: http://git-wip-us.apache.org/repos/asf/madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/5ccf12e1 Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/5ccf12e1 Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/5ccf12e1 Branch: refs/heads/master Commit: 5ccf12e113f04b02b9ccf8c9aee107a4feb4bd88 Parents: ec328db Author: Rahul Iyer Authored: Thu Aug 16 20:08:32 2018 -0700 Committer: Rahul Iyer Committed: Thu Aug 16 20:08:32 2018 -0700 -- .../utilities/test/transform_vec_cols.sql_in| 47 .../unit_tests/test_transform_vec_cols.py_in| 14 +- .../modules/utilities/transform_vec_cols.py_in | 34 +++--- 3 files changed, 64 insertions(+), 31 deletions(-) -- http://git-wip-us.apache.org/repos/asf/madlib/blob/5ccf12e1/src/ports/postgres/modules/utilities/test/transform_vec_cols.sql_in -- diff --git a/src/ports/postgres/modules/utilities/test/transform_vec_cols.sql_in b/src/ports/postgres/modules/utilities/test/transform_vec_cols.sql_in index 47ab299..b43b39f 100644 --- a/src/ports/postgres/modules/utilities/test/transform_vec_cols.sql_in +++ b/src/ports/postgres/modules/utilities/test/transform_vec_cols.sql_in @@ -104,6 +104,53 @@ SELECT assert ((SELECT count(*) FROM information_schema.columns WHERE table_name SELECT assert ((SELECT clouds_airquality[1] FROM dt_golf WHERE id = 1) = (SELECT clouds FROM out_table WHERE id = 1), 'Split values do not match up'); SELECT assert ((SELECT clouds_airquality[2] FROM dt_golf WHERE id = 1) = (SELECT air_quality FROM out_table WHERE id = 1), 'Split values do not match 
up'); +-- Testing splitting arrays of different lengths into features + +DROP TABLE IF EXISTS diff_lengths_test; +CREATE TABLE diff_lengths_test( +"id" INTEGER, +"arr" TEXT[]); +INSERT INTO diff_lengths_test VALUES (1, '{a, b}'), (2, '{c, d}'), (3, '{e, f, g, h}'), (4, '{i}'), (5, '{}'); + +DROP TABLE IF EXISTS out_table; +SELECT vec2cols( +'diff_lengths_test', +'out_table', +'arr' +); + +SELECT assert ((SELECT count(*) FROM information_schema.columns WHERE table_name='out_table') = (SELECT max(array_upper(arr, 1)) from diff_lengths_test), 'Number of split columns does not match'); + +DROP TABLE IF EXISTS out_table; +SELECT vec2cols( +'diff_lengths_test', +'out_table', +'arr', +ARRAY['a'] +); + +SELECT assert ((SELECT count(*) FROM information_schema.columns WHERE table_name='out_table') = 1, 'Number of split columns does not match'); + +DROP TABLE IF EXISTS out_table; +SELECT vec2cols( +'diff_lengths_test', +'out_table', +'arr', +ARRAY['a', 'b', 'c'] +); + +SELECT assert ((SELECT count(*) FROM information_schema.columns WHERE table_name='out_table') = 3, 'Number of split columns does not match'); + +DROP TABLE IF EXISTS out_table; +SELECT vec2cols( +'diff_lengths_test', +'out_table', +'arr', +ARRAY['a', 'b', 'c', 'd', 'e', 'f', 'g'] +); + +SELECT assert ((SELECT count(*) FROM information_schema.columns WHERE table_name='out_table') = 7, 'Number of split columns does not match'); + -- Special character tests DROP TABLE IF EXISTS special_char_check; http://git-wip-us.apache.org/repos/asf/madlib/blob/5ccf12e1/src/ports/postgres/modules/utilities/test/unit_tests/test_transform_vec_cols.py_in -- diff --git a/src/ports/postgres/modules/utilities/test/unit_tests/test_transform_vec_cols.py_in b/src/ports/postgres/modules/utilities/test/unit_tests/test_transform_vec_cols.py_in index 6475f9b..3020309 100644 --- a/src/ports/postgres/modules/utilities/test/unit_tests/test_transform_vec_cols.py_in +++ 
b/src/ports/postgres/modules/utilities/test/unit_tests/test_transform_vec_cols.py_in @@ -125,23 +125,13 @@ class Vec2ColsTestSuite(unittest.TestCase): def test_get_names_for_split_output_cols_feature_names_none(self): self.plpy_mock_execute.return_value = [{"n_x": 3}] -new_cols = self.subject.get_names_for_split_output_cols(self.default_source_table, 'foobar', None) +new_cols = self.subject.get_names_for_split_output_cols(self.default_source_table, 'foobar') self.assertEqual(['f1', 'f2', 'f3'], new_cols) -def test_get_names_for_split_output_cols_feature_names_not_none(self): -
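The Vec2Cols rule in this commit has two cases. Without feature names, every array is padded to the longest length and split into `f1`, `f2`, and so on. With feature names, arrays are truncated or padded to the number of names given. Here is a minimal Python sketch of that rule, assuming `None` stands in for the SQL NULL used as padding. The function name is hypothetical.

```python
def split_vectors(rows, feature_names=None):
    """Return (column_names, split_rows) for a list of variable-length arrays."""
    if feature_names is None:
        # No names given: use the longest array and default names f1, f2, ...
        width = max((len(r) for r in rows), default=0)
        names = ["f{0}".format(i) for i in range(1, width + 1)]
    else:
        width = len(feature_names)
        names = list(feature_names)
    split_rows = []
    for r in rows:
        kept = list(r[:width])                      # truncate long arrays
        kept.extend([None] * (width - len(kept)))   # pad short arrays
        split_rows.append(kept)
    return names, split_rows
```

With the rows from `diff_lengths_test`, passing no names gives four columns, `f1` to `f4`, and `{i}` becomes `['i', None, None, None]`. Passing `['a', 'b', 'c']` gives three columns and truncates `{e, f, g, h}` to `['e', 'f', 'g']`.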
madlib git commit: Build: Download compatible Boost if version >= 1.65
Repository: madlib Updated Branches: refs/heads/master 0490ea779 -> cf5ace944 Build: Download compatible Boost if version >= 1.65 JIRA: MADLIB-1235 BOOST 1.65.0 removed the TR1 library which is required by MADlib till C++11 is completely supported. Hence, we force download of a compatible version if existing Boost is 1.65 or greater. This should be removed when TR1 dependency is removed. Closes #310 Project: http://git-wip-us.apache.org/repos/asf/madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/cf5ace94 Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/cf5ace94 Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/cf5ace94 Branch: refs/heads/master Commit: cf5ace944bef74648fd456c1f00356df78e90f4f Parents: 0490ea7 Author: Rahul Iyer Authored: Sat Aug 11 12:28:29 2018 -0700 Committer: Rahul Iyer Committed: Wed Aug 15 10:14:18 2018 -0700 -- src/CMakeLists.txt | 23 --- 1 file changed, 16 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/madlib/blob/cf5ace94/src/CMakeLists.txt -- diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt index e2ce352..c9759ad 100644 --- a/src/CMakeLists.txt +++ b/src/CMakeLists.txt @@ -103,21 +103,30 @@ set(MAD_MODULE_DIR ${CMAKE_CURRENT_SOURCE_DIR}/modules) # -- Third-party dependencies: Find or download Boost -- find_package(Boost 1.47) - -# We use BOOST_ASSERT_MSG, which only exists in Boost 1.47 and later. -# Unfortunately, the FindBoost module seems to be broken with respect to version -# checking, so we will set Boost_FOUND to FALSE if the version is too old. if(Boost_FOUND) +# We use BOOST_ASSERT_MSG, which only exists in Boost 1.47 and later. +# Unfortunately, the FindBoost module seems to be broken with respect to +# version checking, so we will set Boost_FOUND to FALSE if the version is +# too old. if(Boost_VERSION LESS 104600) +message(STATUS "No sufficiently recent version (>= 1.47) of Boost was found. 
Will download.") +set(Boost_FOUND FALSE) +endif(Boost_VERSION LESS 104600) + +# BOOST 1.65.0 removed the TR1 library which is required by MADlib till +# C++11 is completely supported. Hence, we force download of a compatible +# version if existing Boost is 1.65 or greater. FIXME: This should be +# removed when TR1 dependency is removed. +if(NOT Boost_VERSION LESS 106500) +message(STATUS +"Incompatible Boost version (>= 1.65) found. Will download a compatible version.") set(Boost_FOUND FALSE) -endif(Boost_VERSION LESS 104600 ) +endif(NOT Boost_VERSION LESS 106500) endif(Boost_FOUND) if(Boost_FOUND) include_directories(${Boost_INCLUDE_DIRS}) else(Boost_FOUND) -message(STATUS "No sufficiently recent version (>= 1.47) of Boost was found. Will download.") - ExternalProject_Add(EP_boost PREFIX ${MAD_THIRD_PARTY} DOWNLOAD_DIR ${MAD_THIRD_PARTY}/downloads
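Both CMake checks compare `Boost_VERSION` as an integer. Boost encodes its version as major * 100000 + minor * 100 + patch (the `BOOST_VERSION` convention), so 104600 is 1.46.0 and 106500 is 1.65.0. The Python sketch below decodes that integer and reproduces the combined download decision from the diff. It assumes the integer form. Newer CMake releases can report `Boost_VERSION` as an x.y.z string instead (policy CMP0093).

```python
def decode_boost_version(version):
    """Split an integer BOOST_VERSION into (major, minor, patch)."""
    return version // 100000, version // 100 % 1000, version % 100


def needs_boost_download(version):
    """Mirror the two CMake checks that force a Boost download."""
    too_old = version < 104600    # if(Boost_VERSION LESS 104600)
    too_new = version >= 106500   # if(NOT Boost_VERSION LESS 106500): TR1 removed
    return too_old or too_new
```

For instance, `decode_boost_version(106500)` returns `(1, 65, 0)`. A system Boost 1.58 (105800) would be used as-is, while 1.66 (106600) triggers the download. Decoding the lower bound shows that it corresponds to 1.46.0, one minor release below the 1.47 named in the status message.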
madlib git commit: Utilities: Use plpy.quote_ident if available
Repository: madlib Updated Branches: refs/heads/master aa18c0a3b -> 0490ea779 Utilities: Use plpy.quote_ident if available Project: http://git-wip-us.apache.org/repos/asf/madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/0490ea77 Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/0490ea77 Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/0490ea77 Branch: refs/heads/master Commit: 0490ea77911c33c941a59352ac5a3568f968b186 Parents: aa18c0a Author: Rahul Iyer Authored: Mon Aug 13 15:45:31 2018 -0700 Committer: Rahul Iyer Committed: Mon Aug 13 15:45:31 2018 -0700 -- .../modules/utilities/validate_args.py_in | 28 +++- 1 file changed, 15 insertions(+), 13 deletions(-) -- http://git-wip-us.apache.org/repos/asf/madlib/blob/0490ea77/src/ports/postgres/modules/utilities/validate_args.py_in -- diff --git a/src/ports/postgres/modules/utilities/validate_args.py_in b/src/ports/postgres/modules/utilities/validate_args.py_in index f7f79e9..28e6aa4 100644 --- a/src/ports/postgres/modules/utilities/validate_args.py_in +++ b/src/ports/postgres/modules/utilities/validate_args.py_in @@ -72,19 +72,21 @@ def quote_ident(input_str): Returns: String """ - -def quote_not_needed(ch): -return (ch in string.ascii_lowercase or ch in string.digits or ch == '_') - -if input_str: -input_str = input_str.strip() -if all(quote_not_needed(c) for c in input_str): -return input_str -else: -# if input_str has double quotes then each double quote -# is prependend with a double quote -# (the 1st double quote is used to escape the 2nd double quote) -return '"' + re.sub(r'"', r'""', input_str) + '"' +try: +return plpy.quote_ident(input_str) +except AttributeError: +def quote_not_needed(ch): +return (ch in string.ascii_lowercase or ch in string.digits or ch == '_') + +if input_str: +input_str = input_str.strip() +if all(quote_not_needed(c) for c in input_str): +return input_str +else: +# if input_str has double quotes then each double quote +# is prependend with a double 
quote +# (the 1st double quote is used to escape the 2nd double quote) +return '"' + re.sub(r'"', r'""', input_str) + '"' # -
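After this change, `quote_ident` delegates to `plpy.quote_ident`, which PL/Python has provided since PostgreSQL 9.1. The old character-based logic survives only as a fallback. The sketch below reproduces that fallback so it can run outside the database. Detecting `plpy` through an import is an assumption made for standalone use; the diff instead catches `AttributeError` on the `plpy` module.

```python
import re
import string

try:
    import plpy  # only importable inside a PL/Python function
except ImportError:
    plpy = None


def quote_ident(input_str):
    """Quote an SQL identifier, preferring the server's own implementation."""
    if plpy is not None and hasattr(plpy, "quote_ident"):
        return plpy.quote_ident(input_str)

    def quote_not_needed(ch):
        return ch in string.ascii_lowercase or ch in string.digits or ch == '_'

    if input_str:
        input_str = input_str.strip()
        if all(quote_not_needed(c) for c in input_str):
            return input_str
        # Escape embedded double quotes by doubling them, then wrap in quotes
        return '"' + re.sub(r'"', r'""', input_str) + '"'
    # Empty or None input yields None, as in the original fallback
    return None
```

The character check ignores SQL reserved words. For a name such as `select`, the fallback returns it unquoted, whereas PostgreSQL's own `quote_ident` would quote it. The server-side path avoids that gap.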
madlib git commit: Build: Update versions after release
Repository: madlib Updated Branches: refs/heads/master fa02339dc -> aa18c0a3b Build: Update versions after release Project: http://git-wip-us.apache.org/repos/asf/madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/aa18c0a3 Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/aa18c0a3 Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/aa18c0a3 Branch: refs/heads/master Commit: aa18c0a3bffea472eab87f8163a5146f5effb671 Parents: fa02339 Author: Rahul Iyer Authored: Mon Aug 13 11:40:36 2018 -0700 Committer: Rahul Iyer Committed: Mon Aug 13 11:40:36 2018 -0700 -- deploy/postflight.sh | 2 +- pom.xml| 2 +- src/config/Version.yml | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/madlib/blob/aa18c0a3/deploy/postflight.sh -- diff --git a/deploy/postflight.sh b/deploy/postflight.sh index df430bd..2cb5460 100755 --- a/deploy/postflight.sh +++ b/deploy/postflight.sh @@ -2,7 +2,7 @@ # $0 - Script Path, $1 - Package Path, $2 - Target Location, and $3 - Target Volume -MADLIB_VERSION=1.15 +MADLIB_VERSION=1.15.1-dev find $2/usr/local/madlib/bin -type d -exec cp -RPf {} $2/usr/local/madlib/old_bin \; 2>/dev/null find $2/usr/local/madlib/bin -depth -type d -exec rm -r {} \; 2>/dev/null http://git-wip-us.apache.org/repos/asf/madlib/blob/aa18c0a3/pom.xml -- diff --git a/pom.xml b/pom.xml index 1417ff8..e441dbc 100644 --- a/pom.xml +++ b/pom.xml @@ -22,7 +22,7 @@ org.apache.madlib madlib - 1.15 + 1.15.1-dev pom http://git-wip-us.apache.org/repos/asf/madlib/blob/aa18c0a3/src/config/Version.yml -- diff --git a/src/config/Version.yml b/src/config/Version.yml index 8870dbc..6c9f460 100644 --- a/src/config/Version.yml +++ b/src/config/Version.yml @@ -1 +1 @@ -version: 1.15 +version: 1.15.1-dev
[madlib] Git Push Summary
Repository: madlib Updated Branches: refs/heads/latest_release [created] fa02339dc
[madlib] Git Push Summary
Repository: madlib Updated Branches: refs/heads/latest_release [deleted] d0ad93d26
[22/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__glm.html -- diff --git a/docs/rc/group__grp__glm.html b/docs/rc/group__grp__glm.html deleted file mode 100644 index 78d953c..000 --- a/docs/rc/group__grp__glm.html +++ /dev/null @@ -1,585 +0,0 @@ -MADlib: Generalized Linear Models -1.15 User Documentation for Apache MADlib -Generalized Linear Models -Supervised Learning
Regression Models - - -Contents - -Training Function - -Prediction Function - -Examples - -Related Topics - -Generalized linear models extends ordinary linear regression by allowing the response variable to follow a more general set of distributions (rather than simply Gaussian distributions), and for a general family of functions of the response variable (the link function) to vary linearly with the predicted values (rather than assuming that the response itself must vary linearly). -For example, data of counts would typically be modeled with a Poisson distribution and a log link, while binary outcomes would typically be modeled with a Bernoulli distribution (or binomial distribution, depending on exactly how the problem is phrased) and a log-odds (or logit) link function. -Currently, the implemented distribution families are - -Distribution Family Link Functions - -Binomial logit, probit - -Gamma inverse, identity, log - -Gaussian identity, inverse, log - -Inverse Gaussian inverse of square, inverse, identity, log - -Poisson log, identity, square-root - - -Training FunctionGLM training function has the following format: -glm(source_table, -model_table, -dependent_varname, -independent_varname, -family_params, -grouping_col, -optim_params, -verbose -) - Arguments -source_table -TEXT. The name of the table containing the training data. - - -model_table -TEXT. Name of the generated table containing the model. -The model table produced by glm contains the following columns: - - -... Text. Grouping columns, if provided in input. This could be multiple columns depending on the grouping_col input. - - - -coef FLOAT8. Vector of the coefficients in linear predictor. - - - -log_likelihood FLOAT8. The log-likelihood \( l(\boldsymbol \beta) \). We use the maximum likelihood estimate of dispersion parameter to calculate the log-likelihood while R and Python use deviance estimate and Pearson estimate respectively. - - - -std_err FLOAT8[]. 
Vector of the standard errors of the coefficients. - - - -z_stats or t_stats FLOAT8[]. Vector of the z-statistics (for the Poisson and binomial distributions) or the t-statistics (for all other distributions) of the coefficients. - - - -p_values FLOAT8[]. Vector of the p-values of the coefficients. - - - -dispersion FLOAT8. The dispersion value (Pearson estimate). When family=poisson or family=binomial, the dispersion is always 1. - - - -num_rows_processed BIGINT. Number of rows processed. - - - -num_rows_skipped BIGINT. Number of rows skipped due to missing values or failures. - - - -num_iterations INTEGER. The number of iterations actually completed. This would be different from
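To make the GLM description concrete, here is a minimal sketch of fitting a Poisson model with a log link, \( \log E[y] = b_0 + b_1 x \), by iteratively reweighted least squares (IRLS), a standard way to fit GLMs. It assumes a single predictor, no grouping, and no dispersion or standard-error output. It is an illustration only, not MADlib's glm() implementation, and the function name poisson_irls is hypothetical.

```python
import math

def poisson_irls(xs, ys, max_iter=50, tol=1e-10):
    """Fit log(E[y]) = b0 + b1*x for a Poisson response by IRLS (illustrative)."""
    # Start from an intercept-only fit: b0 = log(mean(y)), b1 = 0.
    b0, b1 = math.log(sum(ys) / len(ys)), 0.0
    for _ in range(max_iter):
        # Accumulate the weighted normal equations (X^T W X) b = X^T W z, where
        # eta = b0 + b1*x, mu = exp(eta), W = mu and z = eta + (y - mu) / mu.
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for x, y in zip(xs, ys):
            eta = b0 + b1 * x
            mu = math.exp(eta)
            z = eta + (y - mu) / mu
            s_w += mu
            s_wx += mu * x
            s_wxx += mu * x * x
            s_wz += mu * z
            s_wxz += mu * x * z
        # Solve the 2x2 system explicitly.
        det = s_w * s_wxx - s_wx * s_wx
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1
```

For a Poisson family with a log link the IRLS weights equal the fitted means, which is why mu appears both as the weight and in the working response z; other family/link pairs from the table would change only those two expressions.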
[21/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__graph__measures.html -- diff --git a/docs/rc/group__grp__graph__measures.html b/docs/rc/group__grp__graph__measures.html deleted file mode 100644 index 9339d92..000 --- a/docs/rc/group__grp__graph__measures.html +++ /dev/null @@ -1,155 +0,0 @@
-Modules - -MeasuresGraph - - -Detailed Description -A collection of metrics computed on a graph. - - -Modules -Average Path Length -Computes the average shortest-path length of a graph. - -Closeness -Computes the closeness centrality value of each node in the graph. - -Graph Diameter -Computes the diameter of a graph. - -In-Out Degree -Computes the degrees for each vertex. - - - - - - - -Generated on Mon Aug 6 2018 21:55:39 for MADlib by -http://www.doxygen.org/index.html;> - 1.8.14 - - - - http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__graph__measures.js -- diff --git a/docs/rc/group__grp__graph__measures.js b/docs/rc/group__grp__graph__measures.js deleted file mode 100644 index 6272fba..000 --- a/docs/rc/group__grp__graph__measures.js +++ /dev/null @@ -1,7 +0,0 @@ -var group__grp__graph__measures = -[ -[ "Average Path Length", "group__grp__graph__avg__path__length.html", null ], -[ "Closeness", "group__grp__graph__closeness.html", null ], -[ "Graph Diameter", "group__grp__graph__diameter.html", null ], -[ "In-Out Degree", "group__grp__graph__vertex__degrees.html", null ] -]; \ No newline at end of file http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__graph__vertex__degrees.html -- diff --git a/docs/rc/group__grp__graph__vertex__degrees.html b/docs/rc/group__grp__graph__vertex__degrees.html deleted file mode 100644 index 9d8a2f5..000 --- a/docs/rc/group__grp__graph__vertex__degrees.html +++ /dev/null @@ -1,273 +0,0 @@ -MADlib: In-Out Degree
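The graph measures listed in the Measures group above can be illustrated on a small in-memory graph. The following sketch assumes an unweighted, directed graph (every edge has length 1). It computes in/out degrees, a closeness score, and the average shortest-path length and diameter by breadth-first search over reachable ordered pairs of distinct vertices, simply skipping unreachable pairs. MADlib's own table-based functions may handle edge weights and unreachable vertices differently, and the closeness formula here is only one common variant. All function names are hypothetical.

```python
from collections import deque

def build_adjacency(edges):
    """Map each source vertex to the list of its out-neighbours."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, []).append(dst)
    return adj

def degrees(vertices, edges):
    """In-degree and out-degree of every vertex."""
    indeg = {v: 0 for v in vertices}
    outdeg = {v: 0 for v in vertices}
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    return indeg, outdeg

def bfs_distances(adj, source):
    """Hop counts from source to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness(adj, v):
    """One common variant: number of reachable vertices / sum of distances to them."""
    others = [d for t, d in bfs_distances(adj, v).items() if t != v]
    return len(others) / sum(others) if others else 0.0

def path_length_stats(vertices, edges):
    """(average shortest-path length, diameter) over reachable ordered pairs."""
    adj = build_adjacency(edges)
    lengths = [d for s in vertices
               for t, d in bfs_distances(adj, s).items() if t != s]
    if not lengths:
        raise ValueError("no pair of distinct vertices is connected")
    return sum(lengths) / len(lengths), max(lengths)
```

On the path 0 → 1 → 2 → 3, the six reachable pairs have lengths summing to 10. That gives an average of 10/6 and a diameter of 3.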
[19/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__lda.html -- diff --git a/docs/rc/group__grp__lda.html b/docs/rc/group__grp__lda.html deleted file mode 100644 index 3a04a90..000 --- a/docs/rc/group__grp__lda.html +++ /dev/null @@ -1,765 +0,0 @@ -Latent Dirichlet Allocation Unsupervised
Learning Topic Modelling - - -Contents - -Background - -Training Function - -Prediction Function - -Perplexity - -Helper Functions - -Examples - -Literature - -Related Topics - - - -Latent Dirichlet Allocation (LDA) is a generative probabilistic model for natural texts. It is used in problems such as automated topic discovery, collaborative filtering, and document classification. -In addition to an implementation of LDA, this MADlib module provides several helper functions for interpreting the LDA output. -Note: Topic modeling is often used as part of a larger text processing pipeline, which may include operations such as term frequency, stemming and stop word removal. You can use the function Term Frequency to generate the required vocabulary format from raw documents for the LDA training function. See the examples later on this page for more details. -Background -The LDA model posits that each document is associated with a mixture of various topics (e.g., a document is related to Topic 1 with probability 0.7, and Topic 2 with probability 0.3), and that each word in the document is attributable to one of the document's topics. There is a (symmetric) Dirichlet prior with parameter \( \alpha \) on each document's topic mixture. In addition, there is another (symmetric) Dirichlet prior with parameter \( \beta \) on the distribution of words for each topic. -The following generative process then defines a distribution over a corpus of documents: - -For each topic \( i \), sample a per-topic word distribution \( \phi_i \) from the Dirichlet( \(\beta\)) prior. -For each document: -Sample a document length N from a suitable distribution, say, Poisson. -Sample a topic mixture \( \theta \) for the document from the Dirichlet( \(\alpha\)) distribution. -For each of the N words: -Sample a topic \( z_n \) from the multinomial topic distribution \( \theta \).
-Sample a word \( w_n \) from the multinomial word distribution \( \phi_{z_n} \) associated with topic \( z_n \). - - - - - -In practice, only the words in each document are observable. The topic mixture of each document and the topic for each word in each document are latent unobservable variables that need to be inferred from the observables, and this is referred to as the inference problem for LDA. Exact inference is intractable, but several approximate inference algorithms for LDA have been developed. The simple and effective Gibbs sampling algorithm described in Griffiths and Steyvers [2] appears to be the current algorithm of choice. -This implementation provides a parallel and scalable in-database solution for LDA based on Gibbs sampling. It takes advantage of the shared-nothing MPP
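The LDA generative process described above can be simulated directly. The following sketch draws a small synthetic corpus. It first samples one word distribution \( \phi_i \) from Dirichlet( \(\beta\)) per topic. Then, for each document, it samples a Poisson length N and a topic mixture \( \theta \) from Dirichlet( \(\alpha\)), and for each word a topic \( z_n \) and a word \( w_n \). Only the standard library is used: Dirichlet draws come from normalized Gamma variates, and Poisson draws from Knuth's multiplication method. This illustrates the model only. It is not MADlib's Gibbs-sampling trainer, and the function names and parameters are hypothetical.

```python
import math
import random

def sample_dirichlet(rng, concentration, k):
    """Symmetric Dirichlet(concentration) draw of length k via normalized Gamma variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for small means."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def generate_corpus(num_docs, num_topics, vocab, alpha, beta, mean_length, seed=0):
    rng = random.Random(seed)
    # One word distribution phi_i ~ Dirichlet(beta) for each topic i.
    phi = [sample_dirichlet(rng, beta, len(vocab)) for _ in range(num_topics)]
    topics = list(range(num_topics))
    corpus = []
    for _ in range(num_docs):
        n = sample_poisson(rng, mean_length)               # document length N
        theta = sample_dirichlet(rng, alpha, num_topics)   # topic mixture theta
        doc = []
        for _ in range(n):
            z = rng.choices(topics, weights=theta)[0]      # topic z_n ~ Mult(theta)
            doc.append(rng.choices(vocab, weights=phi[z])[0])  # word w_n ~ Mult(phi_z)
        corpus.append(doc)
    return corpus
```

Inference runs this process in reverse. Given only the words, it recovers plausible values of \( \theta \), \( \phi \), and the per-word topics.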
[24/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__deprecated.html -- diff --git a/docs/rc/group__grp__deprecated.html b/docs/rc/group__grp__deprecated.html deleted file mode 100644 index aaa9813..000 --- a/docs/rc/group__grp__deprecated.html +++ /dev/null @@ -1,149 +0,0 @@ -Modules
-Deprecated Modules - - -Detailed Description -Deprecated modules that will be removed in the next major version (2.0). There are newer MADlib modules that have replaced these functions. - - -Modules -Create Indicator Variables -Provides utility functions helpful for data preparation before modeling. - -Multinomial Logistic Regression -Also called softmax regression; models the relationship between one or more independent variables and a categorical dependent variable. - http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__deprecated.js -- diff --git a/docs/rc/group__grp__deprecated.js b/docs/rc/group__grp__deprecated.js deleted file mode 100644 index 05ef03b..000 --- a/docs/rc/group__grp__deprecated.js +++ /dev/null @@ -1,5 +0,0 @@ -var group__grp__deprecated = -[ -[ "Create Indicator Variables", "group__grp__indicator.html", null ], -[ "Multinomial Logistic Regression", "group__grp__mlogreg.html", null ] -]; \ No newline at end of file http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__desc__stats.html -- diff --git a/docs/rc/group__grp__desc__stats.html b/docs/rc/group__grp__desc__stats.html deleted file mode 100644 index 21c7333..000 --- a/docs/rc/group__grp__desc__stats.html +++ /dev/null @@ -1,152 +0,0 @@ -MADlib: Descriptive Statistics
svn commit: r28667 - in /release/madlib: 1.14/ 1.15/
Author: riyer Date: Fri Aug 10 23:23:27 2018 New Revision: 28667 Log: Add 1.15 binaries and remove 1.14 Added: release/madlib/1.15/ release/madlib/1.15/apache-madlib-1.15-bin-Darwin.dmg (with props) release/madlib/1.15/apache-madlib-1.15-bin-Darwin.dmg.asc release/madlib/1.15/apache-madlib-1.15-bin-Darwin.dmg.sha512 release/madlib/1.15/apache-madlib-1.15-bin-Linux-GPDB43.rpm (with props) release/madlib/1.15/apache-madlib-1.15-bin-Linux-GPDB43.rpm.asc release/madlib/1.15/apache-madlib-1.15-bin-Linux-GPDB43.rpm.sha512 release/madlib/1.15/apache-madlib-1.15-bin-Linux.rpm (with props) release/madlib/1.15/apache-madlib-1.15-bin-Linux.rpm.asc release/madlib/1.15/apache-madlib-1.15-bin-Linux.rpm.sha512 release/madlib/1.15/apache-madlib-1.15-src.tar.gz (with props) release/madlib/1.15/apache-madlib-1.15-src.tar.gz.asc release/madlib/1.15/apache-madlib-1.15-src.tar.gz.sha512 Removed: release/madlib/1.14/ Added: release/madlib/1.15/apache-madlib-1.15-bin-Darwin.dmg == Binary file - no diff available. 
Propchange: release/madlib/1.15/apache-madlib-1.15-bin-Darwin.dmg -- svn:mime-type = application/octet-stream Added: release/madlib/1.15/apache-madlib-1.15-bin-Darwin.dmg.asc == --- release/madlib/1.15/apache-madlib-1.15-bin-Darwin.dmg.asc (added) +++ release/madlib/1.15/apache-madlib-1.15-bin-Darwin.dmg.asc Fri Aug 10 23:23:27 2018 @@ -0,0 +1,16 @@ +-BEGIN PGP SIGNATURE- + +iQIzBAABCAAdFiEE+pCcIAZjKnTgT/pQYwVq5BLE49cFAltpL1QACgkQYwVq5BLE +49cDJA/+IhMzkgxv6zX1Omuo8ofNMCetHJC4RmB8rwxem7DnLVUgwYNn+xK7lpAU +Yn9nm/XtFGXVqJ4CWGzaDL/iW2fsUqI5LX22CgeRaRD/iXasYB5TWMKvspaYY5RW +23Y7lYv3ea/+Gxnjj3uG7BwqxJ5YvtNiWoKWpq8PhSgo1souBivMGLGVS1DK55Wy +gnZuGULY9qq3cr0n5N7HDRS0e3bzKWqpm5xcGAtz2O5hW7tVDqT2FBrJmOG8mkPQ +GZ7cRPbeIeAi+CQzuvm522DtqPepJJW99UAl+0oksHgB6ag+iS80bufF27Fr9P0n +18Lq59/mJwdeUIxK95ak2AWjjmuuFzLY5QB06kJ5Mze96m4SA/VFJ9qdGljcDesX +BkwKNboi/zQSrUY5xVWNPWn3Qe5v0FUH8H0K1laqkczkeN+TGh8BlmOUF9DGbZ3l +L8spewzlbjuUAVUX9Q5Sren4qiliTj7UR4+hhggDvHIAAQQjCsOj78dOzce3Px8c +BrYRHCHbzBS6vg75DRj3P2KItpeRvwdZfNBaG/F0cPpBP/Yuwma62SdGATLdg6Fj ++mMcYysmJLTrPsN0fu+Q7YasWgkPJthnaIkdxpbpEFkh74ZZaYcpDZZw7HW3FBB7 +qm8DQiMrL5wED9khZtvWNuqrMjlCuIN+/j8d8N7508DMtPtSkVE= +=wEHu +-END PGP SIGNATURE- Added: release/madlib/1.15/apache-madlib-1.15-bin-Darwin.dmg.sha512 == --- release/madlib/1.15/apache-madlib-1.15-bin-Darwin.dmg.sha512 (added) +++ release/madlib/1.15/apache-madlib-1.15-bin-Darwin.dmg.sha512 Fri Aug 10 23:23:27 2018 @@ -0,0 +1 @@ +494c374d272ac707dd503b1c1e33900ca0cca56f48e7ad84a7bed4f01090dbc09155fb09998bfb8db2b448ab84b527e619fbfafc90e3369b4b49cc5a27d4d5aa apache-madlib-1.15-bin-Darwin.dmg Added: release/madlib/1.15/apache-madlib-1.15-bin-Linux-GPDB43.rpm == Binary file - no diff available. 
Propchange: release/madlib/1.15/apache-madlib-1.15-bin-Linux-GPDB43.rpm -- svn:mime-type = application/octet-stream Added: release/madlib/1.15/apache-madlib-1.15-bin-Linux-GPDB43.rpm.asc == --- release/madlib/1.15/apache-madlib-1.15-bin-Linux-GPDB43.rpm.asc (added) +++ release/madlib/1.15/apache-madlib-1.15-bin-Linux-GPDB43.rpm.asc Fri Aug 10 23:23:27 2018 @@ -0,0 +1,16 @@ +-BEGIN PGP SIGNATURE- + +iQIzBAABCAAdFiEE+pCcIAZjKnTgT/pQYwVq5BLE49cFAltqAGsACgkQYwVq5BLE +49eFQg//T9SFevZM1eqUwWoM8DhuRdHDNJRaLRuCqTn4RY6cwd4quMLbJ6tMrMcY +Da0JFfKHUk916eDvgDAyVbbYNfLI6+Td2xdXRZKJdkf8ju1XeLK3hx196C/g+DF+ +ldHILlIoizcLFsypSqOxSwqqIzZ4V+ZdHLsoGILsTQKdok5AuLRYmcJFu7bxbLWI +gx4tKTFhTJzzDC00Sq9eBIabsWUQhiR7WpmwswRtuOAcvJQH4rwjPjozeBqLGLt5 +/+554enRlTbQw+2URj5DybIYjEVba58sMN8cj83FPu0745e+2kTDW6oZ5TXXGc15 +Rh4PDkSd0+AoUWX64ccT1n/AINMwm1f3g7CWU1lrzXnwY9H9+eABFwtYNBsoPJQU +bp8QhvjrJMupRKaD89l3JpaRgwb1dxl57V0wKAqpfPBcXS2iElfpq2IZ9DyWOskz +/pIpgXNFt/JNkww6wxFVyPxZJMBpjDzKMY9UBBqtXcrwx7C6J6OlYWeZLFNpSS/+ +4oVoRJEncN25p9pR4mXlzLKnGQW0pjVrKZocAy55g0WXIilwGiauCO6cQO9cufnF +6698eIdj5K0ytmdxSsOiLv75j3tynne55aDF8xQTPsa4IDycpc8t/WlQnBmxT2Cs +y85kVrNoY05+57hxSE1entDMigjbqN0nSrUk2Cp3Mjd47rnrRPA= +=pmUc +-END PGP SIGNATURE- Added: release/madlib/1.15/apache-madlib-1.15-bin-Linux-GPDB43.rpm.sha512 == --- release/madlib/1.15/apache-madlib-1.15-bin-Linux-GPDB43.rpm.sha512 (added) +++ release/madlib/1.15/apache-madlib-1.15-bin-Linux-GPDB43.rpm.sha512 Fri Aug 10 23:23:27 2018 @@ -0,0 +1
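Each release artifact added in this commit comes with a .sha512 file that holds the hex digest followed by the file name. Each also has a detached PGP signature (.asc), which is checked separately, for example with gpg --verify. The following minimal sketch assumes that one-line "digest file-name" layout. It recomputes an artifact's SHA-512 with hashlib and compares the result against the published value. The helper names are hypothetical.

```python
import hashlib

def sha512_of_file(path, chunk_size=1 << 20):
    """Hex SHA-512 of a file, streamed in chunks so large .rpm/.dmg files need not fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_sha512(artifact_path, checksum_line):
    """checksum_line is '<128 hex chars> <file name>'; compare its first field."""
    expected = checksum_line.split()[0].lower()
    return sha512_of_file(artifact_path) == expected
```

A matching checksum only shows that the download is intact. Authenticity comes from the PGP signature.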
[14/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__ordinal.html -- diff --git a/docs/rc/group__grp__ordinal.html b/docs/rc/group__grp__ordinal.html deleted file mode 100644 index 97590d8..000 --- a/docs/rc/group__grp__ordinal.html +++ /dev/null @@ -1,477 +0,0 @@ -Ordinal Regression Supervised
Learning Regression Models - - -Contents - -Training Function - -Prediction Function - -Examples - -Model Details - -Literature - -Related Topics - -In statistics, ordinal regression is a type of regression analysis used for predicting an ordinal variable, i.e., a variable whose value exists on an arbitrary scale where only the relative ordering between different values is significant. The two most common types of ordinal regression models are ordered logit, which applies to data that meet the proportional odds assumption, and ordered probit. -Training Function The ordinal regression training function has the following syntax: -ordinal(source_table, - model_table, - dependent_varname, - independent_varname, - cat_order, - link_func, - grouping_col, - optim_params, - verbose -) - -Arguments -source_table -VARCHAR. Name of the table containing the training data. - - -model_table -VARCHAR. Name of the generated table containing the model. -The model table produced by ordinal() contains the following columns: - - -... Grouping columns, if provided in the input. This could be multiple columns depending on the grouping_col input. - - - -coef_threshold FLOAT8[]. Vector of the threshold coefficients in the linear predictor. The threshold coefficients are the intercepts specific to each categorical level. - - - -std_err_threshold FLOAT8[]. Vector of the standard errors of the threshold coefficients. - - - -z_stats_threshold FLOAT8[]. Vector of the z-statistics of the threshold coefficients. - - - -p_values_threshold FLOAT8[]. Vector of the p-values of the threshold coefficients. - - - -log_likelihood FLOAT8. The log-likelihood \( l(\boldsymbol \beta) \). The value will be the same across categories within the same group. - - - -coef_feature FLOAT8[]. Vector of the feature coefficients in the linear predictor. The feature coefficients are the coefficients for the independent variables. They are the same across categories. - - - -std_err_feature FLOAT8[].
Vector of the standard errors of the feature coefficients. - - - -z_stats_feature FLOAT8[]. Vector of the z-statistics of the feature coefficients. - - - -p_values_feature FLOAT8[]. Vector of the p-values of the feature coefficients. - - - -num_rows_processed BIGINT. Number of rows processed. - - - -num_rows_skipped BIGINT. Number of rows skipped due to missing values or failures. - - - -num_iterations INTEGER. Number of iterations actually completed. This would be different from the nIterations argument if a tolerance parameter is provided and the algorithm converges before all iterations are completed.
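To show how the threshold and feature coefficients combine, the following sketch assumes a common ordered-logit parameterization, \( P(Y \le j \mid x) = \sigma(\theta_j - x^T \beta) \). Here \( \theta_j \) are increasing threshold coefficients, one per category boundary, and \( \beta \) is the vector of feature coefficients shared across categories. The sketch turns the cumulative probabilities into per-category probabilities. This sign convention is an assumption and may differ from the one ordinal() uses. The function name is hypothetical.

```python
import math

def ordered_logit_probs(thresholds, coef, x):
    """Per-category probabilities for an ordered logit model.

    Assumes P(Y <= j | x) = sigmoid(thresholds[j] - x . coef), with the
    thresholds sorted ascending; the last category takes the remaining mass.
    """
    eta = sum(b * xi for b, xi in zip(coef, x))
    cumulative = [1.0 / (1.0 + math.exp(-(t - eta))) for t in thresholds] + [1.0]
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = c
    return probs
```

An ordered probit model would replace the logistic CDF with the standard normal CDF, for example \( \Phi(t) = \tfrac{1}{2}(1 + \mathrm{erf}(t/\sqrt{2})) \) via math.erf.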
[13/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__pca.html -- diff --git a/docs/rc/group__grp__pca.html b/docs/rc/group__grp__pca.html deleted file mode 100644 index 681c6d9..000 --- a/docs/rc/group__grp__pca.html +++ /dev/null @@ -1,149 +0,0 @@ -Modules - -Dimensionality Reduction Unsupervised
Learning - - -Detailed Description -Methods for reducing the number of variables in a dataset to obtain a set of principal variables. - - -Modules -Principal Component Analysis -Produces a model that transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called principal components. - -Principal Component Projection -Projects a higher-dimensional data point onto a lower-dimensional subspace spanned by principal components learned through the PCA training procedure. - http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__pca.js -- diff --git a/docs/rc/group__grp__pca.js b/docs/rc/group__grp__pca.js deleted file mode 100644 index 2863cf8..000 --- a/docs/rc/group__grp__pca.js +++ /dev/null @@ -1,5 +0,0 @@ -var group__grp__pca = -[ -[ "Principal Component Analysis", "group__grp__pca__train.html", null ], -[ "Principal Component Projection", "group__grp__pca__project.html", null ] -]; \ No newline at end of file http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__pca__project.html -- diff --git a/docs/rc/group__grp__pca__project.html b/docs/rc/group__grp__pca__project.html deleted file mode 100644 index d5eda16..000 --- a/docs/rc/group__grp__pca__project.html +++ /dev/null @@ -1,513 +0,0 @@ -MADlib: Principal Component Projection
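The two Dimensionality Reduction modules (training and projection) can be illustrated with a small dense sketch. The code estimates the first principal component by power iteration on the sample covariance matrix, then projects a data point onto it to get the point's score along that component. It assumes the data fit in memory and that the all-ones start vector is not orthogonal to the leading eigenvector. It is not MADlib's implementation, and the function names are hypothetical.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (column means, unit-length first principal component) of the rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # start vector, assumed not orthogonal to the top eigenvector
    for _ in range(iterations):
        # Power iteration: repeatedly apply the covariance matrix and renormalize.
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v

def project(row, means, component):
    """Score of one data point along a principal component (after centering)."""
    return sum((x - m) * c for x, m, c in zip(row, means, component))
```

For points lying on the line y = 2x, the component converges to \( (1, 2)/\sqrt{5} \). The projection then maps each 2-D point to a single coordinate along that line.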
[17/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__marginal.html -- diff --git a/docs/rc/group__grp__marginal.html b/docs/rc/group__grp__marginal.html deleted file mode 100644 index d88997f..000 --- a/docs/rc/group__grp__marginal.html +++ /dev/null @@ -1,440 +0,0 @@ -Marginal Effects Supervised
Learning Regression Models - - -Contents - -Marginal Effects with Interaction Terms - -Examples - -Notes - -Technical Background - -Literature - -Related Topics - -A marginal effect (ME) or partial effect measures the effect on the conditional mean of \( y \) of a change in one of the regressors, say \(X_k\). In the linear regression model, the ME equals the relevant slope coefficient, which greatly simplifies analysis. For nonlinear models, specialized algorithms are required to calculate the ME. The marginal effect reported here is the average of the marginal effects computed at every data point in the source table. -MADlib provides marginal effects regression functions for linear, logistic and multinomial logistic regressions. -Warning: The margins_logregr() and margins_mlogregr() functions have been deprecated in favor of the margins() function. -Marginal Effects with Interaction Terms -margins( model_table, - output_table, - x_design, - source_table, - marginal_vars - ) - Arguments -model_table -VARCHAR. The name of the model table, which is the output of logregr_train() or mlogregr_train(). -output_table -VARCHAR. The name of the result table. The output table has the following columns. - -variables INTEGER[]. The indices of the basis variables. - -margins DOUBLE PRECISION[]. The marginal effects. - -std_err DOUBLE PRECISION[]. An array of the standard errors, computed using the delta method. - -z_stats DOUBLE PRECISION[]. An array of the z-stats of the marginal effects. - -p_values DOUBLE PRECISION[]. An array of the Wald p-values of the marginal effects. - - -x_design (optional) -VARCHAR, default: NULL. The design of the independent variables, necessary only if interaction terms or indicator (categorical) terms are present. This parameter is necessary because the independent variables in the underlying regression are not parsed to extract the relationship between variables.
-Example: The independent_varname in the regression method can be specified in either of the following ways: - 'array[1, color_blue, color_green, gender_female, gpa, gpa^2, gender_female*gpa, gender_female*gpa^2, weight]' - 'x' -In the second version, the column x is an array containing data identical to that expressed in the first version, computed in a prior data preparation step. Supply an x_design argument to the margins() function in the following way: - '1, i.color_blue.color, i.color_green.color, i.gender_female, gpa, gpa^2, gender_female*gpa, gender_female*gpa^2, weight' -The variable names ('gpa', 'weight', ...), referred to here as identifiers, should be unique for each basis variable and need not be the same as the original variable
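The averaging step in the marginal-effects overview can be sketched in plain Python for the simplest case: compute a marginal effect at every row, then take the mean. This is a minimal sketch that assumes a logistic model with no interaction or indicator terms. Under that assumption the derivative of P(y = 1 | x) with respect to x_k is beta_k * p * (1 - p). The helper `average_marginal_effects` is hypothetical. It does not reproduce `margins()` or its delta-method standard errors.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def average_marginal_effects(beta, rows):
    """Hypothetical helper: average marginal effect of each regressor in a
    logistic model WITHOUT interaction or indicator terms (an assumption).

    beta -- coefficients; beta[0] pairs with a constant-1 intercept column
    rows -- list of feature lists, each the same length as beta
    For one row, dP/dx_k = beta_k * p * (1 - p); the result averages over rows.
    """
    totals = [0.0] * len(beta)
    for x in rows:
        p = sigmoid(sum(b * xi for b, xi in zip(beta, x)))
        scale = p * (1.0 - p)  # derivative of the logistic function at this row
        for k, b in enumerate(beta):
            totals[k] += b * scale
    return [t / len(rows) for t in totals]


# One row at the decision boundary (p = 0.5): the slope 2.0 is scaled by 0.25.
print(average_marginal_effects([0.0, 2.0], [[1.0, 0.0]]))  # [0.0, 0.5]
```

The entry for the intercept column is not a meaningful marginal effect. With terms such as gpa^2 or gender_female*gpa, the derivative picks up additional terms. This sketch ignores that case, and it is the case that x_design exists to describe.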
[30/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__arraysmatrix.html -- diff --git a/docs/rc/group__grp__arraysmatrix.html b/docs/rc/group__grp__arraysmatrix.html deleted file mode 100644 index 520ac21..000 --- a/docs/rc/group__grp__arraysmatrix.html +++ /dev/null @@ -1,182 +0,0 @@ -MADlib: Arrays and Matrices -Modules
-Arrays and Matrices (Data Types and Transformations) -Detailed Description -These modules provide basic mathematical operations on arrays and matrices. -For a distributed system, a matrix cannot simply be represented as a 2D array of numbers in memory. We provide two forms of distributed representation of a matrix: -Dense: The matrix is represented as a distributed collection of 1-D arrays. An example 3x10 matrix would be the table below:
 row_id | row_vec
--------+------------------------
      1 | {9,6,5,8,5,6,6,3,10,8}
      2 | {8,2,2,6,6,10,2,1,9,9}
      3 | {3,9,9,9,8,6,3,9,5,6}
-Sparse: The matrix is represented using the row and column indices for each non-zero entry of the matrix. Example:
 row_id | col_id | value
--------+--------+-------
      1 |      1 |     9
      1 |      5 |     6
      1 |      6 |     6
      2 |      1 |     8
      3 |      1 |     3
      3 |      2 |     9
      4 |      7 |     0
(7 rows)
-All matrix operations work with either form of representation. -In many cases, a matrix function can be decomposed into vector operations applied independently to each row of a matrix (or to corresponding rows of two matrices). We also provide access to these internal vector operations (Array Operations) for greater flexibility. Matrix operations like matrix_add use the corresponding vector operation (array_add) and also include additional validation and formatting. Other functions, like matrix_mult, are more complex and use a combination of such vector operations and other SQL operations. -Note that these array functions are only available for the dense representation of the matrix. In general, the scope of a single array function invocation is limited to one array (1-dimensional or 2-dimensional) that fits in memory. When such a function is executed on a table of arrays, it is called once for each array (or pair of arrays). By contrast, the scope of a single matrix function invocation is the complete matrix stored as a distributed table.
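The following Python sketch makes the dense and sparse layouts concrete by converting between them. It uses in-memory stand-ins for the tables: a dict of row arrays for the dense form and a list of (row_id, col_id, value) triples for the sparse form. The function names are hypothetical and are not part of MADlib's matrix API. Because the sketch stores only non-zero entries, the matrix dimensions must be supplied when converting back.

```python
def dense_to_sparse(dense_rows):
    """Hypothetical helper: convert {row_id: [v1, v2, ...]} (1-based column ids)
    into (row_id, col_id, value) triples, keeping only non-zero entries."""
    triples = []
    for row_id in sorted(dense_rows):
        for col_id, value in enumerate(dense_rows[row_id], start=1):
            if value != 0:
                triples.append((row_id, col_id, value))
    return triples


def sparse_to_dense(triples, n_rows, n_cols):
    """Hypothetical helper: rebuild dense rows from triples; entries not listed
    are zero. Dimensions are passed in because zeros are not stored."""
    dense = {r: [0] * n_cols for r in range(1, n_rows + 1)}
    for row_id, col_id, value in triples:
        dense[row_id][col_id - 1] = value
    return dense


# The 3x10 dense example from the text.
dense = {
    1: [9, 6, 5, 8, 5, 6, 6, 3, 10, 8],
    2: [8, 2, 2, 6, 6, 10, 2, 1, 9, 9],
    3: [3, 9, 9, 9, 8, 6, 3, 9, 5, 6],
}
sparse = dense_to_sparse(dense)
print(len(sparse))                               # 30: the example has no zeros
print(sparse_to_dense(sparse, 3, 10) == dense)   # True
```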
- - -Modules -Array Operations -Provides fast array operations supporting other MADlib modules. - -Matrix Operations -Provides fast matrix operations supporting other MADlib modules. - -Matrix Factorization -Linear algebra methods that factorize a matrix into a product of matrices. - -Norms and Distance Functions -Provides utility functions for basic linear algebra operations. - -Sparse Vectors -Implements a sparse vector data type that provides compressed storage of vectors that may have many duplicate elements. - - - - - - - -Generated on Mon Aug 6 2018 21:55:39 for MADlib by -
[15/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__nn.html -- diff --git a/docs/rc/group__grp__nn.html b/docs/rc/group__grp__nn.html deleted file mode 100644 index d7569f6..000 --- a/docs/rc/group__grp__nn.html +++ /dev/null @@ -1,1143 +0,0 @@ -MADlib: Neural Network -Neural Network (Supervised Learning) -Contents
-Classification -Regression -Optimizer Parameters -Prediction Functions -Examples -Technical Background -Literature -Related Topics -Multilayer Perceptron (MLP) is a type of neural network that can be used for regression and classification. -MLPs consist of several fully connected hidden layers with non-linear activation functions. In the case of classification, the final layer of the neural net has as many nodes as there are classes, and the output of the neural net can be interpreted as the probability that a given input feature vector belongs to a specific class. -MLP can be used with or without mini-batching. Mini-batching can perform better than stochastic gradient descent (the default MADlib optimizer) because it uses more than one training example at a time, typically resulting in faster and smoother convergence [3]. -Note: To use mini-batching, you must first run the Mini-Batch Preprocessor, a utility that prepares input data for models that support mini-batching as an optimization option, such as MLP. This is a one-time operation; you only need to re-run the preprocessor if your input data changes or if you change the grouping parameter. -Classification Training Function -The MLP classification training function has the following format: -mlp_classification( -source_table, -output_table, -independent_varname, -dependent_varname, -hidden_layer_sizes, -optimizer_params, -activation, -weights, -warm_start, -verbose, -grouping_col -) -Arguments -source_table -TEXT. Name of the table containing the training data. If you are using mini-batching, this is the name of the output table from the mini-batch preprocessor. -output_table -TEXT. Name of the output table containing the model. Details of the output table are shown below. -independent_varname -TEXT. Expression list to evaluate for the independent variables. It should be a numeric array expression.
If you are using mini-batching, set this parameter to 'independent_varname', which is the hard-coded name of the column produced by the mini-batch preprocessor that contains the packed independent variables. -Note: If you are not using mini-batching, do not include an intercept variable in this expression; this differs from other MADlib modules. Independent variables must also be encoded properly: all values are cast to DOUBLE PRECISION, so categorical variables should be one-hot or dummy encoded as appropriate. See Encoding Categorical Variables for more details. -dependent_varname -TEXT. Name of the dependent variable column. For classification, supported
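The difference between stochastic and mini-batch gradient descent described for MLP can be illustrated on a toy problem. This is a minimal sketch that uses a one-variable linear model with squared-error loss instead of an MLP. The helper `minibatch_gd` is hypothetical and is not MADlib's optimizer. A batch size of 1 gives plain stochastic gradient descent, and a batch size equal to the data size gives full-batch gradient descent.

```python
import random


def minibatch_gd(xs, ys, batch_size, lr=0.1, epochs=500, seed=0):
    """Hypothetical helper: fit y ~ w*x + b by mini-batch gradient descent on
    mean squared error. Each update averages the gradient over the examples
    in one batch, so batch_size=1 is SGD and batch_size=len(xs) is full-batch."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit examples in a new random order each epoch
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            grad_w = grad_b = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                grad_w += 2.0 * err * xs[i]
                grad_b += 2.0 * err
            # Step along the gradient averaged over the mini-batch.
            w -= lr * grad_w / len(batch)
            b -= lr * grad_b / len(batch)
    return w, b


xs = [i / 5 - 1 for i in range(11)]   # 11 points in [-1, 1]
ys = [3.0 * x + 1.0 for x in xs]      # noise-free line y = 3x + 1
w, b = minibatch_gd(xs, ys, batch_size=4)
print(round(w, 6), round(b, 6))       # 3.0 1.0
```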
[18/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__linreg.html -- diff --git a/docs/rc/group__grp__linreg.html b/docs/rc/group__grp__linreg.html deleted file mode 100644 index 9e73ca3..000 --- a/docs/rc/group__grp__linreg.html +++ /dev/null @@ -1,479 +0,0 @@ -MADlib: Linear Regression -Linear Regression Supervised Learning
Regression Models -Contents -Training Function -Prediction Function -Examples -Technical Background -Literature -Related Topics -Linear regression models a linear relationship between a scalar dependent variable \( y \) and one or more explanatory (independent) variables \( x \), and builds a model of coefficients. -Training Function -The linear regression training function has the following syntax: -linregr_train( source_table, - out_table, - dependent_varname, - independent_varname, - grouping_cols, - heteroskedasticity_option - ) -Arguments -source_table -TEXT. Name of the table containing the training data. -out_table -TEXT. Name of the generated table containing the output model. -The output table contains the following columns: -... Any grouping columns provided during training. Present only if the grouping option is used. -coef FLOAT8[]. Vector of the coefficients of the regression. -r2 FLOAT8. R-squared coefficient of determination of the model. -std_err FLOAT8[]. Vector of the standard errors of the coefficients. -t_stats FLOAT8[]. Vector of the t-statistics of the coefficients. -p_values FLOAT8[]. Vector of the p-values of the coefficients. -condition_no FLOAT8 array. The condition number of the \(X^{*}X\) matrix. A high condition number usually indicates some numeric instability in the result, yielding a less reliable model. A high condition number often results when there is a significant amount of collinearity in the underlying design matrix, in which case other regression techniques, such as elastic net regression, may be more appropriate. -bp_stats FLOAT8. The Breusch-Pagan statistic of heteroskedasticity. Present only if the heteroskedasticity_option argument was set to True when the model was trained. -bp_p_value FLOAT8. The Breusch-Pagan calculated p-value. Present only if the heteroskedasticity_option argument was set to True when the model was trained. -num_rows_processed INTEGER.
The number of rows that are actually used in each group. - -num_missing_rows_skipped INTEGER. The number of rows that have NULL values in the dependent and independent variables, and were skipped in the computation for each group. - -variance_covariance FLOAT[]. Variance/covariance matrix. - -A summary table named out_table_summary is created together with the output table. It has the following columns: - -method 'linregr' for linear regression. - -source_table The data source table name - -out_table The output table name - -dependent_varname The dependent variable - -independent_varname The independent variables -
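The coef, r2, and condition_no columns can be illustrated for the single-regressor case. This is a minimal sketch under three assumptions. First, there is one independent variable plus an intercept. Second, the fit solves the normal equations in closed form. Third, the condition number is the ratio of the largest to the smallest eigenvalue of \(X^{T}X\), the real-valued case of the \(X^{*}X\) matrix named above. Whether linregr_train computes condition_no exactly this way is not stated here. The helper `simple_linregr` is hypothetical.

```python
import math


def simple_linregr(xs, ys):
    """Hypothetical helper: OLS fit of y = b0 + b1*x via the normal equations.

    Returns (coef, r2, condition_no), where condition_no is the ratio of the
    largest to the smallest eigenvalue of X^T X for the design X = [1, x].
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # X^T X = [[n, sx], [sx, sxx]] and X^T y = [sy, sxy]
    det = n * sxx - sx * sx
    b1 = (n * sxy - sx * sy) / det
    b0 = (sy - b1 * sx) / n
    # R^2 = 1 - SS_residual / SS_total
    mean_y = sy / n
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot
    # Eigenvalues of the symmetric 2x2 matrix X^T X; the smaller one is
    # computed as det / largest to avoid floating-point cancellation.
    lam_max = (n + sxx) / 2.0 + math.sqrt(((n - sxx) / 2.0) ** 2 + sx * sx)
    lam_min = det / lam_max
    return [b0, b1], r2, lam_max / lam_min


coef, r2, cond = simple_linregr([1, 2, 3, 4], [3, 5, 7, 9])
print(coef, r2)        # [1.0, 2.0] 1.0
print(round(cond, 1))  # 55.8
```

Shifting the same x values far from zero (for example, to 100 through 103) makes the intercept column and x nearly collinear. The condition number then rises above 10^7, which is the kind of warning sign the condition_no column is meant to flag.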
[37/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/dir_8f36046b7fd6891397115ddb47a5ee66.html -- diff --git a/docs/rc/dir_8f36046b7fd6891397115ddb47a5ee66.html b/docs/rc/dir_8f36046b7fd6891397115ddb47a5ee66.html deleted file mode 100644 index a5d0284..000 --- a/docs/rc/dir_8f36046b7fd6891397115ddb47a5ee66.html +++ /dev/null @@ -1,143 +0,0 @@ -MADlib: prob Directory Reference
-prob Directory Reference -Files -file prob.sql_in -SQL functions for evaluating probability functions. http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/dir_a3a6204225c05cbe8d92623799329235.html -- diff --git a/docs/rc/dir_a3a6204225c05cbe8d92623799329235.html b/docs/rc/dir_a3a6204225c05cbe8d92623799329235.html deleted file mode 100644 index bbea54a..000 --- a/docs/rc/dir_a3a6204225c05cbe8d92623799329235.html +++ /dev/null @@ -1,142 +0,0 @@ -MADlib: src Directory Reference
-src Directory Reference -Directories -directory pg_gp
[05/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__validation.html -- diff --git a/docs/rc/group__grp__validation.html b/docs/rc/group__grp__validation.html deleted file mode 100644 index 632fc47..000 --- a/docs/rc/group__grp__validation.html +++ /dev/null @@ -1,273 +0,0 @@ -MADlib: Cross Validation -Cross
Validation (Model Selection) -Contents -Cross-Validation Function -Examples -Notes -Technical Background -Related Topics -Estimates the fit of a predictive model given a data set and specifications for the training, prediction, and error estimation functions. -Cross validation, sometimes called rotation estimation, is a technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction and you want to estimate how accurately a predictive model will perform in practice. -The cross-validation function provided by this module is flexible and can work with the algorithms you want to cross-validate, including algorithms you write yourself. Among the inputs to the cross-validation function are specifications of the modelling, prediction, and error metric functions. Each specification has three parts: the name of the function, an array of arguments to pass to the function, and an array of the data types of those arguments. This makes it possible to use functions from other MADlib modules or user-defined functions that you supply. -The modelling (training) function takes a data set with independent and dependent variables and produces a model, which is stored in an output table. -The prediction function takes the model generated by the modelling function and a different data set with independent variables, and produces predictions of the dependent variable based on the model, which are stored in an output table. The prediction function should take the name of a unique ID column in the data table as one of its inputs, so that the predictions can be compared with the validation values. Note: Prediction functions in some MADlib modules do not save results into an output table. These prediction functions are not suitable for cross-validation.
-The error metric function compares the prediction results with the known values of the dependent variables in the data set that was fed into the prediction function. It computes the error metric using the specified error metric function, storing the results in a table. - -Other inputs include the output table name, k value for the k-fold cross validation, and how many folds to try. For example, you can choose to run a simple validation instead of a full cross validation. -Cross-Validation Function - -cross_validation_general( modelling_func, - modelling_params, - modelling_params_type, - param_explored, - explore_values, - predict_func, - predict_params, -
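The three-part structure of cross_validation_general (modelling, prediction, and error metric functions) maps naturally onto a higher-order function. The following is a minimal Python sketch of k-fold cross validation. It uses in-memory lists instead of tables and plain Python callables instead of the name/argument/type specifications. `k_fold_indices` and `cross_validate` are hypothetical and are not `cross_validation_general()`.

```python
def k_fold_indices(n, k):
    """Split row indices 0..n-1 into k contiguous folds whose sizes differ by at most 1."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(xs, ys, k, train, predict, error):
    """Hypothetical helper: average held-out error over k folds.

    train(xs, ys) -> model, predict(model, x) -> prediction, and
    error(predictions, actuals) -> float play the roles of the modelling,
    prediction, and error metric functions.
    """
    scores = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        model = train(train_x, train_y)           # fit on the other k-1 folds
        predictions = [predict(model, xs[i]) for i in fold]
        scores.append(error(predictions, [ys[i] for i in fold]))
    return sum(scores) / len(scores)


# Toy usage: a "model" that always predicts the training mean, scored by MSE.
def train_mean(xs, ys):
    return sum(ys) / len(ys)


def predict_mean(model, x):
    return model


def mse(predictions, actuals):
    return sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(actuals)


print(cross_validate([0, 1, 2, 3], [0, 0, 4, 4], 2, train_mean, predict_mean, mse))  # 16.0
```

A real run would usually shuffle the rows before splitting them. The contiguous folds here keep the example deterministic.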
[51/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released
Move v1.15 RC1 to latest released Project: http://git-wip-us.apache.org/repos/asf/madlib-site/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib-site/commit/573d66d8 Tree: http://git-wip-us.apache.org/repos/asf/madlib-site/tree/573d66d8 Diff: http://git-wip-us.apache.org/repos/asf/madlib-site/diff/573d66d8 Branch: refs/heads/asf-site Commit: 573d66d85212546a9c200d9ff396124378fc478f Parents: 9a2b301 Author: Rahul Iyer Authored: Fri Aug 10 15:59:38 2018 -0700 Committer: Rahul Iyer Committed: Fri Aug 10 15:59:38 2018 -0700 -- data/.RData | Bin 0 -> 69 bytes data/.Rhistory |4 + docs/index.html |2 +- docs/latest |2 +- docs/rc/apsp_8sql__in.html | 335 - docs/rc/arima_8sql__in.html | 1070 --- docs/rc/array__ops_8sql__in.html| 1275 docs/rc/assoc__rules_8sql__in.html | 415 -- docs/rc/balance__sample_8sql__in.html | 497 -- docs/rc/bayes_8sql__in.html | 994 --- docs/rc/bc_s.png| Bin 676 -> 0 bytes docs/rc/bdwn.png| Bin 147 -> 0 bytes docs/rc/bfs_8sql__in.html | 443 -- docs/rc/closed.png | Bin 132 -> 0 bytes docs/rc/clustered__variance_8sql__in.html | 1954 -- .../rc/clustered__variance__coxph_8sql__in.html | 496 -- docs/rc/cols2vec_8sql__in.html | 316 - docs/rc/conjugate__gradient_8sql__in.html | 263 - docs/rc/correlation_8sql__in.html | 685 -- docs/rc/cox__prop__hazards_8sql__in.html| 2150 -- docs/rc/create__indicators_8sql__in.html| 340 - docs/rc/crf_8sql__in.html | 559 -- docs/rc/crf__data__loader_8sql__in.html | 342 - docs/rc/crf__feature__gen_8sql__in.html | 305 - docs/rc/cross__validation_8sql__in.html | 717 -- docs/rc/decision__tree_8sql__in.html| 2764 docs/rc/dense__linear__systems_8sql__in.html| 647 -- .../dir_012f026af89a95e7964e87a3db4f3f72.html | 142 - .../dir_080635afba7a03bce9bcf848b744ecef.html | 142 - .../dir_082e548d8897978bd67db5bb10c3f4ca.html | 142 - .../dir_0e2c82fdc38d6347747c84b2495b87bb.html | 143 - .../dir_0f0603029f2766ba6362c0486f42266f.html | 142 - .../dir_1d47b74c56eeb36d1b42c4eefe6268df.html | 143 - 
.../dir_1f3edc2a41a90b71d908e98c40e8e20f.html | 143 - .../dir_20517c5c235c3aa13e267a0084f413b4.html | 143 - .../dir_212c462ae803c05eae1fe2b1df645c56.html | 142 - .../dir_26cdf48399aa0105c53c7623e443a32b.html | 150 - .../dir_2fd46cdf9feef20c5a1de2fea1748af4.html | 143 - .../dir_31661a94ac35e1e3b8f7fadfa53703b5.html | 146 - .../dir_3c5e27e75c1f20438079b385c860229e.html | 143 - .../dir_3e00766ce7bbd3258084476ece235bc0.html | 152 - .../dir_3e5da3f4b4c531df2ac983d22a9bd897.html | 143 - .../dir_4efa676c70d986e4be6149ce0c1d0b98.html | 149 - .../dir_508e3ba2de19c9cb39df85c09ad79f77.html | 143 - .../dir_51681ee935e6dd3c9bf433c39db08bf4.html | 142 - .../dir_57f83f46582e45fe02cb0209b9cad992.html | 152 - .../dir_6944d646d96379d734d568fa9f457ac2.html | 142 - .../dir_6c8c1662e04f4d84cf895381f3c4ee75.html | 169 - .../dir_7b19f40af17a56bc8266e4b0ec256b61.html | 146 - .../dir_7b71f02250bd83717b51065786bd49f6.html | 143 - .../dir_7bef9b9f49f23083f873cdb4f9aa5595.html | 148 - .../dir_7f0185f98acca08613e6e8b8ed2c9454.html | 142 - .../dir_84af2f6304104e948345b9ffbceda59c.html | 143 - .../dir_8a1630b9e626a27a0fba85144676dd7e.html | 143 - .../dir_8b6eadc8746db3a817149b816651d271.html | 143 - .../dir_8f36046b7fd6891397115ddb47a5ee66.html | 143 - .../dir_a3a6204225c05cbe8d92623799329235.html | 142 - .../dir_a50d939c472d90effb762b784a85c42f.html | 143 - .../dir_abda3f8ccfdd5b50c49da6f23b1283ee.html | 144 - .../dir_ac8432244a3c88336507031a023f4059.html | 151 - .../dir_b18f8a58178adf1282c19b355fc56476.html | 157 - .../dir_bfb6fe26cbfcbd6092eb3bff4002d9b4.html | 142 - .../dir_c8a121080de679af346a38eb58b36514.html | 142 - .../dir_c97a42988cbd79aebe75c02ffb75992a.html | 144 - .../dir_cc740a115287ad80150f497f51742950.html | 143 - .../dir_ce4fa7aad06dd1bbca713eb50be7391b.html | 190 - .../dir_d2a50e0bf8f1a4defa2e0ba0a193a480.html | 142 - .../dir_d3dbcc2e792650c67298228e381c9a26.html | 143 - .../dir_d9f864a50dc114ae327fea67d9326f10.html | 161 - .../dir_efdb815e8132703ae96e54278f654003.html | 142 - 
.../dir_f22d7b16c4d94fc51216129c2f2d4ca9.html | 143 - .../dir_f6ab7d321b1475f96a73949691e0e1a0.html | 157 - .../dir_fe3b9425dacf2fb6ecd5c85236398360.html | 142 - docs/rc/distribution_8sql__in.html | 330 -
[01/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released
Repository: madlib-site Updated Branches: refs/heads/asf-site 9a2b301d3 -> 573d66d85 http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/lda_8sql__in.html -- diff --git a/docs/rc/lda_8sql__in.html b/docs/rc/lda_8sql__in.html deleted file mode 100644 index e2423a7..000 --- a/docs/rc/lda_8sql__in.html +++ /dev/null @@ -1,1422 +0,0 @@ -MADlib: lda.sql_in File Reference
-lda.sql_in File Reference -SQL functions for Latent Dirichlet Allocation. -Functions -set lda_result lda_train (text data_table, text model_table, text output_data_table, int4 voc_size, int4 topic_num, int4 iter_num, float8 alpha, float8 beta) -This UDF provides the entry point for the LDA training process. -set lda_result lda_predict (text data_table, text model_table, text output_table) -This UDF provides the entry point for the LDA prediction process. -set lda_result lda_predict (text data_table, text model_table, text output_table, int4 iter_num) -An overloaded version that allows users to specify iter_num. -set lda_result lda_get_topic_word_count (text model_table, text output_table) -This UDF computes the per-topic word counts. -set lda_result lda_get_word_topic_count (text model_table, text output_table) -This UDF computes the per-word topic counts. -set lda_result lda_get_topic_desc (text model_table, text vocab_table, text desc_table, int4 top_k) -This UDF gets the description (top-k words) for each topic. -set lda_result lda_get_word_topic_mapping (text lda_output_table, text mapping_table) -This UDF gets the wordid-topicid mapping from the LDA training output table. -int4[] __lda_random_assign (int4 word_count, int4 topic_num) -This UDF randomly assigns topics to the words in a document. -int4[] __lda_gibbs_sample (int4[] words, int4[] counts, int4[] doc_topic, int8[] model, float8 alpha, float8 beta, int4 voc_size, int4 topic_num, int4 iter_num) -This UDF learns the topics of the words in a document and is the main step of a Gibbs sampling iteration. The model parameter (including the per-word topic counts and the corpus-level topic counts) is passed to this function in the first call and then carried over to subsequent calls through fcinfo->flinfo->fn_extra, which allows it to be updated immediately.
-
-int8[] __lda_count_topic_sfunc (int8[] state, int4[] words, int4[] counts, int4[] topic_assignment, int4 voc_size, int4 topic_num)
-This UDF is the sfunc for the aggregate that computes the topic counts for each word and the topic counts in the whole corpus. It scans the topic assignments in a document and updates the topic counts.
-
-int8[] __lda_count_topic_prefunc (int8[] state1, int8[] state2)
-This UDF is the prefunc for the aggregate that computes the per-word topic counts.
-
-aggregate int8[] __lda_count_topic_agg (int4[], int4[], int4[], int4, int4)
-This UDA computes the word topic counts by scanning and summing up the topic assignments in each document.
-
-float8 lda_get_perplexity (text model_table, text
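The `__lda_gibbs_sample` description above can be illustrated with one collapsed Gibbs sampling sweep over a single document. This is a minimal plain-Python sketch and not MADlib's implementation. It assumes the following:

- The document is an expanded list of word ids, rather than the `words`/`counts` pair the UDF takes.
- The model is held as two count tables: per-word topic counts and corpus-level topic counts.
- All function and argument names are hypothetical.

For each token, the sketch removes the token's current assignment from the counts and samples a new topic k with weight (n_dk + alpha) * (n_wk + beta) / (n_k + V * beta). It then adds the token back under the new topic, so later tokens see the update immediately.

```python
import random
from typing import List


def gibbs_resample_document(
    words: List[int],
    doc_topic: List[int],
    word_topic: List[List[int]],
    topic_total: List[int],
    alpha: float,
    beta: float,
    voc_size: int,
    topic_num: int,
    rng: random.Random,
) -> List[int]:
    """One collapsed Gibbs sweep over one document (hypothetical helper).

    words[i]         -- vocabulary id of the i-th token
    doc_topic[i]     -- current topic of the i-th token (updated in place)
    word_topic[w][k] -- corpus count of word w assigned to topic k (updated in place)
    topic_total[k]   -- corpus count of tokens assigned to topic k (updated in place)
    """
    # Per-document topic counts derived from the current assignments.
    doc_counts = [0] * topic_num
    for k in doc_topic:
        doc_counts[k] += 1

    for i, w in enumerate(words):
        old = doc_topic[i]
        # Remove this token's current assignment from every count.
        doc_counts[old] -= 1
        word_topic[w][old] -= 1
        topic_total[old] -= 1

        # Unnormalized full conditional p(z_i = k | all other assignments).
        weights = [
            (doc_counts[k] + alpha)
            * (word_topic[w][k] + beta)
            / (topic_total[k] + voc_size * beta)
            for k in range(topic_num)
        ]
        new = rng.choices(range(topic_num), weights=weights, k=1)[0]

        # Add the token back under its newly sampled topic.
        doc_topic[i] = new
        doc_counts[new] += 1
        word_topic[w][new] += 1
        topic_total[new] += 1
    return doc_topic
```

The sweep conserves the total counts: every decrement is matched by an increment. Running the function with a seeded `random.Random` makes a sweep reproducible.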
[08/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__strs.html
--
diff --git a/docs/rc/group__grp__strs.html b/docs/rc/group__grp__strs.html
deleted file mode 100644
index a3a6e3b..000
--- a/docs/rc/group__grp__strs.html
+++ /dev/null
@@ -1,269 +0,0 @@
-MADlib: Stratified Sampling
-Stratified Sampling (Sampling)
-Contents
-Stratified Sampling
-Examples
-
-Stratified sampling is a method for independently sampling subpopulations (strata). It is commonly used to reduce sampling error by ensuring that subgroups are adequately represented in the sample.
-
-Stratified Sampling
-stratified_sample( source_table,
-                   output_table,
-                   proportion,
-                   grouping_cols,
-                   target_cols,
-                   with_replacement
-                 )
-Arguments
-source_table
-TEXT. Name of the table containing the input data.
-
-output_table
-TEXT. Name of the output table that contains the sampled data. The output table contains all columns present in the source table unless otherwise specified in the 'target_cols' parameter below.
-
-proportion
-FLOAT8 in the range (0,1). The fraction of rows to sample from each stratum; each stratum is sampled independently.
-
-grouping_cols (optional)
-TEXT, default: NULL. A single column or a comma-separated list of columns that defines the strata. When this parameter is NULL, no grouping is used and the sampling is non-stratified; that is, the whole table is treated as a single group.
-
-target_cols (optional)
-TEXT, default: NULL. A comma-separated list of columns to appear in the 'output_table'. If NULL or '*', all columns from the 'source_table' will appear in the 'output_table'.
-Note: Do not include 'grouping_cols' in the parameter 'target_cols', because they are always included in the 'output_table'.
-
-with_replacement (optional)
-BOOLEAN, default: FALSE. Determines whether to sample with replacement or without replacement (the default). With replacement, the same row may appear in the sample more than once. Without replacement, a given row can be selected only once.
-
-Examples
-Please note that, due to the random nature of sampling, your results may differ from those shown below.
-
-Create an input table:
-DROP TABLE IF EXISTS test;
-CREATE TABLE test(
-    id1 INTEGER,
-    id2 INTEGER,
-    gr1 INTEGER,
-    gr2 INTEGER
-);
-INSERT INTO test VALUES
-(1,0,1,1),
-(2,0,1,1),
-(3,0,1,1),
-(4,0,1,1),
-(5,0,1,1),
-(6,0,1,1),
-(7,0,1,1),
-(8,0,1,1),
-(9,0,1,1),
-(9,0,1,1),
-(9,0,1,1),
-(9,0,1,1),
-(0,1,1,2),
-(0,2,1,2),
-(0,3,1,2),
-(0,4,1,2),
-(0,5,1,2),
-(0,6,1,2),
-(10,10,2,2),
-(20,20,2,2),
-(30,30,2,2),
-(40,40,2,2),
-(50,50,2,2),
-(60,60,2,2),
-(70,70,2,2);
-
-Sample without replacement:
-DROP TABLE IF EXISTS out;
-SELECT madlib.stratified_sample(
-                                'test',    -- Source table
-                                'out',     -- Output table
-                                0.5,       -- Sample proportion
-                                'gr1,gr2', -- Strata definition
-
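The stratified sampling procedure described above can be sketched in a few lines of plain Python. The sketch groups rows by the values in `grouping_cols` and samples the same fraction of each group independently. An empty `grouping_cols` puts every row in one group, which mirrors the NULL (non-stratified) case. The sketch is not MADlib's implementation, and the function name is hypothetical. Rounding each stratum's sample size half up is also an assumption; MADlib's exact rounding rule is not stated in this excerpt.

```python
import math
import random
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple


def stratified_sample(
    rows: Sequence[Dict[str, Any]],
    proportion: float,
    grouping_cols: Sequence[str] = (),
    with_replacement: bool = False,
    seed: int = 0,
) -> List[Dict[str, Any]]:
    """Sample `proportion` of the rows independently within each stratum."""
    if not 0 < proportion < 1:
        raise ValueError("proportion must be in the range (0, 1)")
    rng = random.Random(seed)

    # Group rows by their grouping_cols values; with no grouping columns,
    # every row lands in the single group (), i.e. non-stratified sampling.
    strata: Dict[Tuple[Any, ...], List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        strata[tuple(row[c] for c in grouping_cols)].append(row)

    sample: List[Dict[str, Any]] = []
    for members in strata.values():
        # Assumption: round half up to obtain the per-stratum sample size.
        k = math.floor(proportion * len(members) + 0.5)
        if with_replacement:
            sample.extend(rng.choices(members, k=k))  # a row may repeat
        else:
            sample.extend(rng.sample(members, k))  # each row at most once
    return sample
```

Applied to the 25-row `test` table above with `proportion = 0.5` and strata `gr1,gr2`, the three strata have 12, 6, and 7 rows. Under the rounding assumption, they contribute 6, 3, and 4 rows respectively.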
[26/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__crf.html
--
diff --git a/docs/rc/group__grp__crf.html b/docs/rc/group__grp__crf.html
deleted file mode 100644
index 20fd7da..000
--- a/docs/rc/group__grp__crf.html
+++ /dev/null
@@ -1,632 +0,0 @@
-MADlib: Conditional Random Field
-Conditional Random Field (Supervised Learning)
-Contents
-Training Feature Generation
-CRF Training Function
-Testing Feature Generation
-Inference using Viterbi
-Using CRF
-Examples
-Technical Background
-Literature
-Related Topics
-
-A conditional random field (CRF) is a type of discriminative, undirected probabilistic graphical model. A linear-chain CRF is a special type of CRF that assumes the current state depends only on the previous state.
-Feature extraction modules are provided for text-analysis tasks such as part-of-speech (POS) tagging and named-entity recognition (NER). Currently, six feature types are implemented:
-
-Edge Feature: transition feature that encodes the transition feature weight from the current label to the next label.
-Start Feature: fired when the current token is the first token in a sequence.
-End Feature: fired when the current token is the last token in a sequence.
-Word Feature: fired when the current token is observed in the trained dictionary.
-Unknown Feature: fired when the current token has not been observed in the trained dictionary at least a certain number of times (default 1).
-Regex Feature: fired when the current token can be matched by a regular expression.
-
-A Viterbi implementation is also provided to get the best label sequence and the conditional probability \( \Pr( \text{best label sequence} \mid \text{sequence}) \).
-The following steps are required for CRF learning and inference:
-Training Feature Generation
-CRF Training
-Testing Feature Generation
-Inference using Viterbi
-
-Training Feature Generation
-The function takes train_segment_tbl, regex_tbl, and label_tbl as input and generates the three tables dictionary_tbl, train_feature_tbl, and train_featureset_tbl, which are required as input for CRF training.
-crf_train_fgen(train_segment_tbl,
-               regex_tbl,
-               label_tbl,
-               dictionary_tbl,
-               train_feature_tbl,
-               train_featureset_tbl)
-Arguments
-train_segment_tbl
-TEXT. Name of the training segment table.
-The table is expected to have the following columns:
-doc_id INTEGER. Document id column
-start_pos INTEGER. Index of a particular term in the respective document
-seg_text TEXT. Term at the respective start_pos in the document
-label INTEGER. Label id for the term, corresponding to the actual label in label_tbl
-
-regex_tbl
-TEXT. Name of the regular expression table. The table is expected to have the following columns:
-pattern TEXT. Regular expression
-name TEXT. Regular expression name
-
-label_tbl
-TEXT. Name of the table containing the unique labels and their ids. The table is expected to have the following columns:
-id INTEGER. Unique
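The Viterbi inference step described above can be illustrated with a generic linear-chain decoder. The sketch also includes a forward pass, because the conditional probability of the best sequence is exp(best score - log Z), where Z sums exp(score) over all label sequences. This is a plain-Python sketch, not MADlib's Viterbi UDF. It makes these assumptions:

- The learned feature weights have already been summed into per-position label scores (`emission`, covering word, unknown, and regex features).
- Edge features form a label-to-label `transition` matrix.
- Start features form a `start` vector.
- The end feature is omitted for brevity.

All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def viterbi(
    emission: Sequence[Sequence[float]],
    transition: Sequence[Sequence[float]],
    start: Sequence[float],
) -> Tuple[List[int], float]:
    """Return (best label sequence, its unnormalized log-score).

    emission[t][y]   -- score of label y at position t
    transition[y][z] -- edge-feature score for moving from label y to label z
    start[y]         -- start-feature score for label y at position 0
    """
    n_labels = len(start)
    score = [start[y] + emission[0][y] for y in range(n_labels)]
    backptrs: List[List[int]] = []
    for t in range(1, len(emission)):
        new_score, ptr = [], []
        for z in range(n_labels):
            # Best previous label y for reaching label z at position t.
            best_y = max(range(n_labels), key=lambda y: score[y] + transition[y][z])
            new_score.append(score[best_y] + transition[best_y][z] + emission[t][z])
            ptr.append(best_y)
        score = new_score
        backptrs.append(ptr)
    # Trace the back-pointers from the best final label.
    last = max(range(n_labels), key=lambda y: score[y])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


def _logsumexp(xs: Sequence[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_partition(
    emission: Sequence[Sequence[float]],
    transition: Sequence[Sequence[float]],
    start: Sequence[float],
) -> float:
    """Forward algorithm: log of the sum of exp(score) over all label sequences."""
    n_labels = len(start)
    alpha = [start[y] + emission[0][y] for y in range(n_labels)]
    for t in range(1, len(emission)):
        alpha = [
            _logsumexp([alpha[y] + transition[y][z] for y in range(n_labels)]) + emission[t][z]
            for z in range(n_labels)
        ]
    return _logsumexp(alpha)
```

Given `path, best = viterbi(E, T, S)`, the conditional probability of the decoded sequence is `math.exp(best - log_partition(E, T, S))`.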
[29/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__balance__sampling.html
--
diff --git a/docs/rc/group__grp__balance__sampling.html b/docs/rc/group__grp__balance__sampling.html
deleted file mode 100644
index 20a971d..000
--- a/docs/rc/group__grp__balance__sampling.html
+++ /dev/null
@@ -1,607 +0,0 @@
-MADlib: Balanced Sampling
-Balanced Sampling (Sampling)
-Contents
-Balanced Sampling
-Examples
-Literature
-Related Topics
-
-Some classification algorithms only perform optimally when the number of samples in each class is roughly the same. Highly skewed datasets are common in many domains (e.g., fraud detection), so resampling to offset this imbalance can produce a better decision boundary.
-This module offers a number of resampling techniques, including undersampling majority classes, oversampling minority classes, and combinations of the two.
-
-Balanced Sampling
-balance_sample( source_table,
-                output_table,
-                class_col,
-                class_sizes,
-                output_table_size,
-                grouping_cols,
-                with_replacement,
-                keep_null
-              )
-Arguments
-source_table
-TEXT. Name of the table containing the input data.
-
-output_table
-TEXT. Name of the output table that contains the sampled data. The output table contains all columns present in the source table, plus a newly generated id called "__madlib_id__" added as the first column.
-
-class_col
-TEXT. Name of the column containing the class to be balanced.
-
-class_sizes (optional)
-VARCHAR, default: 'uniform'. Parameter that defines the size of the different class values. (Class values are sometimes also called levels.) Can be set to the following:
-
-'uniform': All class values will be resampled to have the same number of rows.
-'undersample': Undersample such that all class values end up with the same number of observations as the minority class. Done without replacement by default unless the parameter 'with_replacement' is set to TRUE.
-'oversample': Oversample with replacement such that all class values end up with the same number of observations as the majority class. Not affected by the parameter 'with_replacement', since oversampling is always done with replacement.
-Short forms of the above will work too, e.g., 'uni' works the same as 'uniform'.
-
-Alternatively, you can explicitly set class sizes in a string containing a comma-delimited list. Order does not matter, and not all class values need to be specified. Use the format 'class_value_1=x, class_value_2=y, …', where each 'class_value' in the list must exist in the column 'class_col' and each size is an integer giving the desired number of observations. For example, 'red=3000, blue=4000' means you want to resample the dataset so that the 'output_table' contains exactly 3000 red and 4000 blue rows.
-Note: The allowed names for class values follow object naming rules in PostgreSQL [1]. Quoted identifiers are allowed and should be enclosed
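The `class_sizes` options described above reduce to a per-class target row count. Each class is then undersampled or oversampled to reach its target. The plain-Python sketch below makes that mapping concrete. It is not MADlib's implementation, and the function names are hypothetical. Two behaviors are assumptions labeled in the comments:

- For 'uniform' without an `output_table_size`, the input's total row count is kept.
- Classes left out of an explicit list keep their original size.

```python
import random
from typing import Dict, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def target_class_sizes(
    class_counts: Dict[str, int],
    class_sizes: str = "uniform",
    output_table_size: Optional[int] = None,
) -> Dict[str, int]:
    """Return the desired number of output rows for each class value."""
    spec = class_sizes.strip().lower()
    # Keywords and their short forms ('uni', 'under', 'over', ...), checked in
    # this order, so an ambiguous prefix such as 'u' resolves to 'uniform'.
    if spec and "uniform".startswith(spec):
        # Assumption: without output_table_size, keep the input's total size.
        total = output_table_size if output_table_size is not None else sum(class_counts.values())
        return {c: total // len(class_counts) for c in class_counts}
    if spec and "undersample".startswith(spec):
        smallest = min(class_counts.values())  # size of the minority class
        return {c: smallest for c in class_counts}
    if spec and "oversample".startswith(spec):
        largest = max(class_counts.values())  # size of the majority class
        return {c: largest for c in class_counts}
    # Explicit list such as 'red=3000, blue=4000'.
    # Assumption: classes not listed keep their original size.
    targets = dict(class_counts)
    for item in class_sizes.split(","):
        name, _, size = item.partition("=")
        name = name.strip()
        if name not in class_counts:
            raise ValueError(f"class value {name!r} not found in class_col")
        targets[name] = int(size)
    return targets


def resample_class(
    rows: Sequence[T], target: int, with_replacement: bool, rng: random.Random
) -> List[T]:
    """Draw `target` rows from a single class."""
    if with_replacement or target > len(rows):
        # Growing a class beyond its size always requires replacement.
        return rng.choices(rows, k=target)
    return rng.sample(list(rows), target)
```

For example, with 10 red rows and 4 blue rows, 'uniform' gives 7 of each, 'undersample' gives 4 of each, and 'oversample' gives 10 of each.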
[48/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/balance__sample_8sql__in.html
--
diff --git a/docs/rc/balance__sample_8sql__in.html b/docs/rc/balance__sample_8sql__in.html
deleted file mode 100644
index 1e22170..000
--- a/docs/rc/balance__sample_8sql__in.html
+++ /dev/null
@@ -1,497 +0,0 @@
-MADlib: balance_sample.sql_in File Reference
-balance_sample.sql_in File Reference
-SQL functions for balanced sampling of data sets.
-
-Functions
-void balance_sample (text source_table, text output_table, text class_col, varchar class_sizes, integer output_table_size, text grouping_cols, boolean with_replacement, boolean keep_null)
-void balance_sample (text source_table, text output_table, text class_col, varchar class_sizes, integer output_table_size, text grouping_cols, boolean with_replacement)
-void balance_sample (text source_table, text output_table, text class_col, varchar class_sizes, integer output_table_size, text grouping_cols)
-void balance_sample (text source_table, text output_table, text class_col, varchar class_sizes, integer output_table_size)
-void balance_sample (text source_table, text output_table, text class_col, varchar class_sizes)
-void balance_sample (text source_table, text output_table, text class_col)
-varchar balance_sample (varchar message)
-varchar balance_sample ()
-
-Detailed Description
-Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
-http://www.apache.org/licenses/LICENSE-2.0
-Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
-Date: 12/14/2017
-See also: Given a table, balanced sampling returns a sampled data set with the specified proportions for each class (defaults to uniform sampling).
-Function Documentation
-balance_sample() [1/8]
-void balance_sample (text source_table,
-                     text output_table,
-                     text class_col,
-                     varchar class_sizes,
-                     integer output_table_size,
-                     text grouping_cols,
-                     boolean with_replacement,
[49/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/array__ops_8sql__in.html
--
diff --git a/docs/rc/array__ops_8sql__in.html b/docs/rc/array__ops_8sql__in.html
deleted file mode 100644
index c6140a5..000
--- a/docs/rc/array__ops_8sql__in.html
+++ /dev/null
@@ -1,1275 +0,0 @@
-MADlib: array_ops.sql_in File Reference
-array_ops.sql_in File Reference
-Implementation of array operations in SQL.
-
-Functions
-anyarray array_add (anyarray x, anyarray y)
-Adds two arrays. It requires that all the values are non-NULL. Return type is the same as the input type.
-
-aggregate anyarray sum (anyarray)
-Aggregate, element-wise sum of arrays. It requires that all the values are non-NULL. Return type is the same as the input type.
-
-anyarray array_sub (anyarray x, anyarray y)
-Subtracts two arrays. It requires that all the values are non-NULL. Return type is the same as the input type.
-
-anyarray array_mult (anyarray x, anyarray y)
-Element-wise product of two arrays. It requires that all the values are non-NULL. Return type is the same as the input type.
-
-anyarray array_div (anyarray x, anyarray y)
-Element-wise division of two arrays. It requires that all the values are non-NULL. Return type is the same as the input type.
-
-float8 array_dot (anyarray x, anyarray y)
-Dot product of two arrays. It requires that all the values are non-NULL. Return type is FLOAT8.
-
-bool array_contains (anyarray x, anyarray y)
-Checks whether one array contains the other. This function returns TRUE if each non-zero element in the right array equals the element with the same index in the left array.
-
-anyelement array_max (anyarray x)
-This function finds the maximum value in the array. NULLs are ignored. Return type is the same as the input type.
-
-float8[] array_max_index (anyarray x)
-This function finds the maximum value and its corresponding index in the array. NULLs are ignored. Return type is FLOAT8[].
-
-anyelement array_min (anyarray x)
-This function finds the minimum value in the array. NULLs are ignored. Return type is the same as the input type.
-
-float8[] array_min_index (anyarray x)
-This function finds the minimum value and its corresponding index in the array.
NULLs are ignored. Return type is FLOAT8[].
-
-anyelement array_sum (anyarray x)
-This function finds the sum of the values in the array. NULLs are ignored. Return type is the same as the input type.
-
-float8 array_sum_big (anyarray x)
-This function finds the sum of the values in the array. NULLs are ignored. Return type is always FLOAT8, regardless of input. This function is meant to replace array_sum() in cases where the sum may overflow the element type.
-
-anyelement array_abs_sum (anyarray x)
-This function finds the sum of the absolute values in the array. NULLs are ignored. Return type is the same as the input type.
-
-anyarray array_abs
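The NULL-handling rules listed above differ between functions. Element-wise operations such as `array_add` and `array_dot` reject NULLs, while reductions such as `array_max` and `array_sum` skip them. The plain-Python sketch below contrasts the two groups, with Python lists standing in for SQL arrays and `None` for NULL. It is not MADlib's implementation. Raising an error on NULLs or on arrays of different lengths is an assumption about how "requires non-NULL values" is enforced.

```python
from typing import List, Optional, Sequence


def _check_pair(x: Sequence[Optional[float]], y: Sequence[Optional[float]], name: str) -> None:
    # Assumption: mismatched lengths or NULL elements are treated as errors.
    if len(x) != len(y):
        raise ValueError(f"{name}: arrays must have the same length")
    if any(v is None for v in x) or any(v is None for v in y):
        raise ValueError(f"{name}: NULL values are not allowed")


def array_add(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Element-wise sum; every value must be non-NULL."""
    _check_pair(x, y, "array_add")
    return [a + b for a, b in zip(x, y)]


def array_dot(x: Sequence[float], y: Sequence[float]) -> float:
    """Dot product; always returns a float, like the FLOAT8 return type."""
    _check_pair(x, y, "array_dot")
    return float(sum(a * b for a, b in zip(x, y)))


def array_max(x: Sequence[Optional[float]]) -> float:
    """Maximum value, ignoring NULLs (None)."""
    return max(v for v in x if v is not None)


def array_sum(x: Sequence[Optional[float]]) -> float:
    """Sum of the values, ignoring NULLs (None)."""
    return sum(v for v in x if v is not None)


def array_contains(left: Sequence[float], right: Sequence[float]) -> bool:
    """TRUE if every non-zero element of `right` equals `left` at the same index."""
    if len(left) != len(right):
        raise ValueError("array_contains: arrays must have the same length")
    return all(b == 0 or a == b for a, b in zip(left, right))
```

With this rule, `array_contains([1, 2, 3], [0, 2, 0])` is true because only the non-zero entry `2` is compared.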
[43/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/create__indicators_8sql__in.html
--
diff --git a/docs/rc/create__indicators_8sql__in.html b/docs/rc/create__indicators_8sql__in.html
deleted file mode 100644
index 51316e0..000
--- a/docs/rc/create__indicators_8sql__in.html
+++ /dev/null
@@ -1,340 +0,0 @@
-MADlib: create_indicators.sql_in File Reference
-create_indicators.sql_in File Reference
-SQL functions for dummy coding categorical variables.
-
-Functions
-void create_indicator_variables (text source_table, text out_table, text categorical_cols, boolean keep_null, text distributed_by)
-Create a new table containing dummy-coded variables for categorical variables.
-
-void create_indicator_variables (text source_table, text out_table, text categorical_cols, boolean keep_null)
-Create a new table containing dummy-coded variables for categorical variables.
-
-void create_indicator_variables (text source_table, text out_table, text categorical_cols)
-varchar create_indicator_variables (varchar message)
-varchar create_indicator_variables ()
-
-Detailed Description
-Date: June 2014
-See also: Calculates dummy-coded indicator variables for categorical variables.
-Function Documentation
-create_indicator_variables() [1/5]
-void create_indicator_variables (text source_table,
-                                 text out_table,
-                                 text categorical_cols,
-                                 boolean keep_null,
-                                 text distributed_by)
-Parameters
-source_table      Name of the table containing the categorical variables
-out_table         Name of the table to which the dummy variables are written
-categorical_cols  Comma-separated list of column names to dummy code
-keep_null         Boolean that determines the behavior for rows with a NULL value
-distributed_by    Comma-separated list of column names to use for distributing the output
-Returns
-Void
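The dummy coding performed by `create_indicator_variables` can be sketched in plain Python. Each distinct non-NULL level of a categorical column becomes a 0/1 indicator column. The sketch is not MADlib's implementation, and the function name only mirrors the SQL one. How `keep_null` behaves is not spelled out in this excerpt, so this sketch assumes the following:

- When `keep_null` is true, NULL is treated as its own level with an extra `<col>_NULL` indicator.
- When `keep_null` is false, all indicators for a NULL row are themselves NULL (`None`).

The `<col>_<level>` column naming is also hypothetical.

```python
from typing import Any, Dict, List, Sequence


def create_indicator_variables(
    rows: Sequence[Dict[str, Any]],
    categorical_cols: Sequence[str],
    keep_null: bool = False,
) -> List[Dict[str, Any]]:
    """Return copies of `rows` with one 0/1 indicator per (column, level)."""
    # Distinct non-NULL levels per column, sorted for a stable column order.
    levels = {
        col: sorted({r[col] for r in rows if r[col] is not None}, key=str)
        for col in categorical_cols
    }
    out = []
    for r in rows:
        new_row = dict(r)
        for col in categorical_cols:
            value = r[col]
            for lvl in levels[col]:
                if value is None and not keep_null:
                    # Assumption: an unknown category yields NULL indicators.
                    new_row[f"{col}_{lvl}"] = None
                else:
                    new_row[f"{col}_{lvl}"] = int(value == lvl)
            if keep_null:
                # Assumption: NULL is kept as a separate level of its own.
                new_row[f"{col}_NULL"] = int(value is None)
        out.append(new_row)
    return out
```

For a column `sex` with values 'F', 'M', and NULL, the sketch adds `sex_F` and `sex_M`, plus `sex_NULL` when `keep_null` is true.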
[40/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/dense__linear__systems_8sql__in.html
--
diff --git a/docs/rc/dense__linear__systems_8sql__in.html b/docs/rc/dense__linear__systems_8sql__in.html
new file mode 100644
index 000..4b9a16f
--- /dev/null
+++ b/docs/rc/dense__linear__systems_8sql__in.html
@@ -0,0 +1,647 @@
+MADlib: dense_linear_systems.sql_in File Reference
+dense_linear_systems.sql_in File Reference
+SQL functions for linear systems.
+
+Functions
+bytea8 dense_residual_norm_transition (bytea8 state, float8[] a, float8 b, float8[] x)
+bytea8 dense_residual_norm_merge_states (bytea8 state1, bytea8 state2)
+residual_norm_result dense_residual_norm_final (bytea8 state)
+aggregate residual_norm_result dense_residual_norm (float8[] left_hand_side, float8 right_hand_side, float8[] solution)
+Compute the residual after solving the dense linear system.
+
+float8[] dense_direct_linear_system_transition (float8[] state, integer row_id, float8[] a, float8 b, integer num_rows, integer algorithm)
+float8[] dense_direct_linear_system_merge_states (float8[] state1, float8[] state2)
+dense_linear_solver_result dense_direct_linear_system_final (float8[] state)
+aggregate dense_linear_solver_result dense_direct_linear_system (integer row_id, float8[] left_hand_side, float8 right_hand_side, integer numEquations, integer algorithm)
+Solve a system of linear equations using the direct method.
+
+varchar linear_solver_dense (varchar input_string)
+Help function that prints out usage information.
+varchar linear_solver_dense ()
+
+void linear_solver_dense (varchar source_table, varchar out_table, varchar row_id, varchar left_hand_side, varchar right_hand_side, varchar grouping_cols, varchar optimizer, varchar optimizer_options)
+A wrapper function for the various dense linear system solvers.
+
+void linear_solver_dense (varchar source_table, varchar out_table, varchar row_id, varchar left_hand_side, varchar right_hand_side)
+Solve the dense linear system using default values for the optional arguments.
+
+Detailed Description
+Date: July 2013
+See also: Computes the solution of a consistent linear system. For more details, see the module description at Dense Linear Systems.
+Function Documentation
+dense_direct_linear_system()
+aggregate dense_linear_solver_result dense_direct_linear_system (integer row_id,
+                                                                float8[] left_hand_side,
+                                                                float8 right_hand_side,
+                                                                integer numEquations,
+                                                                integer algorithm)
+Parameters
+row_id          Column containing the row_id
+left_hand_side  Column containing the left-hand side of the system
+
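The two aggregates documented above can be illustrated together. `dense_direct_linear_system` solves Ax = b directly, and `dense_residual_norm` checks the result. The plain-Python sketch below uses Gaussian elimination with partial pivoting as the direct method, followed by the Euclidean residual ||Ax - b||_2 and the relative residual ||Ax - b||_2 / ||b||_2. Several points are assumptions rather than details from this excerpt:

- Which factorization MADlib's `algorithm` argument selects is not stated here.
- The function names are hypothetical.
- Returning the (residual norm, relative residual norm) pair is assumed.

```python
import math
from typing import List, Sequence, Tuple


def solve_dense(a: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [A | b], copied so the inputs are left untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest pivot magnitude for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def residual_norm(
    a: Sequence[Sequence[float]], b: Sequence[float], x: Sequence[float]
) -> Tuple[float, float]:
    """Return (||Ax - b||_2, ||Ax - b||_2 / ||b||_2)."""
    res = [sum(ai * xi for ai, xi in zip(row, x)) - bi for row, bi in zip(a, b)]
    norm = math.sqrt(sum(r * r for r in res))
    b_norm = math.sqrt(sum(bi * bi for bi in b))
    return norm, (norm / b_norm if b_norm else float("inf"))
```

For the system 2x + y = 3, x + 3y = 5, `solve_dense` returns approximately [0.8, 1.4], and the residual norm is close to zero.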
[45/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/clustered__variance__coxph_8sql__in.html -- diff --git a/docs/rc/clustered__variance__coxph_8sql__in.html b/docs/rc/clustered__variance__coxph_8sql__in.html new file mode 100644 index 000..46fe8d4 --- /dev/null +++ b/docs/rc/clustered__variance__coxph_8sql__in.html @@ -0,0 +1,496 @@ + +http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;> +http://www.w3.org/1999/xhtml;> + + + + + +MADlib: clustered_variance_coxph.sql_in File Reference + + + + + + + + +/* @license magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt GPL-v2 */ + $(document).ready(initResizable); +/* @license-end */ + + + + +/* @license magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt GPL-v2 */ + $(document).ready(function() { init_search(); }); +/* @license-end */ + + + MathJax.Hub.Config({ +extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"], +jax: ["input/TeX","output/HTML-CSS"], +}); +https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js"> + + + + + + + (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ + (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), + m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) + })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); + ga('create', 'UA-45382226-1', 'madlib.apache.org'); + ga('send', 'pageview'); + + + + + + + + + http://madlib.apache.org;> + + + 1.15 + + User Documentation for Apache MADlib + + + + + + + + + + + + + + + + + +/* @license magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt GPL-v2 */ +var searchBox = new SearchBox("searchBox", "search",false,'Search'); +/* @license-end */ + + + + + + + + + + + + +/* @license magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt GPL-v2 */ 
+$(document).ready(function(){initNavTree('clustered__variance__coxph_8sql__in.html','');});
+/* @license-end */
+clustered_variance_coxph.sql_in File Reference
+SQL functions for clustered robust Cox proportional hazards regression.
+Functions
+varchar clustered_variance_coxph ()
+varchar clustered_variance_coxph (varchar message)
+void clustered_variance_coxph (text model_table, text output_table, text clustervar)
+float8[] coxph_a_b_transition (float8[], integer, boolean, float8[], float8)
+float8[] coxph_a_b_merge (float8[], float8[])
+__coxph_a_b_result coxph_a_b_final (float8[])
+aggregate __coxph_a_b_result coxph_a_b (integer, boolean, float8[], float8)
+float8[] coxph_compute_w (float8[] x, boolean status, float8[] coef, float8[] h, float8 s, float8 a, float8[] b)
+__coxph_cl_var_result coxph_compute_clustered_stats (float8[] coef, float8[] hessian, float8[] a)
+void robust_variance_coxph (varchar model_table, varchar output_table, varchar clustervar)
+Detailed Description
+Date: Oct 2013
+See also: For a brief introduction to clustered robust Cox regression, see the module description Clustered Variance.
+Function Documentation
+clustered_variance_coxph() [1/3]
+varchar clustered_variance_coxph ()
+clustered_variance_coxph() [2/3]
+varchar clustered_variance_coxph (varchar message)
+clustered_variance_coxph() [3/3]
+void clustered_variance_coxph (text model_table, text output_table, text clustervar)
+coxph_a_b()
+aggregate __coxph_a_b_result coxph_a_b (integer, boolean, float8[], float8)
+coxph_a_b_final()
+__coxph_a_b_result coxph_a_b_final (float8[])
+coxph_a_b_merge()
+float8[] coxph_a_b_merge (float8[],
[48/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/balance__sample_8sql__in.html
--
diff --git a/docs/rc/balance__sample_8sql__in.html b/docs/rc/balance__sample_8sql__in.html
new file mode 100644
index 000..1e22170
--- /dev/null
+++ b/docs/rc/balance__sample_8sql__in.html
@@ -0,0 +1,497 @@
+MADlib: balance_sample.sql_in File Reference
+1.15 User Documentation for Apache MADlib
+balance_sample.sql_in File Reference
+SQL functions for sampling balanced data sets.
+Functions
+void balance_sample (text source_table, text output_table, text class_col, varchar class_sizes, integer output_table_size, text grouping_cols, boolean with_replacement, boolean keep_null)
+void balance_sample (text source_table, text output_table, text class_col, varchar class_sizes, integer output_table_size, text grouping_cols, boolean with_replacement)
+void balance_sample (text source_table, text output_table, text class_col, varchar class_sizes, integer output_table_size, text grouping_cols)
+void balance_sample (text source_table, text output_table, text class_col, varchar class_sizes, integer output_table_size)
+void balance_sample (text source_table, text output_table, text class_col, varchar class_sizes)
+void balance_sample (text source_table, text output_table, text class_col)
+varchar balance_sample (varchar message)
+varchar balance_sample ()
+Detailed Description
+Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
+Date: 12/14/2017
+See also: Given a table, balanced sampling returns a sampled data set with specified proportions for each class (defaults to uniform sampling).
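The uniform default described above gives every class the same number of rows in the output. Below is a minimal in-memory sketch of that idea. The helper `balance_classes` and its resampling policy are hypothetical assumptions for illustration, not a description of how MADlib's `balance_sample` decides between over- and undersampling. Under that policy, a class smaller than the target is oversampled with replacement, and larger classes are undersampled without replacement unless `with_replacement` is set.

```python
import random
from collections import defaultdict


def balance_classes(rows, class_of, per_class, with_replacement=False, seed=0):
    """Draw `per_class` rows from every class (uniform class balancing).

    Hypothetical policy for illustration: a class with fewer than `per_class`
    rows is oversampled with replacement; larger classes are undersampled
    without replacement unless `with_replacement` is True.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[class_of(row)].append(row)
    sample = []
    for cls in sorted(by_class):
        members = by_class[cls]
        if with_replacement or len(members) < per_class:
            sample.extend(rng.choices(members, k=per_class))
        else:
            sample.extend(rng.sample(members, per_class))
    return sample


rows = [("majority", i) for i in range(10)] + [("minority", i) for i in range(2)]
balanced = balance_classes(rows, class_of=lambda r: r[0], per_class=5)
print(balanced)
```

Here the majority class is cut down to 5 distinct rows, and the 2 minority rows are resampled up to 5.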
+Function Documentation
+balance_sample() [1/8]
+void balance_sample (text source_table,
+    text output_table,
+    text class_col,
+    varchar class_sizes,
+    integer output_table_size,
+    text grouping_cols,
+    boolean with_replacement,
[27/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__correlation.html
--
diff --git a/docs/rc/group__grp__correlation.html b/docs/rc/group__grp__correlation.html
new file mode 100644
index 000..742985a
--- /dev/null
+++ b/docs/rc/group__grp__correlation.html
@@ -0,0 +1,397 @@
+MADlib: Covariance and Correlation
+1.15 User Documentation for Apache MADlib
+Covariance and Correlation
+Contents
+    Covariance and Correlation Functions
+    Examples
+    Literature
+    Related Topics
+A correlation function measures the degree and direction of association between two variables: how well one random variable can be predicted from the other. It is a normalized version of covariance. The Pearson correlation coefficient is used here. It takes values between -1 and 1, where 1 implies total positive linear correlation, 0 means no linear correlation, and -1 means total negative linear correlation.
+This function generates an \(N \times N\) cross correlation matrix for pairs of numeric columns in a source_table. The matrix is square and symmetric, with the \( (i,j) \)th element equal to the correlation coefficient between the \(i\)th and the \(j\)th variable. The diagonal elements (correlations of variables with themselves) are always equal to 1.0.
+We also provide a covariance function, which is similar in nature to correlation and measures the joint variability of two random variables.
+Covariance and Correlation Functions
+The correlation function has the following syntax:
+correlation( source_table,
+             output_table,
+             target_cols,
+             verbose,
+             grouping_cols
+           )
+The covariance function has a similar syntax:
+covariance( source_table,
+            output_table,
+            target_cols,
+            verbose,
+            grouping_cols
+          )
+source_table
+TEXT. Name of the table containing the input data.
+output_table
+TEXT. Name of the table containing the cross correlation matrix. The output table has N rows, where N is the number of 'target_cols' in the 'source_table' for which correlation or covariance is being computed. It has the following columns:
+    column_position: An automatically generated sequential counter indicating the order of the variable in the 'output_table'.
+    variable: Contains the row header for the variables of interest.
+    grouping_cols: Contains the grouping columns, if any.
+    ...
+    The remainder of the table is the NxN correlation matrix for the pairs of variables of interest.
+The output table is arranged as a lower-triangular matrix, with the upper triangle set to NULL and the diagonal elements set to 1.0. To obtain the result from the 'output_table', order by 'column_position':
+SELECT * FROM output_table ORDER BY column_position;
+In addition to the output table, a summary table named output_table_summary is also created, which has the following columns:
+    method: 'Correlation' or 'Covariance'
+    source_table: VARCHAR. Data source table name.
+    output_table: VARCHAR. Output table name.
+    column_names: VARCHAR. Column names
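The lower-triangular layout described above can be reproduced in a few lines. Below is a minimal in-memory Python sketch with hypothetical helper names (`pearson`, `correlation_matrix`). It computes the Pearson coefficient as covariance divided by the product of standard deviations. It then emits one row per variable, carrying its column_position, with NULL (None) above the diagonal and 1.0 on it. It assumes no missing values and no constant columns; a constant column would cause a division by zero.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # The 1/n (or 1/(n-1)) normalization factors cancel, so they are omitted.
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Return (column_position, variable, row) tuples for a lower-triangular matrix.

    Entries above the diagonal are None (NULL); the diagonal is 1.0.
    """
    names = list(columns)
    result = []
    for i, name_i in enumerate(names, start=1):
        row = []
        for j, name_j in enumerate(names, start=1):
            if j > i:
                row.append(None)
            elif j == i:
                row.append(1.0)
            else:
                row.append(pearson(columns[name_i], columns[name_j]))
        result.append((i, name_i, row))
    return result


data = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [4, 3, 2, 1]}
for position, variable, row in correlation_matrix(data):
    print(position, variable, row)
```

With this data, y is a positive multiple of x (r close to 1) and z decreases as x increases (r close to -1).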
[11/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__random__forest.html
--
diff --git a/docs/rc/group__grp__random__forest.html b/docs/rc/group__grp__random__forest.html
new file mode 100644
index 000..42255a6
--- /dev/null
+++ b/docs/rc/group__grp__random__forest.html
@@ -0,0 +1,1157 @@
+MADlib: Random Forest
+1.15 User Documentation for Apache MADlib
+Random Forest
+Contents
+    Training Function
+    Prediction Function
+    Tree Display
+    Importance Display
+    Examples
+    Literature
+    Related Topics
+Random forest builds an ensemble of classifiers, each of which is a tree model constructed using bootstrapped samples from the input data. The results of these models are then combined to yield a single prediction, which, at the expense of some loss in interpretability, can be highly accurate. Refer to Breiman et al. [1][2][3] for details on the implementation used here.
+Also refer to the decision tree user documentation, since many parameters and examples are similar to those of random forest.
+Training Function
+The random forest training function has the following format:
+forest_train(training_table_name,
+             output_table_name,
+             id_col_name,
+             dependent_variable,
+             list_of_features,
+             list_of_features_to_exclude,
+             grouping_cols,
+             num_trees,
+             num_random_features,
+             importance,
+             num_permutations,
+             max_tree_depth,
+             min_split,
+             min_bucket,
+             num_splits,
+             null_handling_params,
+             verbose,
+             sample_ratio
+             )
+Arguments
+training_table_name
+TEXT. Name of the table containing the training data.
+output_table_name
+TEXT. Name of the generated table containing the model. If a table with the same name already exists, an error is returned. A summary table named output_table_name_summary and a grouping table named output_table_name_group are also created. These are described later on this page.
+id_col_name
+TEXT. Name of the column containing id information in the training data. This is a mandatory argument and is used for prediction and other purposes. The values are expected to be unique for each row.
+dependent_variable
+TEXT. Name of the column that contains the output (response) for training. Boolean, integer and text types are considered to be classification outputs, while double precision values are considered to be regression outputs.
The response variable for a classification tree can be multinomial, but the time and space complexity of the training function increases linearly as the number of response classes increases. + + +list_of_features +TEXT. Comma-separated string of column names or expressions to use as predictors. Can also be a '*' implying all columns are to be used as predictors (except for the ones included in the next argument that lists exclusions). The types of the features can be mixed: boolean, integer, and text columns are considered
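The ensemble idea described above (trees fit on bootstrapped samples, combined into a single prediction) can be sketched in plain Python. This is a deliberately simplified illustration, not MADlib's `forest_train`. Each "tree" is a depth-1 stump on a single numeric feature, and predictions are combined by majority vote. The sketch omits the random feature subsets (`num_random_features`), deeper trees (`max_tree_depth`) and variable importance. All helper names are hypothetical.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a depth-1 tree on one numeric feature by minimizing misclassifications."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, t, left_label, right_label = best
    return lambda x: left_label if x <= t else right_label


def fit_forest(xs, ys, num_trees=25, seed=0):
    """Bagging: fit each tree on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    trees = []
    for _ in range(num_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def predict(trees, x):
    """Combine the individual trees by majority vote."""
    return Counter(tree(x) for tree in trees).most_common(1)[0][0]


xs = list(range(10))
ys = ["lo"] * 5 + ["hi"] * 5
forest = fit_forest(xs, ys)
print(predict(forest, 1), predict(forest, 8))
```

Each bootstrap sample sees a slightly different data set, so individual stumps can disagree. The vote smooths out those differences, which is the source of the accuracy gain at the cost of interpretability.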
[17/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__marginal.html
--
diff --git a/docs/rc/group__grp__marginal.html b/docs/rc/group__grp__marginal.html
new file mode 100644
index 000..d88997f
--- /dev/null
+++ b/docs/rc/group__grp__marginal.html
@@ -0,0 +1,440 @@
+MADlib: Marginal Effects
+1.15 User Documentation for Apache MADlib
+Marginal Effects
+Contents
+    Marginal Effects with Interaction Terms
+    Examples
+    Notes
+    Technical Background
+    Literature
+    Related Topics
+A marginal effect (ME) or partial effect measures the effect on the conditional mean of \( y \) of a change in one of the regressors, say \(X_k\). In the linear regression model, the ME equals the relevant slope coefficient, which greatly simplifies analysis. For nonlinear models, specialized algorithms are required to calculate the ME. The marginal effect computed here is the average of the marginal effects at every data point present in the source table.
+MADlib provides marginal effects regression functions for linear, logistic and multinomial logistic regression.
+Warning: The margins_logregr() and margins_mlogregr() functions have been deprecated in favor of the margins() function.
+Marginal Effects with Interaction Terms
+margins( model_table,
+         output_table,
+         x_design,
+         source_table,
+         marginal_vars
+       )
+Arguments
+model_table
+VARCHAR. The name of the model table, which is the output of logregr_train() or mlogregr_train().
+output_table
+VARCHAR. The name of the result table. The output table has the following columns:
+    variables: INTEGER[]. The indices of the basis variables.
+    margins: DOUBLE PRECISION[]. The marginal effects.
+    std_err: DOUBLE PRECISION[]. An array of the standard errors, computed using the delta method.
+    z_stats: DOUBLE PRECISION[]. An array of the z-stats of the marginal effects.
+    p_values: DOUBLE PRECISION[]. An array of the Wald p-values of the marginal effects.
+x_design (optional)
+VARCHAR, default: NULL. The design of independent variables, necessary only if interaction terms or indicator (categorical) terms are present. This parameter is necessary because the independent variables in the underlying regression are not parsed to extract the relationship between variables.
+Example: The independent_varname in the regression method can be specified in either of the following ways:
+    'array[1, color_blue, color_green, gender_female, gpa, gpa^2, gender_female*gpa, gender_female*gpa^2, weight]'
+    'x'
+In the second version, the column x is an array containing data identical to that expressed in the first version, computed in a prior data preparation step. Supply an x_design argument to the margins() function in the following way:
+    '1, i.color_blue.color, i.color_green.color, i.gender_female, gpa, gpa^2, gender_female*gpa, gender_female*gpa^2, weight'
+The variable names ('gpa', 'weight', ...), referred to here as identifiers, should be unique for each basis variable and need not be the same as the original variable name
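For the simplest logistic case (no interaction or indicator terms, so x_design is not needed), the average marginal effect described above has a closed form. For \(p = \sigma(x^\top \beta)\), the derivative is \(\partial p / \partial x_k = \beta_k \, p (1 - p)\), and this quantity is averaged over every row of the source table. Below is a minimal Python sketch with a hypothetical helper `average_marginal_effects`. It does not compute the delta-method standard errors, and it does not apply the chain-rule terms that interaction terms such as gpa^2 would introduce; those are what x_design lets margins() account for.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def average_marginal_effects(coef, rows):
    """Average marginal effects for a logistic model without interaction terms.

    For p = sigmoid(x . beta), dp/dx_k = beta_k * p * (1 - p); the result
    averages this derivative over every row. Index 0 is the intercept column
    (x_0 = 1), whose "effect" is not meaningful and is kept only for alignment.
    """
    totals = [0.0] * len(coef)
    for x in rows:
        p = sigmoid(sum(b * xi for b, xi in zip(coef, x)))
        for k, b in enumerate(coef):
            totals[k] += b * p * (1.0 - p)
    return [t / len(rows) for t in totals]


coef = [-1.0, 0.5, 2.0]
rows = [[1.0, 0.2, 1.0], [1.0, -1.0, 0.5], [1.0, 3.0, -0.2]]
print(average_marginal_effects(coef, rows))
```

A quick sanity check: at \(x^\top \beta = 0\), \(p = 0.5\), so the marginal effect of a regressor is \(\beta_k / 4\).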
[06/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__text__utilities.html
--
diff --git a/docs/rc/group__grp__text__utilities.html b/docs/rc/group__grp__text__utilities.html
new file mode 100644
index 000..c4326a0
--- /dev/null
+++ b/docs/rc/group__grp__text__utilities.html
@@ -0,0 +1,368 @@
+MADlib: Term Frequency
+1.15 User Documentation for Apache MADlib
+Term Frequency
+Contents
+    Function Syntax
+    Examples
+    Related Topics
+Term frequency computes the number of times that a word or term occurs in a document. Term frequency is often used as part of a larger text processing pipeline, which may include operations such as stemming, stop word removal and topic modelling.
+Function Syntax
+term_frequency(input_table,
+               doc_id_col,
+               word_col,
+               output_table,
+               compute_vocab)
+Arguments:
+input_table
+TEXT. The name of the table containing the documents, with one document per row. Each row is in the form doc_id, word_vector, where doc_id is an id unique to each document and word_vector is a text array containing the words in the document. The word_vector should contain multiple entries of a word if the document contains multiple occurrences of that word.
+doc_id_col
+TEXT. The name of the column containing the document id.
+word_col
+TEXT. The name of the column containing the vector of words/terms in the document. This column should be of a type that can be cast to TEXT[].
+output_table
+TEXT. The name of the table to store the term frequency output. The output table contains the following columns:
+    doc_id_col: This is the document id column (its name is the same as the one provided as input).
+    word: Word/term present in a document. Depending on the value of compute_vocab below, this is either the original word as it appears in word_col, or an id representing the word. Note that word ids start from 0, not 1.
+    count: The number of times this word is found in the document.
+compute_vocab
+BOOLEAN. (Optional, Default=FALSE) Flag to indicate if a vocabulary table is to be created. If TRUE, an additional output table is created containing the vocabulary of all words, with an id assigned to each word in alphabetical order.
The table is called output_table_vocabulary (i.e., the suffix _vocabulary is appended to the output_table name) and contains the following columns:
+    wordid: An id for each word, assigned in alphabetical order.
+    word: The word/term corresponding to the id.
+Examples
+First we create a document table with one document per row:
+DROP TABLE IF EXISTS documents;
+CREATE TABLE documents(docid INT4, contents TEXT);
+INSERT INTO documents VALUES
+(0, 'I like to eat broccoli and bananas. I ate a banana and spinach smoothie for breakfast.'),
+(1, 'Chinchillas and kittens are cute.'),
+(2, 'My sister adopted two kittens yesterday.'),
+(3, 'Look at this cute hamster munching on a piece of broccoli.');
+You can apply stemming, stop word removal and tokenization at this point in order to prepare the documents for text processing. Depending
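The output described above (one row per document and word with its count, plus an optional vocabulary with ids assigned from 0 in alphabetical order) can be mimicked in memory. Below is a minimal Python sketch with a hypothetical `term_frequency` helper. It is not MADlib's SQL function. It assumes the documents have already been tokenized into word lists, here a lowercased, punctuation-free version of documents 1 and 2 from the example table.

```python
from collections import Counter


def term_frequency(docs, compute_vocab=False):
    """Count word occurrences per document.

    docs maps doc_id -> list of (already tokenized) words. Returns (rows, vocab):
    rows are (doc_id, word_or_wordid, count); vocab maps word -> id, with ids
    assigned from 0 in alphabetical order, or is None when compute_vocab is False.
    """
    vocab = None
    if compute_vocab:
        all_words = sorted({w for tokens in docs.values() for w in tokens})
        vocab = {w: i for i, w in enumerate(all_words)}
    rows = []
    for doc_id, tokens in docs.items():
        for word, count in sorted(Counter(tokens).items()):
            rows.append((doc_id, vocab[word] if vocab is not None else word, count))
    return rows, vocab


docs = {
    1: ["chinchillas", "and", "kittens", "are", "cute"],
    2: ["my", "sister", "adopted", "two", "kittens", "yesterday"],
}
rows, vocab = term_frequency(docs, compute_vocab=True)
print(vocab)
print(rows)
```

Because ids follow alphabetical order, "adopted" receives id 0 here. A word that appears in several documents (such as "kittens") keeps the same id in each.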
[03/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/jquery.js -- diff --git a/docs/rc/jquery.js b/docs/rc/jquery.js new file mode 100644 index 000..2771c74 --- /dev/null +++ b/docs/rc/jquery.js @@ -0,0 +1,115 @@ +/* + @licstart The following is the entire license notice for the + JavaScript code in this file. + + Copyright (C) 1997-2017 by Dimitri van Heesch + + Permission is hereby granted, free of charge, to any person obtaining + a copy of this software and associated documentation files (the + "Software"), to deal in the Software without restriction, including + without limitation the rights to use, copy, modify, merge, publish, + distribute, sublicense, and/or sell copies of the Software, and to + permit persons to whom the Software is furnished to do so, subject to + the following conditions: + + The above copyright notice and this permission notice shall be included + in all copies or substantial portions of the Software. + + THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. + IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY + CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, + TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE + SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + + @licend The above is the entire license notice + for the JavaScript code in this file + */ +/*! + * jQuery JavaScript Library v1.7.1 + * http://jquery.com/ + * + * Copyright 2011, John Resig + * Dual licensed under the MIT or GPL Version 2 licenses. + * http://jquery.org/license + * + * Includes Sizzle.js + * http://sizzlejs.com/ + * Copyright 2011, The Dojo Foundation + * Released under the MIT, BSD, and GPL Licenses. 
+ * + * Date: Mon Nov 21 21:11:03 2011 -0500 + */ +(function(bb,L){var av=bb.document,bu=bb.navigator,bl=bb.location;var b=(function(){var bF=function(b0,b1){return new bF.fn.init(b0,b1,bD)},bU=bb.jQuery,bH=bb.$,bD,bY=/^(?:[^#<]*(<[\w\W]+>)[^>]*$|#([\w\-]*)$)/,bM=/\S/,bI=/^\s+/,bE=/\s+$/,bA=/^<(\w+)\s*\/?>(?:<\/\1>)?$/,bN=/^[\],:{}\s]*$/,bW=/\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})/g,bP=/"[^"\\\n\r]*"|true|false|null|-?\d+(?:\.\d*)?(?:[eE][+\-]?\d+)?/g,bJ=/(?:^|:|,)(?:\s*\[)+/g,by=/(webkit)[ \/]([\w.]+)/,bR=/(opera)(?:.*version)?[ \/]([\w.]+)/,bQ=/(msie) ([\w.]+)/,bS=/(mozilla)(?:.*? rv:([\w.]+))?/,bB=/-([a-z]|[0-9])/ig,bZ=/^-ms-/,bT=function(b0,b1){return(b1+"").toUpperCase()},bX=bu.userAgent,bV,bC,e,bL=Object.prototype.toString,bG=Object.prototype.hasOwnProperty,bz=Array.prototype.push,bK=Array.prototype.slice,bO=String.prototype.trim,bv=Array.prototype.indexOf,bx={};bF.fn=bF.prototype={constructor:bF,init:function(b0,b4,b3){var b2,b5,b1,b6;if(!b0){return this}if(b0.nodeType){this.context=this[0]=b0;this.length=1;return this}if(b0==="bo dy"&&!b4&){this.context=av;this[0]=av.body;this.selector=b0;this.length=1;return this}if(typeof b0==="string"){if(b0.charAt(0)==="<"&(b0.length-1)===">"&>=3){b2=[null,b0,null]}else{b2=bY.exec(b0)}if(b2&&(b2[1]||!b4)){if(b2[1]){b4=b4 instanceof bF?b4[0]:b4;b6=(b4?b4.ownerDocument||b4:av);b1=bA.exec(b0);if(b1){if(bF.isPlainObject(b4)){b0=[av.createElement(b1[1])];bF.fn.attr.call(b0,b4,true)}else{b0=[b6.createElement(b1[1])]}}else{b1=bF.buildFragment([b2[1]],[b6]);b0=(b1.cacheable?bF.clone(b1.fragment):b1.fragment).childNodes}return bF.merge(this,b0)}else{b5=av.getElementById(b2[2]);if(b5&){if(b5.id!==b2[2]){return b3.find(b0)}this.length=1;this[0]=b5}this.context=av;this.selector=b0;return this}}else{if(!b4||b4.jquery){return(b4||b3).find(b0)}else{return this.constructor(b4).find(b0)}}}else{if(bF.isFunction(b0)){return b3.ready(b0)}}if(b0.selector!==L){this.selector=b0.selector;this.context=b0.context}return 
bF.makeArray(b0,this)},selector:"", jquery:"1.7.1",length:0,size:function(){return this.length},toArray:function(){return bK.call(this,0)},get:function(b0){return b0==null?this.toArray():(b0<0?this[this.length+b0]:this[b0])},pushStack:function(b1,b3,b0){var b2=this.constructor();if(bF.isArray(b1)){bz.apply(b2,b1)}else{bF.merge(b2,b1)}b2.prevObject=this;b2.context=this.context;if(b3==="find"){b2.selector=this.selector+(this.selector?" ":"")+b0}else{if(b3){b2.selector=this.selector+"."+b3+"("+b0+")"}}return b2},each:function(b1,b0){return bF.each(this,b1,b0)},ready:function(b0){bF.bindReady();bC.add(b0);return this},eq:function(b0){b0=+b0;return b0===-1?this.slice(b0):this.slice(b0,b0+1)},first:function(){return this.eq(0)},last:function(){return this.eq(-1)},slice:function(){return this.pushStack(bK.apply(this,arguments),"slice",bK.call(arguments).join(","))},map:function(b0){return this.pushStack(bF.map(this,function(b2,b1){return b0.call(b2,b1,b2)}))},end:function(){return this.prevObject||this.constructor(null)},pus
[21/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__graph__measures.html -- diff --git a/docs/rc/group__grp__graph__measures.html b/docs/rc/group__grp__graph__measures.html new file mode 100644 index 000..9339d92 --- /dev/null +++ b/docs/rc/group__grp__graph__measures.html @@ -0,0 +1,155 @@ + +http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;> +http://www.w3.org/1999/xhtml;> + + + + + +MADlib: Measures + + + + + + + + +/* @license magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt GPL-v2 */ + $(document).ready(initResizable); +/* @license-end */ + + + + +/* @license magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt GPL-v2 */ + $(document).ready(function() { init_search(); }); +/* @license-end */ + + + MathJax.Hub.Config({ +extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"], +jax: ["input/TeX","output/HTML-CSS"], +}); +https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js"> + + + + + + + (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ + (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), + m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) + })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); + ga('create', 'UA-45382226-1', 'madlib.apache.org'); + ga('send', 'pageview'); + + + + + + + + + http://madlib.apache.org;> + + + 1.15 + + User Documentation for Apache MADlib + + + + + + + + + + + + + + + + + +/* @license magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt GPL-v2 */ +var searchBox = new SearchBox("searchBox", "search",false,'Search'); +/* @license-end */ + + + + + + + + + + + + +/* @license magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt GPL-v2 */ +$(document).ready(function(){initNavTree('group__grp__graph__measures.html','');}); +/* @license-end */ + + + + + + + + + + + + + + +Modules 
+ +Measures + + +Detailed Description +A collection of metrics computed on a graph. + + +Modules +Average Path Length +Computes the average shortest-path length of a graph. + +Closeness +Computes the closeness centrality value of each node in the graph. + +Graph Diameter +Computes the diameter of a graph. + +In-Out Degree +Computes the in-degree and out-degree of each vertex. + +Generated on Mon Aug 6 2018 21:55:39 for MADlib by doxygen (http://www.doxygen.org/index.html) 1.8.14 http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__graph__measures.js -- diff --git a/docs/rc/group__grp__graph__measures.js b/docs/rc/group__grp__graph__measures.js new file mode 100644 index 000..6272fba --- /dev/null +++ b/docs/rc/group__grp__graph__measures.js @@ -0,0 +1,7 @@ +var group__grp__graph__measures = +[ +[ "Average Path Length", "group__grp__graph__avg__path__length.html", null ], +[ "Closeness", "group__grp__graph__closeness.html", null ], +[ "Graph Diameter", "group__grp__graph__diameter.html", null ], +[ "In-Out Degree", "group__grp__graph__vertex__degrees.html", null ] +]; \ No newline at end of file http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__graph__vertex__degrees.html -- diff --git a/docs/rc/group__grp__graph__vertex__degrees.html b/docs/rc/group__grp__graph__vertex__degrees.html new file mode 100644 index 000..9d8a2f5 --- /dev/null +++ b/docs/rc/group__grp__graph__vertex__degrees.html @@ -0,0 +1,273 @@ + +MADlib: In-Out Degree
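The graph measures listed for this group can be illustrated on a small in-memory edge list. The following Python sketch is illustrative only: it is not MADlib's implementation (which works on vertex and edge tables inside the database), the function names are hypothetical, and it assumes an unweighted directed graph, so path lengths are hop counts found by breadth-first search and unreachable pairs are skipped.

```python
from collections import defaultdict, deque


def in_out_degrees(edges):
    """Return {vertex: (in_degree, out_degree)} for a directed edge list."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        outdeg[src] += 1
        indeg[dst] += 1
    vertices = set(indeg) | set(outdeg)
    return {v: (indeg[v], outdeg[v]) for v in sorted(vertices)}


def hop_distances(adj, source):
    """Breadth-first search: unweighted shortest-path length to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def path_measures(edges):
    """Average shortest-path length and diameter over reachable ordered pairs (u != v).

    Unreachable pairs are ignored (an assumption of this sketch).
    """
    adj = defaultdict(list)
    vertices = set()
    for src, dst in edges:
        adj[src].append(dst)
        vertices.update((src, dst))
    lengths = [d for s in vertices
               for t, d in hop_distances(adj, s).items() if t != s]
    if not lengths:
        raise ValueError("graph has no reachable vertex pairs")
    return sum(lengths) / len(lengths), max(lengths)
```

A 3-vertex directed cycle such as [(0, 1), (1, 2), (2, 0)] gives every vertex in-degree 1 and out-degree 1, an average path length of 1.5, and a diameter of 2. MADlib's own graph modules may also take edge weights into account, which this hop-count sketch does not.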
[18/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__linreg.html -- diff --git a/docs/rc/group__grp__linreg.html b/docs/rc/group__grp__linreg.html new file mode 100644 index 000..9e73ca3 --- /dev/null +++ b/docs/rc/group__grp__linreg.html @@ -0,0 +1,479 @@ + +MADlib: Linear Regression + +Linear Regression
+ + +Contents + +Training Function + +Prediction Function + +Examples + +Technical Background + +Literature + +Related Topics + +Linear regression models the linear relationship between a scalar dependent variable \( y \) and one or more explanatory (independent) variables \( x \), and estimates a vector of model coefficients. +Training Function +The linear regression training function has the following syntax: +linregr_train( source_table, + out_table, + dependent_varname, + independent_varname, + grouping_cols, + heteroskedasticity_option + ) +Arguments +source_table +TEXT. Name of the table containing the training data. + + +out_table +TEXT. Name of the generated table containing the output model. +The output table contains the following columns: + +... Any grouping columns provided during training. Present only if the grouping option is used. + +coef FLOAT8[]. Vector of the coefficients of the regression. + +r2 FLOAT8. R-squared coefficient of determination of the model. + +std_err FLOAT8[]. Vector of the standard errors of the coefficients. + +t_stats FLOAT8[]. Vector of the t-statistics of the coefficients. + +p_values FLOAT8[]. Vector of the p-values of the coefficients. + +condition_no FLOAT8 array. The condition number of the \(X^{*}X\) matrix. A high condition number usually indicates numerical instability in the result, yielding a less reliable model. A high condition number often results from significant collinearity in the underlying design matrix, in which case other regression techniques, such as elastic net regression, may be more appropriate. + +bp_stats FLOAT8. The Breusch-Pagan statistic of heteroskedasticity. Present only if the heteroskedasticity_option argument was set to True when the model was trained. + +bp_p_value FLOAT8. The Breusch-Pagan calculated p-value. Present only if the heteroskedasticity_option argument was set to True when the model was trained. + +num_rows_processed INTEGER.
The number of rows actually used in each group. + +num_missing_rows_skipped INTEGER. The number of rows with NULL values in the dependent or independent variables that were skipped in the computation for each group. + +variance_covariance FLOAT[]. Variance-covariance matrix. + +A summary table named out_table_summary is created together with the output table. It has the following columns: + +method 'linregr' for linear regression. + +source_table The data source table name. + +out_table The output table name. + +dependent_varname The dependent variable. + +independent_varname The independent variables. + +num_rows_processed
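The coef, r2, std_err, and t_stats columns described above follow the textbook ordinary-least-squares formulas. As a rough sketch (not MADlib's in-database code), the function below solves the normal equations \( (X^T X)\beta = X^T y \) in pure Python. It assumes each row of xs already contains a leading 1.0 for the intercept, and the names linregr and invert are hypothetical.

```python
import math


def invert(a):
    """Gauss-Jordan inverse of a small square matrix (list of lists), with partial pivoting."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; the columns may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]


def linregr(xs, ys):
    """OLS fit returning (coef, r2, std_err, t_stats)."""
    n, k = len(xs), len(xs[0])
    if n <= k:
        raise ValueError("need more rows than coefficients")
    xtx = [[sum(r[i] * r[j] for r in xs) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(xs, ys)) for i in range(k)]
    inv = invert(xtx)
    coef = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y - sum(c * v for c, v in zip(coef, r)) for r, y in zip(xs, ys)]
    mean_y = sum(ys) / n
    ss_res = sum(e * e for e in resid)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot
    sigma2 = ss_res / (n - k)  # unbiased estimate of the residual variance
    std_err = [math.sqrt(sigma2 * inv[i][i]) for i in range(k)]
    t_stats = [c / s if s > 0 else math.inf for c, s in zip(coef, std_err)]
    return coef, r2, std_err, t_stats
```

For xs = [[1, 0], [1, 1], [1, 2]] and ys = [0, 2, 1], this gives coefficients [0.5, 0.5] and an R-squared of 0.25. A production implementation would normally avoid forming an explicit inverse, but the inverse is convenient here because its diagonal also yields the standard errors.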
[44/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/cox__prop__hazards_8sql__in.html -- diff --git a/docs/rc/cox__prop__hazards_8sql__in.html b/docs/rc/cox__prop__hazards_8sql__in.html new file mode 100644 index 000..aff8dbb --- /dev/null +++ b/docs/rc/cox__prop__hazards_8sql__in.html @@ -0,0 +1,2150 @@ + +MADlib: cox_prop_hazards.sql_in File Reference
+ +cox_prop_hazards.sql_in File Reference + +SQL functions for Cox proportional hazards. + +Functions +float8[] array_avg_transition (float8[] state, float8[] x, boolean use_abs) + +float8[] array_avg_merge (float8[] left, float8[] right) + +float8[] array_avg_final (float8[] state) + +aggregate float8[] array_avg (float8[], boolean) + +void coxph_train (varchar source_table, varchar output_table, varchar dependent_varname, varchar independent_varname, varchar right_censoring_status, varchar strata, varchar optimizer_params) +Compute Cox regression coefficients and diagnostic statistics. + +varchar coxph_train () + +varchar coxph_train (varchar message) + +void coxph_train (varchar source_table, varchar output_table, varchar dependent_variable, varchar independent_variable, varchar right_censoring_status, varchar strata) +Cox regression training function. + +void coxph_train (varchar source_table, varchar output_table, varchar dependent_variable, varchar independent_variable, varchar right_censoring_status) +Cox regression training function. + +void coxph_train (varchar source_table, varchar output_table, varchar dependent_variable, varchar independent_variable) +Cox regression training function. + +void coxph_predict (text model_table, text source_table, text id_col_name, text output_table, text pred_type, text reference) +Predict the linear predictor or the risk for the given data.
+ +void coxph_predict (text model_table, text source_table, text id_col_name, text output_table, text pred_type) + +void coxph_predict (text model_table, text source_table, text id_col_name, text output_table) + +float8 _coxph_predict_resp (float8[] coef, float8[] col_ind_var, float8[] mean_ind_var, text pred_type) + +float8[] _coxph_predict_terms (float8[] coef, float8[] col_ind_var, float8[] mean_ind_var) + +varchar coxph_predict (varchar message) + +varchar coxph_predict () + +float8[] _split_transition (float8[], float8, integer, integer) + +float8[] _split_merge (float8[], float8[]) + +float8[] _split_final (float8[]) + +aggregate float8[] _compute_splits (float8, integer, integer) + +integer _compute_grpid (float8[] splits, float8 split_col, boolean reverse) + +integer _compute_grpid (float8[] splits, float8 split_col) + +coxph_result compute_coxph_result (float8[] coef, float8 l, float8[] d2l, integer niter, float8[] stds) + +coxph_step_result coxph_improved_step_final (float8[] state) + +float8[] coxph_improved_step_transition (float8[] state, float8[] x, float8[] y, integer[] status, float8[] coef, float8[] max_coef) + +float8[] coxph_step_inner_final (float8[] state) + +float8
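The _coxph_predict_resp and _coxph_predict_terms signatures above take the coefficients, a row's covariates, and the covariate means, which suggests that predictions are centered at the means. Under the standard Cox model \( h(t \mid x) = h_0(t)\exp(\beta^T x) \), a minimal sketch assumes the linear predictor is \( \beta^T (x - \bar{x}) \) and the risk is its exponential. The pred_type strings and the Python function names are assumptions, not confirmed MADlib behavior.

```python
import math


def coxph_predict_terms(coef, x, mean_x):
    """Per-covariate contributions coef_i * (x_i - mean_i) (assumed centering)."""
    return [b * (xi - mi) for b, xi, mi in zip(coef, x, mean_x)]


def coxph_predict_resp(coef, x, mean_x, pred_type="linear_predictors"):
    """Centered linear predictor, or the relative risk exp(linear predictor)."""
    lp = sum(coxph_predict_terms(coef, x, mean_x))
    if pred_type == "linear_predictors":
        return lp
    if pred_type == "risk":
        # Hazard ratio relative to a hypothetical subject at the covariate means.
        return math.exp(lp)
    raise ValueError("unsupported pred_type: %r" % pred_type)
```

Because the baseline hazard \( h_0(t) \) cancels in the ratio, the risk value can be read as a hazard ratio without estimating \( h_0 \).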
[26/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__crf.html -- diff --git a/docs/rc/group__grp__crf.html b/docs/rc/group__grp__crf.html new file mode 100644 index 000..20fd7da --- /dev/null +++ b/docs/rc/group__grp__crf.html @@ -0,0 +1,632 @@ + +MADlib: Conditional Random Field + +Conditional Random Field + + 
+Contents + +Training Feature Generation + +CRF Training Function + +Testing Feature Generation + +Inference using Viterbi + +Using CRF + +Examples + +Technical Background + +Literature + +Related Topics + +A conditional random field (CRF) is a type of discriminative, undirected probabilistic graphical model. A linear-chain CRF is a special type of CRF that assumes the current state depends only on the previous state. +Feature extraction modules are provided for text-analysis tasks such as part-of-speech (POS) tagging and named-entity recognition (NER). Currently, six feature types are implemented: + +Edge Feature: transition feature that encodes the transition feature weight from the current label to the next label. +Start Feature: fired when the current token is the first token in a sequence. +End Feature: fired when the current token is the last token in a sequence. +Word Feature: fired when the current token is observed in the trained dictionary. +Unknown Feature: fired when the current token is not observed in the trained dictionary for at least a certain number of times (default 1). +Regex Feature: fired when the current token can be matched by a regular expression. + +A Viterbi implementation is also provided to get the best label sequence and the conditional probability \( \Pr( \text{best label sequence} \mid \text{sequence}) \). +The following steps are required for CRF learning and inference: +Training Feature Generation +CRF Training +Testing Feature Generation +Inference using Viterbi + +Training Feature Generation +The function takes train_segment_tbl, regex_tbl, and label_tbl as input and performs feature generation, producing the three tables dictionary_tbl, train_feature_tbl, and train_featureset_tbl that are required as input for CRF training. +crf_train_fgen(train_segment_tbl, + regex_tbl, + label_tbl, + dictionary_tbl, + train_feature_tbl, + train_featureset_tbl) + Arguments +train_segment_tbl +TEXT. Name of the training segment table.
The table is expected to have the following columns: + +doc_id INTEGER. Document ID. + +start_pos INTEGER. Index of a particular term in the respective document. + +seg_text TEXT. Term at the respective start_pos in the document. + +label INTEGER. Label ID for the term, corresponding to the actual label in label_tbl. + + +regex_tbl +TEXT. Name of the regular expression table. The table is expected to have the following columns: + +pattern TEXT. Regular expression. + +name TEXT. Regular expression name. + + +label_tbl +TEXT. Name of the table containing unique labels and their IDs. The table is expected to have the following columns: + +id INTEGER. Unique label
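The Viterbi step described above can be sketched for a linear-chain CRF once the feature weights have been summed into per-position label scores. In this hypothetical Python version, emit[t][y] stands for the summed word, regex, start, and end feature weights for label y at position t, and trans[p][y] stands for the edge-feature weight. Both are assumptions about how the scores are assembled, not MADlib's table layout. A forward pass using log-sum-exp supplies the partition function needed for \( \Pr(\text{best label sequence} \mid \text{sequence}) \).

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def viterbi(emit, trans):
    """Best label sequence for a linear-chain CRF and its conditional probability.

    emit[t][y]  -- summed feature weights for label y at position t (hypothetical input)
    trans[p][y] -- edge-feature weight for the transition p -> y (hypothetical input)
    """
    n, k = len(emit), len(emit[0])
    best = list(emit[0])    # best log-score of any path ending in label y
    alpha = list(emit[0])   # forward log-scores summed over all paths ending in y
    backptrs = []
    for t in range(1, n):
        new_best, new_alpha, ptrs = [], [], []
        for y in range(k):
            p = max(range(k), key=lambda q: best[q] + trans[q][y])
            ptrs.append(p)
            new_best.append(best[p] + trans[p][y] + emit[t][y])
            new_alpha.append(logsumexp([alpha[q] + trans[q][y] for q in range(k)]) + emit[t][y])
        best, alpha = new_best, new_alpha
        backptrs.append(ptrs)
    last = max(range(k), key=lambda y: best[y])
    path = [last]
    for ptrs in reversed(backptrs):   # follow back-pointers from the last position
        path.append(ptrs[path[-1]])
    path.reverse()
    log_z = logsumexp(alpha)          # log partition function over all label sequences
    return path, math.exp(best[last] - log_z)
```

The max-product recursion (best) and the sum-product recursion (alpha) share the same loop structure. The probability is the ratio of the best path's unnormalized score to the sum over all label sequences.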
[01/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting
Repository: madlib-site Updated Branches: refs/heads/asf-site acd339f65 -> 9a2b301d3 http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/lda_8sql__in.html -- diff --git a/docs/rc/lda_8sql__in.html b/docs/rc/lda_8sql__in.html new file mode 100644 index 000..e2423a7 --- /dev/null +++ b/docs/rc/lda_8sql__in.html @@ -0,0 +1,1422 @@ + +MADlib: lda.sql_in File Reference
+ +lda.sql_in File Reference + +SQL functions for Latent Dirichlet Allocation. + +Functions +set lda_result lda_train (text data_table, text model_table, text output_data_table, int4 voc_size, int4 topic_num, int4 iter_num, float8 alpha, float8 beta) +This UDF provides an entry point for the LDA training process. + +set lda_result lda_predict (text data_table, text model_table, text output_table) +This UDF provides an entry point for the LDA prediction process. + +set lda_result lda_predict (text data_table, text model_table, text output_table, int4 iter_num) +An overloaded version that allows users to specify iter_num. + +set lda_result lda_get_topic_word_count (text model_table, text output_table) +This UDF computes the per-topic word counts. + +set lda_result lda_get_word_topic_count (text model_table, text output_table) +This UDF computes the per-word topic counts. + +set lda_result lda_get_topic_desc (text model_table, text vocab_table, text desc_table, int4 top_k) +This UDF gets the description (top-k words) for each topic. + +set lda_result lda_get_word_topic_mapping (text lda_output_table, text mapping_table) +This UDF gets the wordid-topicid mapping from the LDA training output table. + +int4[] __lda_random_assign (int4 word_count, int4 topic_num) +This UDF randomly assigns topics to the words in a document. + +int4[] __lda_gibbs_sample (int4[] words, int4[] counts, int4[] doc_topic, int8[] model, float8 alpha, float8 beta, int4 voc_size, int4 topic_num, int4 iter_num) +This UDF learns the topics of the words in a document and is the main step of a Gibbs sampling iteration. The model parameter (including the per-word topic counts and the corpus-level topic counts) is passed to this function in the first call and then carried over to subsequent calls through fcinfo->flinfo->fn_extra, which allows immediate updates.
+ +int8[] __lda_count_topic_sfunc (int8[] state, int4[] words, int4[] counts, int4[] topic_assignment, int4 voc_size, int4 topic_num) +This UDF is the sfunc for the aggregate that computes the topic counts for each word and the topic counts in the whole corpus. It scans the topic assignments in a document and updates the topic counts. + +int8[] __lda_count_topic_prefunc (int8[] state1, int8[] state2) +This UDF is the prefunc for the aggregate that computes the per-word topic counts. + +aggregate int8[] __lda_count_topic_agg (int4[], int4[], int4[], int4, int4) +This UDA computes the word-topic counts by scanning and summing up the topic assignments in each document. + +float8 lda_get_perplexity (text model_table, text
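__lda_gibbs_sample is described as the main step of a Gibbs sampling iteration. The sketch below shows the standard collapsed Gibbs update it is presumably based on. Each token's topic is drawn with probability proportional to \( (n_{dk} + \alpha)(n_{wk} + \beta) / (n_k + V\beta) \). This is an illustration under stated assumptions: MADlib's UDF works on (words, counts) arrays and a packed int8[] model in compiled code, whereas this version expands each document into a token list, and all helper names are hypothetical.

```python
import random


def random_assign(word_count, topic_num, rng):
    """Initial topic for each token, drawn uniformly (in the spirit of __lda_random_assign)."""
    return [rng.randrange(topic_num) for _ in range(word_count)]


def count_topics(docs, assignments, voc_size, topic_num):
    """Per-word topic counts and corpus-level topic totals (in the spirit of __lda_count_topic_agg)."""
    word_topic = [[0] * topic_num for _ in range(voc_size)]
    topic_total = [0] * topic_num
    for words, topics in zip(docs, assignments):
        for w, z in zip(words, topics):
            word_topic[w][z] += 1
            topic_total[z] += 1
    return word_topic, topic_total


def gibbs_sweep(words, topics, word_topic, topic_total, alpha, beta, voc_size, rng):
    """One collapsed Gibbs sweep over one document; updates all counts in place."""
    topic_num = len(topic_total)
    doc_topic = [0] * topic_num
    for z in topics:
        doc_topic[z] += 1
    for i, w in enumerate(words):
        old = topics[i]
        # Remove this token's current assignment before resampling it.
        doc_topic[old] -= 1
        word_topic[w][old] -= 1
        topic_total[old] -= 1
        weights = [(doc_topic[k] + alpha) * (word_topic[w][k] + beta)
                   / (topic_total[k] + voc_size * beta) for k in range(topic_num)]
        new = rng.choices(range(topic_num), weights=weights)[0]
        # Record the new assignment in every count the sampler depends on.
        topics[i] = new
        doc_topic[new] += 1
        word_topic[w][new] += 1
        topic_total[new] += 1
    return topics
```

Updating the shared word-topic and topic-total counts immediately after each draw mirrors the "immediate update" behavior the documentation attributes to caching the model across calls.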
[08/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__strs.html -- diff --git a/docs/rc/group__grp__strs.html b/docs/rc/group__grp__strs.html new file mode 100644 index 000..a3a6e3b --- /dev/null +++ b/docs/rc/group__grp__strs.html @@ -0,0 +1,269 @@ + +MADlib: Stratified Sampling + +Stratified Sampling + + +Contents + 
+Stratified Sampling + +Examples + +Stratified sampling is a method for independently sampling subpopulations (strata). It is commonly used to reduce sampling error by ensuring that subgroups are adequately represented in the sample. +Stratified Sampling + +stratified_sample( source_table, +output_table, +proportion, +grouping_cols, +target_cols, +with_replacement + ) +Arguments +source_table +TEXT. Name of the table containing the input data. + + +output_table +TEXT. Name of the output table that contains the sampled data. The output table contains all columns present in the source table unless otherwise specified in the 'target_cols' parameter below. + + +proportion +FLOAT8 in the range (0,1). The fraction of rows to sample; each stratum is sampled independently. + + +grouping_cols (optional) +TEXT, default: NULL. A single column or a comma-separated list of columns that defines the strata. When this parameter is NULL, no grouping is used, so the sampling is non-stratified; that is, the whole table is treated as a single group. + + +target_cols (optional) +TEXT, default: NULL. A comma-separated list of columns to appear in the 'output_table'. If NULL or '*', all columns from the 'source_table' will appear in the 'output_table'. +Note: Do not include 'grouping_cols' in the 'target_cols' parameter, because they are always included in the 'output_table'. + +with_replacement (optional) +BOOLEAN, default: FALSE. Determines whether to sample with replacement or without replacement (the default). Sampling with replacement means the same row may appear in the sample more than once; sampling without replacement means a given row can be selected only once. + +Examples +Please note that due to the random nature of sampling, your results may look different from those below. 
+ +Create an input table: +DROP TABLE IF EXISTS test; +CREATE TABLE test( +id1 INTEGER, +id2 INTEGER, +gr1 INTEGER, +gr2 INTEGER +); +INSERT INTO test VALUES +(1,0,1,1), +(2,0,1,1), +(3,0,1,1), +(4,0,1,1), +(5,0,1,1), +(6,0,1,1), +(7,0,1,1), +(8,0,1,1), +(9,0,1,1), +(9,0,1,1), +(9,0,1,1), +(9,0,1,1), +(0,1,1,2), +(0,2,1,2), +(0,3,1,2), +(0,4,1,2), +(0,5,1,2), +(0,6,1,2), +(10,10,2,2), +(20,20,2,2), +(30,30,2,2), +(40,40,2,2), +(50,50,2,2), +(60,60,2,2), +(70,70,2,2); + +Sample without replacement: +DROP TABLE IF EXISTS out; +SELECT madlib.stratified_sample( +'test',-- Source table +'out', -- Output table +0.5, -- Sample proportion +'gr1,gr2', -- Strata definition +
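The semantics of stratified_sample described above (the same proportion drawn independently from every stratum, with or without replacement) can be sketched in Python on in-memory rows. The rounding rule for fractional sample sizes (round half up) is an assumption, as are the function and parameter names; MADlib may size each stratum differently.

```python
import random
from collections import defaultdict


def stratified_sample(rows, proportion, strata_key, with_replacement=False, seed=None):
    """Sample the same fraction of rows independently from every stratum."""
    if not 0 < proportion < 1:
        raise ValueError("proportion must be in the range (0, 1)")
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[strata_key(row)].append(row)
    sample = []
    for members in strata.values():
        # Round half up (an assumption; the page does not state the rule).
        n = int(proportion * len(members) + 0.5)
        if with_replacement:
            # The same row may be drawn more than once.
            sample.extend(rng.choice(members) for _ in range(n))
        else:
            # Each row position can be drawn at most once.
            sample.extend(rng.sample(members, n))
    return sample
```

On the 25-row test table above, grouping by (gr1, gr2) yields strata of 12, 6, and 7 rows, so a proportion of 0.5 selects 6, 3, and 4 rows under this rounding rule.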
[15/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__nn.html -- diff --git a/docs/rc/group__grp__nn.html b/docs/rc/group__grp__nn.html new file mode 100644 index 000..d7569f6 --- /dev/null +++ b/docs/rc/group__grp__nn.html @@ -0,0 +1,1143 @@ + +MADlib: Neural Network + +Neural Network + + +Contents + +Classification 
+ +Regression + +Optimizer Parameters + +Prediction Functions + +Examples + +Technical Background + +Literature + +Related Topics + +Multilayer Perceptron (MLP) is a type of neural network that can be used for regression and classification. +MLPs consist of several fully connected hidden layers with non-linear activation functions. In the case of classification, the final layer of the neural net has as many nodes as there are classes, and the output of the neural net can be interpreted as the probability that a given input belongs to a specific class. +MLP can be used with or without mini-batching. The advantage of mini-batching is that it can perform better than stochastic gradient descent (the default MADlib optimizer) because it uses more than one training example at a time, typically resulting in faster and smoother convergence [3]. +Note: In order to use mini-batching, you must first run the Mini-Batch Preprocessor, a utility that prepares input data for use by models that support mini-batch as an optimization option, such as MLP. This is a one-time operation; you only need to re-run the preprocessor if your input data changes or if you change the grouping parameter. +Classification Training Function +The MLP classification training function has the following format: + +mlp_classification( +source_table, +output_table, +independent_varname, +dependent_varname, +hidden_layer_sizes, +optimizer_params, +activation, +weights, +warm_start, +verbose, +grouping_col +) +Arguments +source_table +TEXT. Name of the table containing the training data. If you are using mini-batching, this is the name of the output table from the mini-batch preprocessor. + + +output_table +TEXT. Name of the output table containing the model. Details of the output table are shown below. + + +independent_varname +TEXT. Expression list to evaluate for the independent variables. It should be a numeric array expression.
If you are using mini-batching, set this parameter to 'independent_varname', which is the hard-coded name of the column from the mini-batch preprocessor that contains the packed independent variables. +Note: If you are not using mini-batching, do not include an intercept variable as part of this expression; this differs from other MADlib modules. Also note that independent variables should be encoded properly: all values are cast to DOUBLE PRECISION, so categorical variables should be one-hot or dummy encoded as appropriate. See Encoding Categorical Variables for more details. + +dependent_varname +TEXT. Name of the dependent variable column. For classification, supported types
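The description of MLP classification above (fully connected hidden layers with non-linear activations, and a final layer with one node per class whose outputs read as class probabilities) corresponds to the forward pass below. This is a sketch only: the softmax output layer and the activation names 'sigmoid', 'tanh', and 'relu' are assumptions, the function names are hypothetical, and training (backpropagation with SGD or mini-batches) is not shown.

```python
import math

# Activation functions assumed for this sketch.
ACTIVATIONS = {
    "sigmoid": lambda z: 1.0 / (1.0 + math.exp(-z)),
    "tanh": math.tanh,
    "relu": lambda z: max(0.0, z),
}


def dense(x, weights, bias):
    """Fully connected layer: weights[j][i] connects input i to unit j."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def mlp_class_probs(x, layers, activation="tanh"):
    """Forward pass; hidden layers use `activation`, the output layer uses softmax.

    layers is a list of (weights, bias) pairs; the last pair has one row per class.
    """
    act = ACTIVATIONS[activation]
    h = list(x)
    for weights, bias in layers[:-1]:
        h = [act(z) for z in dense(h, weights, bias)]
    logits = dense(h, *layers[-1])
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Softmax keeps every output positive and makes the outputs sum to 1, which is what allows the final layer to be read as a probability distribution over classes.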
[42/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/cross__validation_8sql__in.html -- diff --git a/docs/rc/cross__validation_8sql__in.html b/docs/rc/cross__validation_8sql__in.html new file mode 100644 index 000..c5acec3 --- /dev/null +++ b/docs/rc/cross__validation_8sql__in.html @@ -0,0 +1,717 @@ +MADlib: cross_validation.sql_in File Reference +1.15 +User Documentation for Apache MADlib
+cross_validation.sql_in File Reference +SQL functions for cross validation. +Functions +void cross_validation_general (varchar modelling_func, varchar[] modelling_params, varchar[] modelling_params_type, varchar param_explored, varchar[] explore_values, varchar predict_func, varchar[] predict_params, varchar[] predict_params_type, varchar metric_func, varchar[] metric_params, varchar[] metric_params_type, varchar data_tbl, varchar data_id, boolean id_is_random, varchar validation_result, varchar[] data_cols, integer n_folds) +void cross_validation_general (varchar modelling_func, varchar[] modelling_params, varchar[] modelling_params_type, varchar param_explored, varchar[] explore_values, varchar predict_func, varchar[] predict_params, varchar[] predict_params_type, varchar metric_func, varchar[] metric_params, varchar[] metric_params_type, varchar data_tbl, varchar data_id, boolean id_is_random, varchar validation_result, varchar[] data_cols) +void cv_linregr_train (varchar tbl_source, varchar col_ind_var, varchar col_dep_var, varchar tbl_result) +A wrapper for linear regression. +void cv_linregr_predict (varchar tbl_model, varchar tbl_newdata, varchar col_ind_var, varchar col_id, varchar tbl_predict) +A wrapper for linear regression prediction. +void mse_error (varchar tbl_prediction, varchar tbl_actual, varchar id_actual, varchar values_actual, varchar tbl_error) +void misclassification_avg (varchar tbl_prediction, varchar tbl_actual, varchar id_actual, varchar values_actual, varchar tbl_error) +void cv_logregr_predict (varchar tbl_model, varchar tbl_newdata, varchar col_ind_var, varchar col_id, varchar tbl_predict) +A prediction function for logistic regression. The result is stored in the table tbl_predict. +integer logregr_accuracy (float8[] coef, float8[] col_ind, boolean col_dep) +Metric function for logistic regression.
+void cv_logregr_accuracy (varchar tbl_predict, varchar tbl_source, varchar col_id, varchar col_dep_var, varchar tbl_accuracy) +Metric function for logistic regression. +Detailed Description +Date: January 2011 +See also: For a brief introduction to the usage of cross validation, see the module description Cross Validation. +Function Documentation +cross_validation_general() [1/2] +void cross_validation_general ( varchar modelling_func, varchar [] modelling_params, varchar [] modelling_params_type,
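The file reference above lists only the SQL signatures. As a rough illustration of what a k-fold procedure such as `cross_validation_general(..., n_folds)` computes, here is a plain-Python sketch; it is not MADlib's implementation. The helper names `k_fold_indices`, `cross_validate`, `fit_line`, `predict_line`, and `mse` are hypothetical, the model is one-variable least squares, and the metric is mean squared error (loosely analogous to `mse_error`). The sketch assumes the rows are already in random order and uses contiguous folds; shuffle the rows first otherwise.

```python
def k_fold_indices(n, n_folds):
    """Split row indices 0..n-1 into n_folds contiguous, near-equal folds."""
    sizes = [n // n_folds + (1 if f < n % n_folds else 0) for f in range(n_folds)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def predict_line(model, xs):
    slope, intercept = model
    return [slope * x + intercept for x in xs]


def mse(predicted, actual):
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def cross_validate(xs, ys, n_folds=10, fit=fit_line, predict=predict_line, metric=mse):
    """Train on all folds but one, score the held-out fold, average the scores."""
    scores = []
    for held_out in k_fold_indices(len(xs), n_folds):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        preds = predict(model, [xs[i] for i in held_out])
        scores.append(metric(preds, [ys[i] for i in held_out]))
    return sum(scores) / len(scores)
```

Passing the model, prediction, and metric as functions mirrors the way the SQL interface takes `modelling_func`, `predict_func`, and `metric_func` by name.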
[23/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__encode__categorical.html -- diff --git a/docs/rc/group__grp__encode__categorical.html b/docs/rc/group__grp__encode__categorical.html new file mode 100644 index 000..61f6f45 --- /dev/null +++ b/docs/rc/group__grp__encode__categorical.html @@ -0,0 +1,700 @@ +MADlib: Encoding Categorical Variables
+Encoding Categorical Variables +Contents +Coding Systems for Categorical Variables +Examples +Literature +Coding Systems for Categorical Variables +Categorical variables [1] require special attention in regression analysis because, unlike dichotomous or continuous variables, they cannot be entered into the regression equation as they are. For example, if you have a variable called race that is coded with 1=Hispanic, 2=Asian, 3=Black, 4=White, then entering race in your regression will fit a linear effect of the race code, which is probably not what you intended. Instead, categorical variables like this need to be coded into a series of indicator variables, which can then be entered into the regression model. There are a variety of coding systems that can be used for coding categorical variables, including one-hot, dummy, effects, orthogonal, and Helmert. +We currently support one-hot and dummy coding techniques. +Dummy coding is used when a researcher wants to compare the other groups of the predictor variable with one specific group of the predictor variable. This specific group is often called the reference group. +One-hot encoding is similar to dummy coding except that it builds an indicator (0/1) column (cast as numeric) for each value of each category. Only one of these columns can take the value 1 for each row (data point). There is no reference category for this function. +encode_categorical_variables ( +source_table, +output_table, +categorical_cols, +categorical_cols_to_exclude, -- Optional +row_id, -- Optional +top, -- Optional +value_to_drop, -- Optional +encode_null, -- Optional +output_type, -- Optional +output_dictionary, -- Optional +distributed_by -- Optional +) +Arguments +source_table +VARCHAR. Name of the table containing the source categorical data to encode. +output_table +VARCHAR. Name of the result table.
+Note: If there are index columns in 'source_table' specified by the parameter 'row_id' (see below), the output table contains only the 'row_id' index columns and the encoded columns. If 'row_id' is not specified, all columns from 'source_table' except the original columns that have been encoded are included in 'output_table'. +categorical_cols +VARCHAR. Comma-separated string of column names of categorical variables to encode. Can also be '*' meaning
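The difference between the two supported coding schemes described above can be sketched in a few lines of plain Python. This is an illustration only, not the `encode_categorical_variables` implementation; the function names `one_hot` and `dummy_code` are hypothetical, and this sketch orders categories by sorting their values.

```python
def one_hot(values):
    """One indicator (0/1) column per distinct value; no reference category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def dummy_code(values, reference):
    """Like one-hot, but drop the column for the reference category.

    Rows in the reference group become all zeros, so each remaining column
    compares its group against the reference group.
    """
    categories = [c for c in sorted(set(values)) if c != reference]
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# race coded as 1=Hispanic, 2=Asian, 3=Black, 4=White, as in the text above
race = [1, 2, 3, 4, 2]
print(one_hot(race))
print(dummy_code(race, reference=4))
```

Dropping the reference column is what keeps dummy-coded indicators from being perfectly collinear with an intercept term.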
[20/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__kmeans.html -- diff --git a/docs/rc/group__grp__kmeans.html b/docs/rc/group__grp__kmeans.html new file mode 100644 index 000..0f40769 --- /dev/null +++ b/docs/rc/group__grp__kmeans.html @@ -0,0 +1,492 @@ +MADlib: k-Means Clustering
+k-Means Clustering +Contents +Training Function +Output Format +Cluster Assignment +Examples +Notes +Technical Background +Literature +Related Topics +Clustering refers to the problem of partitioning a set of objects according to some problem-dependent measure of similarity. In the k-means variant, given \( n \) points \( x_1, \dots, x_n \in \mathbb R^d \), the goal is to position \( k \) centroids \( c_1, \dots, c_k \in \mathbb R^d \) so that the sum of distances between each point and its closest centroid is minimized. Each centroid represents a cluster that consists of all points to which this centroid is closest. +Training Function +The k-means algorithm can be invoked in four ways, depending on the source of the initial set of centroids: +Use the random centroid seeding method. +kmeans_random( rel_source, + expr_point, + k, + fn_dist, + agg_centroid, + max_num_iterations, + min_frac_reassigned + ) +Use the kmeans++ centroid seeding method. +kmeanspp( rel_source, + expr_point, + k, + fn_dist, + agg_centroid, + max_num_iterations, + min_frac_reassigned, + seeding_sample_ratio +) +Supply an initial centroid set in a relation identified by the rel_initial_centroids argument. +kmeans( rel_source, +expr_point, +rel_initial_centroids, +expr_centroid, +fn_dist, +agg_centroid, +max_num_iterations, +min_frac_reassigned + ) +Provide an initial centroid set as an array expression in the initial_centroids argument. +kmeans( rel_source, +expr_point, +initial_centroids, +fn_dist, +agg_centroid, +max_num_iterations, +min_frac_reassigned + ) +Arguments +rel_source +TEXT. The name of the table containing the input data points. +Data points and predefined centroids (if used) are expected to be stored row-wise, in a column of type SVEC (or any type convertible to SVEC, like FLOAT[] or INTEGER[]). Data points with non-finite values (NULL, NaN, infinity) in any component are skipped during analysis. +expr_point +TEXT.
The name of the column with point coordinates or an array expression. +k +INTEGER. The number of centroids to calculate. +fn_dist (optional) +TEXT, default: 'squared_dist_norm2'. The name of the function to use to calculate the distance from a data point to a centroid. +The following distance functions can be used (computation of barycenter/mean in parentheses): +dist_norm1: 1-norm/Manhattan (element-wise median [Note that MADlib does not provide a median
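The iteration behind the variant that takes an explicit initial centroid set can be sketched as Lloyd's algorithm: assign each point to its closest centroid, move each centroid to the mean of its points, and stop after `max_num_iterations` or once few points change cluster. This plain-Python sketch is not MADlib's implementation. `lloyd_kmeans` and `squared_euclidean` are hypothetical names, the sketch hard-codes squared Euclidean distance with an arithmetic-mean centroid (the pairing used by the default `fn_dist`), it does not skip non-finite points, and it assumes the stopping rule compares the fraction of reassigned points with `min_frac_reassigned`.

```python
def squared_euclidean(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise arithmetic mean, i.e. the barycenter of the points."""
    return [sum(coords) / len(points) for coords in zip(*points)]


def lloyd_kmeans(points, initial_centroids, max_num_iterations=20,
                 min_frac_reassigned=0.001):
    centroids = [list(c) for c in initial_centroids]
    assignment = [None] * len(points)
    for _ in range(max_num_iterations):
        # Assignment step: index of the closest centroid for every point.
        new_assignment = [
            min(range(len(centroids)), key=lambda j: squared_euclidean(p, centroids[j]))
            for p in points
        ]
        reassigned = sum(old != new for old, new in zip(assignment, new_assignment))
        assignment = new_assignment
        # Update step: move each non-empty cluster's centroid to its mean.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = mean_point(members)
        # Stop once only a small fraction of points changed cluster.
        if reassigned / len(points) < min_frac_reassigned:
            break
    return centroids, assignment
```

For example, `lloyd_kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], [[0, 0], [10, 10]])` settles on one centroid per pair of nearby points after the second pass, because no point is reassigned.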
[07/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__svec.html -- diff --git a/docs/rc/group__grp__svec.html b/docs/rc/group__grp__svec.html new file mode 100644 index 000..efdf975 --- /dev/null +++ b/docs/rc/group__grp__svec.html @@ -0,0 +1,455 @@ +MADlib: Sparse Vectors
+Sparse Vectors +Contents +Using Sparse Vectors +Document Vectorization into Sparse Vectors +Examples +Related Topics +This module implements a sparse vector data type, named "svec", which provides compressed storage of vectors that have many duplicate elements. +Arrays of floating point numbers for various calculations sometimes have long runs of zeros (or some other default value). This is common in applications like scientific computing, retail optimization, and text processing. Each floating point number takes 8 bytes of storage in memory and/or disk, so saving those zeros is often worthwhile. There are also many computations that can benefit from skipping over the zeros. +Consider, for example, the following array of doubles stored as a Postgres/Greenplum "float8[]" data type: +'{0, 33,...40,000 zeros..., 12, 22 }'::float8[] +This array would occupy slightly more than 320KB of memory or disk, most of it zeros. Even if we were to exploit the null bitmap and store the zeros as nulls, we would still end up with a 5KB null bitmap, which is still not nearly as memory efficient as we'd like. Also, as we perform various operations on the array, we do work on 40,000 fields that turn out to be unimportant. +To solve these problems, the svec type employs a simple Run Length Encoding (RLE) scheme to represent sparse vectors as pairs of count-value arrays. For example, the array above would be represented as +'{1,1,40000,1,1}:{0,33,0,12,22}'::madlib.svec +which says there is 1 occurrence of 0, followed by 1 occurrence of 33, followed by 40,000 occurrences of 0, etc. This uses just 5 integers and 5 floating point numbers to store the array. Further, it is easy to implement vector operations that take advantage of the RLE representation to make computations faster. The SVEC module provides a library of such functions. +The current version only supports sparse vectors of float8 values.
Future versions will support other base types. +Using Sparse Vectors +An SVEC can be constructed directly with a constant expression, as follows: +SELECT '{n1,n2,...,nk}:{v1,v2,...vk}'::madlib.svec; + where n1,n2,...,nk specify the counts for the values v1,v2,...,vk. +A float array can be cast to an SVEC: +SELECT ('{v1,v2,...vk}'::float[])::madlib.svec; +An SVEC can be created with an aggregation: +SELECT madlib.svec_agg(v1) FROM generate_series(1,k); +An SVEC can be created using the madlib.svec_cast_positions_float8arr() function by supplying an array of positions and an array of values at those positions: +SELECT madlib.svec_cast_positions_float8arr( +array[n1,n2,...nk],-- positions of values in vector +
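The run-length encoding described above can be sketched in plain Python. This illustrates the idea only and is not the svec C implementation; `rle_encode`, `rle_decode`, `svec_literal`, and `rle_dot` are hypothetical helpers. The dot product shows how an operation can work run by run, so a run of 40,000 zeros costs one multiplication instead of 40,000.

```python
def rle_encode(values):
    """Collapse runs of equal adjacent values into parallel count/value lists."""
    counts, run_values = [], []
    for v in values:
        if run_values and run_values[-1] == v:
            counts[-1] += 1
        else:
            counts.append(1)
            run_values.append(v)
    return counts, run_values


def rle_decode(counts, run_values):
    """Expand count/value lists back into the dense array."""
    dense = []
    for c, v in zip(counts, run_values):
        dense.extend([v] * c)
    return dense


def _fmt(v):
    v = float(v)
    return str(int(v)) if v.is_integer() else repr(v)


def svec_literal(counts, run_values):
    """Render the '{counts}:{values}' text form shown above."""
    return "{%s}:{%s}" % (",".join(str(c) for c in counts),
                          ",".join(_fmt(v) for v in run_values))


def rle_dot(a, b):
    """Dot product of two RLE vectors of equal total length, processed run by run."""
    (ca, va), (cb, vb) = a, b
    if not ca or not cb:
        return 0.0
    i = j = 0
    rem_a, rem_b = ca[0], cb[0]
    total = 0.0
    while i < len(ca) and j < len(cb):
        step = min(rem_a, rem_b)  # length of the overlap of the current runs
        total += step * va[i] * vb[j]
        rem_a -= step
        rem_b -= step
        if rem_a == 0:
            i += 1
            if i < len(ca):
                rem_a = ca[i]
        if rem_b == 0:
            j += 1
            if j < len(cb):
                rem_b = cb[j]
    return total


dense = [0.0, 33.0] + [0.0] * 40000 + [12.0, 22.0]
print(svec_literal(*rle_encode(dense)))  # the text form of the svec above
```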
[49/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/array__ops_8sql__in.html -- diff --git a/docs/rc/array__ops_8sql__in.html b/docs/rc/array__ops_8sql__in.html new file mode 100644 index 000..c6140a5 --- /dev/null +++ b/docs/rc/array__ops_8sql__in.html @@ -0,0 +1,1275 @@ +MADlib: array_ops.sql_in File Reference
+array_ops.sql_in File Reference +Implementation of array operations in SQL. +Functions +anyarray array_add (anyarray x, anyarray y) +Adds two arrays. It requires that all the values are NON-NULL. Return type is the same as the input type. +aggregate anyarray sum (anyarray) +Aggregate, element-wise sum of arrays. It requires that all the values are NON-NULL. Return type is the same as the input type. +anyarray array_sub (anyarray x, anyarray y) +Subtracts two arrays. It requires that all the values are NON-NULL. Return type is the same as the input type. +anyarray array_mult (anyarray x, anyarray y) +Element-wise product of two arrays. It requires that all the values are NON-NULL. Return type is the same as the input type. +anyarray array_div (anyarray x, anyarray y) +Element-wise division of two arrays. It requires that all the values are NON-NULL. Return type is the same as the input type. +float8 array_dot (anyarray x, anyarray y) +Dot product of two arrays. It requires that all the values are NON-NULL. Return type is FLOAT8. +bool array_contains (anyarray x, anyarray y) +Checks whether one array contains the other. This function returns TRUE if each non-zero element in the right array is equal to the element with the same index in the left array. +anyelement array_max (anyarray x) +This function finds the maximum value in the array. NULLs are ignored. Return type is the same as the input type. +float8[] array_max_index (anyarray x) +This function finds the maximum value and corresponding index in the array. NULLs are ignored. Return type is FLOAT8[]. +anyelement array_min (anyarray x) +This function finds the minimum value in the array. NULLs are ignored. Return type is the same as the input type. +float8[] array_min_index (anyarray x) +This function finds the minimum value and corresponding index in the array.
NULLs are ignored. Return type is FLOAT8[]. +anyelement array_sum (anyarray x) +This function finds the sum of the values in the array. NULLs are ignored. Return type is the same as the input type. +float8 array_sum_big (anyarray x) +This function finds the sum of the values in the array. NULLs are ignored. Return type is always FLOAT8 regardless of input. This function is meant to replace array_sum() in cases where the sum may overflow the element type. +anyelement array_abs_sum (anyarray x) +This function finds the sum of the absolute values in the array. NULLs are ignored. Return type is the same as the input type. +anyarray array_abs (anyarray
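To pin down the element-wise semantics listed above, here is a plain-Python sketch of a few of these functions, using `None` for SQL NULL. It only mirrors the documented behavior and is not MADlib's implementation. The Python names are hypothetical, and two details are assumptions of this sketch rather than documented behavior: raising `ValueError` to enforce the NON-NULL requirement, and requiring equal-length inputs for `array_contains`.

```python
def _check_pair(x, y, name):
    """Enforce equal lengths and NON-NULL values (this sketch's choice: raise)."""
    if len(x) != len(y):
        raise ValueError(f"{name}: arrays must have the same length")
    if any(v is None for v in x) or any(v is None for v in y):
        raise ValueError(f"{name} requires NON-NULL values")


def array_add(x, y):
    """Element-wise sum of two arrays."""
    _check_pair(x, y, "array_add")
    return [a + b for a, b in zip(x, y)]


def array_dot(x, y):
    """Dot product, always returned as a float (FLOAT8 in SQL)."""
    _check_pair(x, y, "array_dot")
    return float(sum(a * b for a, b in zip(x, y)))


def array_contains(x, y):
    """TRUE if every non-zero element of y equals the element at the same index of x."""
    _check_pair(x, y, "array_contains")
    return all(x[i] == v for i, v in enumerate(y) if v != 0)


def array_sum(x):
    """Sum of the values; NULLs (None) are ignored."""
    return sum(v for v in x if v is not None)


def array_max(x):
    """Maximum value; NULLs (None) are ignored."""
    return max(v for v in x if v is not None)
```

Note how zeros in the right-hand array act as "don't care" positions in `array_contains`.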
[36/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/dir_ce4fa7aad06dd1bbca713eb50be7391b.html -- diff --git a/docs/rc/dir_ce4fa7aad06dd1bbca713eb50be7391b.html b/docs/rc/dir_ce4fa7aad06dd1bbca713eb50be7391b.html new file mode 100644 index 000..d149a74 --- /dev/null +++ b/docs/rc/dir_ce4fa7aad06dd1bbca713eb50be7391b.html @@ -0,0 +1,190 @@ +MADlib: modules Directory Reference
+modules Directory Reference +Directories +directory assoc_rules +directory bayes +directory conjugate_gradient +directory convex +directory crf +directory elastic_net +directory glm +directory graph +directory kmeans +directory knn +directory lda +directory linalg +directory linear_systems +directory pca +directory pmml +directory prob +directory recursive_partitioning +directory regress +directory sample +directory stats +directory summary +directory svm +directory tsa +directory utilities +directory validation +Generated on Mon Aug 6 2018 21:55:39 for MADlib by doxygen 1.8.14 http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/dir_d2a50e0bf8f1a4defa2e0ba0a193a480.html -- diff --git a/docs/rc/dir_d2a50e0bf8f1a4defa2e0ba0a193a480.html b/docs/rc/dir_d2a50e0bf8f1a4defa2e0ba0a193a480.html new file mode 100644 index 000..dc25b29 --- /dev/null +++ b/docs/rc/dir_d2a50e0bf8f1a4defa2e0ba0a193a480.html @@ -0,0 +1,142 @@ +MADlib: pmml Directory Reference
[04/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/hypothesis__tests_8sql__in.html -- diff --git a/docs/rc/hypothesis__tests_8sql__in.html b/docs/rc/hypothesis__tests_8sql__in.html new file mode 100644 index 000..e172197 --- /dev/null +++ b/docs/rc/hypothesis__tests_8sql__in.html @@ -0,0 +1,1262 @@ +MADlib: hypothesis_tests.sql_in File Reference
+hypothesis_tests.sql_in File Reference +SQL functions for statistical hypothesis tests. +Functions +float8[] t_test_one_transition (float8[] state, float8 value) +float8[] t_test_merge_states (float8[] state1, float8[] state2) +t_test_result t_test_one_final (float8[] state) +f_test_result f_test_final (float8[] state) +aggregate float8[] t_test_one (float8 value) +Perform one-sample or dependent paired Student t-test. +float8[] t_test_two_transition (float8[] state, boolean first, float8 value) +t_test_result t_test_two_pooled_final (float8[] state) +aggregate float8[] t_test_two_pooled (boolean first, float8 value) +Perform two-sample pooled (i.e., equal variances) Student t-test. +t_test_result t_test_two_unpooled_final (float8[] state) +aggregate float8[] t_test_two_unpooled (boolean first, float8 value) +Perform unpooled (i.e., unequal variances) t-test (also known as Welch's t-test). +aggregate float8[] f_test (boolean first, float8 value) +Perform Fisher F-test. +float8[] chi2_gof_test_transition (float8[] state, bigint observed, float8 expected, bigint df) +float8[] chi2_gof_test_transition (float8[] state, bigint observed, float8 expected) +float8[] chi2_gof_test_transition (float8[] state, bigint observed) +float8[] chi2_gof_test_merge_states (float8[] state1, float8[] state2) +chi2_test_result chi2_gof_test_final (float8[] state) +aggregate float8[] chi2_gof_test (bigint observed, float8 expected=1, bigint df=0) +Perform Pearson's chi-squared goodness-of-fit test.
+aggregate float8[] chi2_gof_test (bigint observed, float8 expected) +aggregate float8[] chi2_gof_test (bigint observed) +float8[] ks_test_transition (float8[] state, boolean first, float8 value, bigint numFirst, bigint numSecond) +ks_test_result ks_test_final (float8[] state) +float8[] mw_test_transition (float8[] state, boolean first, float8 value) +Perform Kolmogorov-Smirnov test. +mw_test_result mw_test_final (float8[] state) +float8[] wsr_test_transition (float8[] state, float8 value, float8 precision) +Perform Mann-Whitney test. +float8[] wsr_test_transition (float8[] state, float8 value) +wsr_test_result wsr_test_final (float8[] state) +float8[] one_way_anova_transition (float8[] state, integer group, float8 value) +Perform Wilcoxon-Signed-Rank test. +float8[] one_way_anova_merge_states (float8[] state1, float8[] state2) +one_way_anova_result one_way_anova_final (float8[] state) +aggregate float8[] one_way_anova (integer group, float8 value) +Perform one-way analysis of variance. +Detailed
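The one-sample and unpooled (Welch) t statistics named above can be computed directly from sample means and variances. The plain-Python sketch below uses only the standard-library `math` and `statistics` modules; it is not MADlib's aggregate implementation and returns only the statistic and degrees of freedom, not a p-value. The function names and the explicit `mu` parameter are hypothetical.

```python
import math
import statistics


def t_test_one_sample(values, mu=0.0):
    """Student t statistic for H0: mean == mu, with df = n - 1."""
    n = len(values)
    se = statistics.stdev(values) / math.sqrt(n)  # stdev uses the n - 1 denominator
    return (statistics.mean(values) - mu) / se, n - 1


def welch_t_test(x, y):
    """Unpooled (Welch) t statistic with Welch-Satterthwaite degrees of freedom."""
    vx = statistics.variance(x) / len(x)  # squared standard error of mean(x)
    vy = statistics.variance(y) / len(y)  # squared standard error of mean(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


print(t_test_one_sample([1, 2, 3, 4, 5]))
print(welch_t_test([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
```

Unlike the pooled test, the Welch statistic does not assume equal variances, which is why its degrees of freedom are generally not an integer.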
[29/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__balance__sampling.html -- diff --git a/docs/rc/group__grp__balance__sampling.html b/docs/rc/group__grp__balance__sampling.html new file mode 100644 index 000..20a971d --- /dev/null +++ b/docs/rc/group__grp__balance__sampling.html @@ -0,0 +1,607 @@
MADlib: Balanced Sampling (User Documentation for Apache MADlib 1.15)
Balanced Sampling

Some classification algorithms only perform optimally when the number of samples in each class is roughly the same. Highly skewed datasets are common in many domains (e.g., fraud detection), so resampling to offset this imbalance can produce a better decision boundary.
This module offers a number of resampling techniques, including undersampling majority classes, oversampling minority classes, and combinations of the two.

Balanced Sampling

balance_sample( source_table,
                output_table,
                class_col,
                class_sizes,
                output_table_size,
                grouping_cols,
                with_replacement,
                keep_null
              )

Arguments
source_table
    TEXT. Name of the table containing the input data.
output_table
    TEXT. Name of the output table that contains the sampled data. The output table contains all columns present in the source table, plus a new generated id called "__madlib_id__" added as the first column.
class_col
    TEXT. Name of the column containing the class to be balanced.
class_sizes (optional)
    VARCHAR, default 'uniform'. Parameter to define the size of the different class values. (Class values are sometimes also called levels.) Can be set to the following:
    'uniform': All class values will be resampled to have the same number of rows.
    'undersample': Undersample such that all class values end up with the same number of observations as the minority class. Done without replacement by default unless the parameter 'with_replacement' is set to TRUE.
    'oversample': Oversample with replacement such that all class values end up with the same number of observations as the majority class. Not affected by the parameter 'with_replacement' since oversampling is always done with replacement.
    Short forms of the above will work too, e.g., 'uni' works the same as 'uniform'.
    Alternatively, you can explicitly set class sizes in a string containing a comma-delimited list. Order does not matter, and not all class values need to be specified. Use the format 'class_value_1=x, class_value_2=y, ...', where each 'class_value' in the list must exist in the column 'class_col' and each size is an integer giving the desired number of observations. E.g., 'red=3000, blue=4000' means you want to resample the dataset to result in exactly 3000 red and 4000 blue rows in the 'output_table'.
    Note: The allowed names for class values follow object naming rules in PostgreSQL [1]. Quoted identifiers are allowed and should be enclosed in
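The class_sizes options above reduce to a target row count per class value. A minimal sketch of that mapping, assuming the current count per class is already known; target_class_sizes is a hypothetical helper, the even split used for 'uniform' and the handling of classes left out of an explicit list are assumptions, and the real balance_sample logic (output_table_size, grouping, NULL handling) may differ:

```python
def target_class_sizes(class_counts, class_sizes="uniform", output_table_size=None):
    """Map a class_sizes spec to a desired row count per class value.

    class_counts: dict of class value -> current number of rows.
    """
    keywords = ("uniform", "undersample", "oversample")
    spec = class_sizes.strip()
    # Accept unambiguous short forms, e.g. 'uni' for 'uniform'
    matches = [k for k in keywords if spec and k.startswith(spec.lower())]
    if len(matches) == 1:
        mode = matches[0]
        if mode == "undersample":
            size = min(class_counts.values())  # size of the minority class
        elif mode == "oversample":
            size = max(class_counts.values())  # size of the majority class
        else:
            # Assumption: split the requested (or current) total evenly
            total = output_table_size or sum(class_counts.values())
            size = total // len(class_counts)
        return {c: size for c in class_counts}

    # Otherwise parse an explicit list such as 'red=3000, blue=4000'
    sizes = dict(class_counts)  # assumption: unlisted classes keep their count
    for item in spec.split(","):
        name, _, value = item.partition("=")
        name = name.strip()
        if name not in class_counts:
            raise ValueError("class value not found in class_col: %r" % name)
        sizes[name] = int(value)
    return sizes


counts = {"red": 100, "blue": 20, "green": 60}
print(target_class_sizes(counts, "under"))            # {'red': 20, 'blue': 20, 'green': 20}
print(target_class_sizes(counts, "red=30, blue=40"))  # {'red': 30, 'blue': 40, 'green': 60}
```

Once the targets are known, classes whose target is below their current count are undersampled and the rest are oversampled with replacement.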
svn commit: r28605 - /dev/madlib/1.15-RC1/
Author: riyer Date: Tue Aug 7 21:03:59 2018 New Revision: 28605 Log: Add 1.15 RC1 files Added: dev/madlib/1.15-RC1/ dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Darwin.dmg (with props) dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Darwin.dmg.asc dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Darwin.dmg.sha512 dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Linux-GPDB43.rpm (with props) dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Linux-GPDB43.rpm.asc dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Linux-GPDB43.rpm.sha512 dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Linux.rpm (with props) dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Linux.rpm.asc dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Linux.rpm.sha512 dev/madlib/1.15-RC1/apache-madlib-1.15-src.tar.gz (with props) dev/madlib/1.15-RC1/apache-madlib-1.15-src.tar.gz.asc dev/madlib/1.15-RC1/apache-madlib-1.15-src.tar.gz.sha512 Added: dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Darwin.dmg == Binary file - no diff available. Propchange: dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Darwin.dmg -- svn:mime-type = application/octet-stream Added: dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Darwin.dmg.asc == --- dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Darwin.dmg.asc (added) +++ dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Darwin.dmg.asc Tue Aug 7 21:03:59 2018 @@ -0,0 +1,16 @@ +-BEGIN PGP SIGNATURE- + +iQIzBAABCAAdFiEE+pCcIAZjKnTgT/pQYwVq5BLE49cFAltpL1QACgkQYwVq5BLE +49cDJA/+IhMzkgxv6zX1Omuo8ofNMCetHJC4RmB8rwxem7DnLVUgwYNn+xK7lpAU +Yn9nm/XtFGXVqJ4CWGzaDL/iW2fsUqI5LX22CgeRaRD/iXasYB5TWMKvspaYY5RW +23Y7lYv3ea/+Gxnjj3uG7BwqxJ5YvtNiWoKWpq8PhSgo1souBivMGLGVS1DK55Wy +gnZuGULY9qq3cr0n5N7HDRS0e3bzKWqpm5xcGAtz2O5hW7tVDqT2FBrJmOG8mkPQ +GZ7cRPbeIeAi+CQzuvm522DtqPepJJW99UAl+0oksHgB6ag+iS80bufF27Fr9P0n +18Lq59/mJwdeUIxK95ak2AWjjmuuFzLY5QB06kJ5Mze96m4SA/VFJ9qdGljcDesX +BkwKNboi/zQSrUY5xVWNPWn3Qe5v0FUH8H0K1laqkczkeN+TGh8BlmOUF9DGbZ3l +L8spewzlbjuUAVUX9Q5Sren4qiliTj7UR4+hhggDvHIAAQQjCsOj78dOzce3Px8c +BrYRHCHbzBS6vg75DRj3P2KItpeRvwdZfNBaG/F0cPpBP/Yuwma62SdGATLdg6Fj 
++mMcYysmJLTrPsN0fu+Q7YasWgkPJthnaIkdxpbpEFkh74ZZaYcpDZZw7HW3FBB7 +qm8DQiMrL5wED9khZtvWNuqrMjlCuIN+/j8d8N7508DMtPtSkVE= +=wEHu +-END PGP SIGNATURE- Added: dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Darwin.dmg.sha512 == --- dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Darwin.dmg.sha512 (added) +++ dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Darwin.dmg.sha512 Tue Aug 7 21:03:59 2018 @@ -0,0 +1 @@ +494c374d272ac707dd503b1c1e33900ca0cca56f48e7ad84a7bed4f01090dbc09155fb09998bfb8db2b448ab84b527e619fbfafc90e3369b4b49cc5a27d4d5aa apache-madlib-1.15-bin-Darwin.dmg Added: dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Linux-GPDB43.rpm == Binary file - no diff available. Propchange: dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Linux-GPDB43.rpm -- svn:mime-type = application/octet-stream Added: dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Linux-GPDB43.rpm.asc == --- dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Linux-GPDB43.rpm.asc (added) +++ dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Linux-GPDB43.rpm.asc Tue Aug 7 21:03:59 2018 @@ -0,0 +1,16 @@ +-BEGIN PGP SIGNATURE- + +iQIzBAABCAAdFiEE+pCcIAZjKnTgT/pQYwVq5BLE49cFAltqAGsACgkQYwVq5BLE +49eFQg//T9SFevZM1eqUwWoM8DhuRdHDNJRaLRuCqTn4RY6cwd4quMLbJ6tMrMcY +Da0JFfKHUk916eDvgDAyVbbYNfLI6+Td2xdXRZKJdkf8ju1XeLK3hx196C/g+DF+ +ldHILlIoizcLFsypSqOxSwqqIzZ4V+ZdHLsoGILsTQKdok5AuLRYmcJFu7bxbLWI +gx4tKTFhTJzzDC00Sq9eBIabsWUQhiR7WpmwswRtuOAcvJQH4rwjPjozeBqLGLt5 +/+554enRlTbQw+2URj5DybIYjEVba58sMN8cj83FPu0745e+2kTDW6oZ5TXXGc15 +Rh4PDkSd0+AoUWX64ccT1n/AINMwm1f3g7CWU1lrzXnwY9H9+eABFwtYNBsoPJQU +bp8QhvjrJMupRKaD89l3JpaRgwb1dxl57V0wKAqpfPBcXS2iElfpq2IZ9DyWOskz +/pIpgXNFt/JNkww6wxFVyPxZJMBpjDzKMY9UBBqtXcrwx7C6J6OlYWeZLFNpSS/+ +4oVoRJEncN25p9pR4mXlzLKnGQW0pjVrKZocAy55g0WXIilwGiauCO6cQO9cufnF +6698eIdj5K0ytmdxSsOiLv75j3tynne55aDF8xQTPsa4IDycpc8t/WlQnBmxT2Cs +y85kVrNoY05+57hxSE1entDMigjbqN0nSrUk2Cp3Mjd47rnrRPA= +=pmUc +-END PGP SIGNATURE- Added: dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Linux-GPDB43.rpm.sha512 == --- 
dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Linux-GPDB43.rpm.sha512 (added) +++ dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Linux-GPDB43.rpm.sha512 Tue Aug 7 21:03:59 2018 @@ -0,0 +1 @@ +0fceb36d99e1364bddb5398c08c5fc99bc0b9e897948a7ae259827f8b2924e3840d160cae2150a258db1a6361a095ccc4ae311ccfe8201c29d00ee329ae0ab4d apache
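Each artifact in the RC directory ships with a .sha512 file holding the hex digest followed by the file name. A minimal Python sketch for checking a downloaded artifact against that line, using only the standard hashlib module; a matching digest shows the download is intact but does not prove who produced it, which is what the .asc signature and the KEYS file are for:

```python
import hashlib
import os
import tempfile


def sha512_of_file(path, chunk_size=1 << 20):
    """Hex SHA-512 digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_sha512(artifact_path, checksum_line):
    """Compare a file with a '<hex digest> <file name>' line from a .sha512 file."""
    expected = checksum_line.split()[0].lower()
    return sha512_of_file(artifact_path) == expected


# Usage with a throwaway file standing in for a downloaded artifact
payload = b"example artifact bytes"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
good_line = hashlib.sha512(payload).hexdigest() + "  example-src.tar.gz"
bad_line = "0" * 128 + "  example-src.tar.gz"
ok = verify_sha512(tmp.name, good_line)
tampered = verify_sha512(tmp.name, bad_line)
os.remove(tmp.name)
```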
svn commit: r28587 - /dev/madlib/KEYS
Author: riyer Date: Tue Aug 7 05:42:30 2018 New Revision: 28587 Log: Update KEYS with key for Rahul Iyer Modified: dev/madlib/KEYS Modified: dev/madlib/KEYS == --- dev/madlib/KEYS (original) +++ dev/madlib/KEYS Tue Aug 7 05:42:30 2018 @@ -178,64 +178,6 @@ oDf5rTpHlVJwfO8Trw== =Dqcb -END PGP PUBLIC KEY BLOCK- -pub 4096R/6C725F40 2016-08-30 [expires: 2018-08-30] -uid [ultimate] Rahul Iyer -sig 36C725F40 2016-08-30 Rahul Iyer -sub 4096R/92694331 2016-08-30 [expires: 2018-08-30] -sig 6C725F40 2016-08-30 Rahul Iyer - --BEGIN PGP PUBLIC KEY BLOCK- -Version: GnuPG v2 - -mQINBFfGHG0BEADwQsuVp75Msqp7z1qiRj1IPC+HVtVA/M8sljTrSGLixtrhtNW9 -Qgj8xISz5AEv7bz8r+qT1xIlKfkFujJkWsrngKKwN7/ausa5AaBTn2KzG8/2KL30 -08uNbBV2vZ901S/zcELe2X0aDU2V0v3LNG3mLMyTqB1/k/D8Y2dRMRYo6TaPdnXi -2FyFPkWWRvG8TtlZCzUPBxxq/gGc7Xs8Dy2p2QwdII+TBLQBAfmAxbGkwMlDUO53 -VTig6BDIsd6wOYL+ZuV0dkNesksdRfpLlBUv8Q7AZfbs03HdpDCzjCOCY69kFOa6 -P4biGHWepbf6cE2GxtU8XY89cbN0Wt7pKzj6c5bAuGRwlrGJ6g8ZdVsGR9XZfe3N -5Oe6gL/oDRCe49DPp8o92j58K6f2HY2wnr80LVeKBdKE5dmZ4Z3twL7aw9i+HeXQ -tjlBbXUdoXR4ESVJvTP6/cYgL2wxsKUVqd7Dzj+Yoy1xfVyfI/DcHOdlQ8ztlZkn -lkEDh9aQr+GulkplySgyQpIB3Xumc34hkDdg3LT0natG/+ZZEkbeUsEcPOrTTaTW -0y3GDtK4g6EGiMD084yC8fct5B8J6ePmttIxZIIveHz2VgqKdtxnC6rSwVU9XG2F -fZfr8SWMwHWX/QrJX5dDhilV4sild3UPt1kBrv2wJ1+RUt6H8dfCc0akEwARAQAB -tB1SYWh1bCBJeWVyIDxyaXllckBhcGFjaGUub3JnPokCPwQTAQgAKQUCV8YcbQIb -AwUJA8JnAAcLCQgHAwIBBhUIAgkKCwQWAgMBAh4BAheAAAoJEFK3lclscl9APrgP -/iniiFZyU3Rs+9ORZGtNeXPzfmTleHwzs2tQt5DGaCx+V7TFa7lEnXd0JTcfO6aJ -NskmKNLXLnhDJFWDB3p/BoboTtkU8HmnzffJdrECEpTZE49dMlpbmLNbWo0ZooWw -UbEWxJFQ/RX3poQ09OAK+2F7rhxe9NNZwIW0z9RtcCr26ttoOxgsAstPuPGjjxyx -QZKVSXA7ArBkt31YB3v2Vl7f6XrZfyVYDDO+dtUQxnwwJrxJi8LB+v8yMoOW7pkx -LO5WMMouQYyCE9rsZjT5Q0a3ji+01pdMbcnc9tGBGBpS0ftKd9H5Q1JzfpFwPXYI -CiYBefaSxej+3okulqLE4Mg7PEFEY2tWs3Vruz3ivfWxAiSwR8ggXWD0GpPiQt/X -HiCh0HTshjW0G3irA13ji9iRtzquOdEAmXmtBRIUNgvhx97oxSOaFvWaz9bumZKW -QDDVvvkGcAMwh4m22Kni9fiSATHP4xjxF+Dq87EEZFpG4hiDblyJFwVuzsPs3/lq 
-fxAefh9BrcEoArxYQna20FP9lKEsnaTSIeQhPkd5x+8MTLTOH4ehBDdBoy2yVMqn -F5yh9UQ52zdIqFOymUuqD7z1MZzPet27IrvpezgDiFaY2PXLtxL4ulUOfvDbQwV5 -u8y7s1JomFmFilOMu5dEqMtXWDZJOafMG1ZTDygEzpU1uQINBFfGHG0BEADcmAZN -bYU+LbSuKW2MmeW+iVkjX21B+8lRKtSevssa5q5xd2ug9Aw8QWIJFSUl0IA7ZV3l -esnV251gJUf0gDFcAMZs4zjxAA6Pfhh1+M7vpVNyZCp1g25eE1fc156miPVHOd+1 -xLTfuGY9fwfPhpyAW1nWRaM+ZYyfcU61fvZ8DxSgGnFTY7iPXUIy2bU84F/QyM7c -+tPyHbIBnmn9CvrFP+1tAOPOOvomjsgGzCN2q681fKnzQVSfnGq4QKV9Bp98Myww -yoVf54DqLTzoJgApeashCDwJwKDkfbhXCxhAv7yh3cco+vz6kadQlo244uGwF0ID -WP9PfG/+PEFiZwyC+fQ6MOolnMHd2XrQ3T7nZ6su1mLNnuLC3fwBGO1dT05B5TPs -mPGS6r3SGXNlbq2Db5A6CbnLxUZtcFtw4BGeH40JAMMhjkkKWPTVif0zkYBeeil3 -r6TWgm5Q7Y9x6Wa+VSQmCVkt6sdfV2oHVZBj/gtFdVS9shs54TP+neOexvvm8Th0 -DwcZCsBcxzQY4D+cmI/O2XwNp/bwUrD0yzDCET3MqJwl52rgMan7wi3/PEj3kIoz -ncbYj8ja2rAPQjWKg4ffzx247q95KUKi4XGc4qWCUKDndxARHl7+bALNLpTIbjzR -0ZlLtnAwusnY+Szi200z0lkq/2D6Xl5Ea5jm8QARAQABiQIlBBgBCAAPBQJXxhxt -AhsMBQkDwmcAAAoJEFK3lclscl9A4G4P/j+lACZ5uInAz8qdN6UijYeMGQ5JWXe8 -Kt9Ja59KuR3e7Rc+6vSo9qOOCNrJgm+WEqeIMwQyLxwp9Oi4nkpTZCvB2zJRK48I -hvckn8q/dWiBwWL+mnwYHI5JpM2bVsttfXhhRs5Y9XJxXATglFDG8sjZ/uU5xy/+ -R8zIrtVFXRlBD1faDEVVTDOrnHdxIgA9vV2THyls8HSiSpXNfoZorye5T/Srg7fn -6gELbLQUplqhh1l5/8WRwJvkV1m+REK5UDJzj+vEWXdavArWLW0E1Sq3k3jdiU48 -sxkbEVj3HbIL7cMQRBTjb/mI3wYnAKr97lPpi+D6wP93GIwMubnzAaSIkZpV1z2y -bcnaDwLF+FxH6Eli+3JjOc34Wfg0swUbHt3kJfRn0YTQnSoWV7wVGJE+mg1JhCTN -a9DnMofVYNTt5IUqCI+xFa+N4ytIOOuTGno/qof36QYzjXNQ1Bx0i+Nb9+OdHwLT -lgs1twXUptUKVH6OwWelydTSmUySuyuRDfqz/l+kSy6nPSO5Xnovsr1rUFcqOsiD -zBKqOUGiTyapUmTLIXzYhumH8iwLCpAbek//d9mbnpGV2X+k1gPcWNlZbooX9zLV -aRfQK1e3kTz70Q2hKGKLcKO+LHjXsG+OHZCYsS9eNea8gPPVtkppUT1jatFVkfmW -LULzSorJ3Mjh -=QHs0 --END PGP PUBLIC KEY BLOCK- pub 4096R/28D2C789 2017-05-01 uid Rashmi Raghu (CODE SIGNING KEY) sig 328D2C789 2017-05-01 Rashmi Raghu (CODE SIGNING KEY) @@ -412,3 +354,62 @@ MmpdxJaKiBbKOa7Yh58uhPPk0IKt0v7bkOonp+on 52fY8qaQ7r/SZnc= =HiuU -END PGP PUBLIC KEY BLOCK- + +pub rsa4096 2018-08-07 [SC] + 
FA909C2006632A74E04FFA5063056AE412C4E3D7 +uid [ultimate] Rahul Iyer +sig 363056AE412C4E3D7 2018-08-07 Rahul Iyer +sub rsa4096 2018-08-07 [E] +sig 63056AE412C4E3D7 2018-08-07 Rahul Iyer + +-BEGIN PGP PUBLIC KEY BLOCK- + +mQINBFtpLsABEACg7KSDhekwPSjcS9iuWRVSfQJFzQa1dBIGaoACDGAfpP4zF/OS +zjYe+RCAtttNIdmXRxmPw4xRrm27Sl1CMpogfe98I+nIMDhSAV2z4kJMDZwRzWY9 +OR0bgOl26yQolnQ77MKaBImkR6i7p0SQMSyEEtM/qwt0y4X/3msXF260X9pZgUwl +31ls5sdPPk+yttkvCJeubJ5kMo+oI+SebVe47OUmcT0h8zbAHr5oTAhaQ+lA8Tzf +tLalJfdcYDe4LkCQ5XyOPtIQgnhGAQXBbVKc2c1kX4UqpMFIQv8DaOnyTAnrviWG +chIx0Tk3a3RMDSMd9Inye5hjT8sRxT/4Ya4dFGMFXVDA+0gIElxaE2s1b8dgKyCN +48THNnXLvsql7O+gM2CibIb1lmQ+H9alxJDg/hE58SI8lZcl9qdD+lvSydSEVXij +0VrEhwgPDWY/3AulIW1XMR0Hsiy9OKqsvVGlbUyNkr8ndm5DQVICFBeCdhaO1xt9 +CNTsBWJSnd1aw0Q/Yk4YME0YhUGY5Z/NzB2vMP/MUbeU8VXK21qSRgPWPZM1qY/T
svn commit: r28586 - /release/madlib/KEYS
Author: riyer Date: Tue Aug 7 05:42:14 2018 New Revision: 28586 Log: Update KEYS with key for Rahul Iyer Modified: release/madlib/KEYS Modified: release/madlib/KEYS == --- release/madlib/KEYS (original) +++ release/madlib/KEYS Tue Aug 7 05:42:14 2018 @@ -178,64 +178,6 @@ oDf5rTpHlVJwfO8Trw== =Dqcb -END PGP PUBLIC KEY BLOCK- -pub 4096R/6C725F40 2016-08-30 [expires: 2018-08-30] -uid [ultimate] Rahul Iyer -sig 36C725F40 2016-08-30 Rahul Iyer -sub 4096R/92694331 2016-08-30 [expires: 2018-08-30] -sig 6C725F40 2016-08-30 Rahul Iyer - --BEGIN PGP PUBLIC KEY BLOCK- -Version: GnuPG v2 - -mQINBFfGHG0BEADwQsuVp75Msqp7z1qiRj1IPC+HVtVA/M8sljTrSGLixtrhtNW9 -Qgj8xISz5AEv7bz8r+qT1xIlKfkFujJkWsrngKKwN7/ausa5AaBTn2KzG8/2KL30 -08uNbBV2vZ901S/zcELe2X0aDU2V0v3LNG3mLMyTqB1/k/D8Y2dRMRYo6TaPdnXi -2FyFPkWWRvG8TtlZCzUPBxxq/gGc7Xs8Dy2p2QwdII+TBLQBAfmAxbGkwMlDUO53 -VTig6BDIsd6wOYL+ZuV0dkNesksdRfpLlBUv8Q7AZfbs03HdpDCzjCOCY69kFOa6 -P4biGHWepbf6cE2GxtU8XY89cbN0Wt7pKzj6c5bAuGRwlrGJ6g8ZdVsGR9XZfe3N -5Oe6gL/oDRCe49DPp8o92j58K6f2HY2wnr80LVeKBdKE5dmZ4Z3twL7aw9i+HeXQ -tjlBbXUdoXR4ESVJvTP6/cYgL2wxsKUVqd7Dzj+Yoy1xfVyfI/DcHOdlQ8ztlZkn -lkEDh9aQr+GulkplySgyQpIB3Xumc34hkDdg3LT0natG/+ZZEkbeUsEcPOrTTaTW -0y3GDtK4g6EGiMD084yC8fct5B8J6ePmttIxZIIveHz2VgqKdtxnC6rSwVU9XG2F -fZfr8SWMwHWX/QrJX5dDhilV4sild3UPt1kBrv2wJ1+RUt6H8dfCc0akEwARAQAB -tB1SYWh1bCBJeWVyIDxyaXllckBhcGFjaGUub3JnPokCPwQTAQgAKQUCV8YcbQIb -AwUJA8JnAAcLCQgHAwIBBhUIAgkKCwQWAgMBAh4BAheAAAoJEFK3lclscl9APrgP -/iniiFZyU3Rs+9ORZGtNeXPzfmTleHwzs2tQt5DGaCx+V7TFa7lEnXd0JTcfO6aJ -NskmKNLXLnhDJFWDB3p/BoboTtkU8HmnzffJdrECEpTZE49dMlpbmLNbWo0ZooWw -UbEWxJFQ/RX3poQ09OAK+2F7rhxe9NNZwIW0z9RtcCr26ttoOxgsAstPuPGjjxyx -QZKVSXA7ArBkt31YB3v2Vl7f6XrZfyVYDDO+dtUQxnwwJrxJi8LB+v8yMoOW7pkx -LO5WMMouQYyCE9rsZjT5Q0a3ji+01pdMbcnc9tGBGBpS0ftKd9H5Q1JzfpFwPXYI -CiYBefaSxej+3okulqLE4Mg7PEFEY2tWs3Vruz3ivfWxAiSwR8ggXWD0GpPiQt/X -HiCh0HTshjW0G3irA13ji9iRtzquOdEAmXmtBRIUNgvhx97oxSOaFvWaz9bumZKW -QDDVvvkGcAMwh4m22Kni9fiSATHP4xjxF+Dq87EEZFpG4hiDblyJFwVuzsPs3/lq 
-fxAefh9BrcEoArxYQna20FP9lKEsnaTSIeQhPkd5x+8MTLTOH4ehBDdBoy2yVMqn -F5yh9UQ52zdIqFOymUuqD7z1MZzPet27IrvpezgDiFaY2PXLtxL4ulUOfvDbQwV5 -u8y7s1JomFmFilOMu5dEqMtXWDZJOafMG1ZTDygEzpU1uQINBFfGHG0BEADcmAZN -bYU+LbSuKW2MmeW+iVkjX21B+8lRKtSevssa5q5xd2ug9Aw8QWIJFSUl0IA7ZV3l -esnV251gJUf0gDFcAMZs4zjxAA6Pfhh1+M7vpVNyZCp1g25eE1fc156miPVHOd+1 -xLTfuGY9fwfPhpyAW1nWRaM+ZYyfcU61fvZ8DxSgGnFTY7iPXUIy2bU84F/QyM7c -+tPyHbIBnmn9CvrFP+1tAOPOOvomjsgGzCN2q681fKnzQVSfnGq4QKV9Bp98Myww -yoVf54DqLTzoJgApeashCDwJwKDkfbhXCxhAv7yh3cco+vz6kadQlo244uGwF0ID -WP9PfG/+PEFiZwyC+fQ6MOolnMHd2XrQ3T7nZ6su1mLNnuLC3fwBGO1dT05B5TPs -mPGS6r3SGXNlbq2Db5A6CbnLxUZtcFtw4BGeH40JAMMhjkkKWPTVif0zkYBeeil3 -r6TWgm5Q7Y9x6Wa+VSQmCVkt6sdfV2oHVZBj/gtFdVS9shs54TP+neOexvvm8Th0 -DwcZCsBcxzQY4D+cmI/O2XwNp/bwUrD0yzDCET3MqJwl52rgMan7wi3/PEj3kIoz -ncbYj8ja2rAPQjWKg4ffzx247q95KUKi4XGc4qWCUKDndxARHl7+bALNLpTIbjzR -0ZlLtnAwusnY+Szi200z0lkq/2D6Xl5Ea5jm8QARAQABiQIlBBgBCAAPBQJXxhxt -AhsMBQkDwmcAAAoJEFK3lclscl9A4G4P/j+lACZ5uInAz8qdN6UijYeMGQ5JWXe8 -Kt9Ja59KuR3e7Rc+6vSo9qOOCNrJgm+WEqeIMwQyLxwp9Oi4nkpTZCvB2zJRK48I -hvckn8q/dWiBwWL+mnwYHI5JpM2bVsttfXhhRs5Y9XJxXATglFDG8sjZ/uU5xy/+ -R8zIrtVFXRlBD1faDEVVTDOrnHdxIgA9vV2THyls8HSiSpXNfoZorye5T/Srg7fn -6gELbLQUplqhh1l5/8WRwJvkV1m+REK5UDJzj+vEWXdavArWLW0E1Sq3k3jdiU48 -sxkbEVj3HbIL7cMQRBTjb/mI3wYnAKr97lPpi+D6wP93GIwMubnzAaSIkZpV1z2y -bcnaDwLF+FxH6Eli+3JjOc34Wfg0swUbHt3kJfRn0YTQnSoWV7wVGJE+mg1JhCTN -a9DnMofVYNTt5IUqCI+xFa+N4ytIOOuTGno/qof36QYzjXNQ1Bx0i+Nb9+OdHwLT -lgs1twXUptUKVH6OwWelydTSmUySuyuRDfqz/l+kSy6nPSO5Xnovsr1rUFcqOsiD -zBKqOUGiTyapUmTLIXzYhumH8iwLCpAbek//d9mbnpGV2X+k1gPcWNlZbooX9zLV -aRfQK1e3kTz70Q2hKGKLcKO+LHjXsG+OHZCYsS9eNea8gPPVtkppUT1jatFVkfmW -LULzSorJ3Mjh -=QHs0 --END PGP PUBLIC KEY BLOCK- pub 4096R/28D2C789 2017-05-01 uid Rashmi Raghu (CODE SIGNING KEY) sig 328D2C789 2017-05-01 Rashmi Raghu (CODE SIGNING KEY) @@ -412,3 +354,62 @@ MmpdxJaKiBbKOa7Yh58uhPPk0IKt0v7bkOonp+on 52fY8qaQ7r/SZnc= =HiuU -END PGP PUBLIC KEY BLOCK- + +pub rsa4096 2018-08-07 [SC] + 
FA909C2006632A74E04FFA5063056AE412C4E3D7 +uid [ultimate] Rahul Iyer +sig 363056AE412C4E3D7 2018-08-07 Rahul Iyer +sub rsa4096 2018-08-07 [E] +sig 63056AE412C4E3D7 2018-08-07 Rahul Iyer + +-BEGIN PGP PUBLIC KEY BLOCK- + +mQINBFtpLsABEACg7KSDhekwPSjcS9iuWRVSfQJFzQa1dBIGaoACDGAfpP4zF/OS +zjYe+RCAtttNIdmXRxmPw4xRrm27Sl1CMpogfe98I+nIMDhSAV2z4kJMDZwRzWY9 +OR0bgOl26yQolnQ77MKaBImkR6i7p0SQMSyEEtM/qwt0y4X/3msXF260X9pZgUwl +31ls5sdPPk+yttkvCJeubJ5kMo+oI+SebVe47OUmcT0h8zbAHr5oTAhaQ+lA8Tzf +tLalJfdcYDe4LkCQ5XyOPtIQgnhGAQXBbVKc2c1kX4UqpMFIQv8DaOnyTAnrviWG +chIx0Tk3a3RMDSMd9Inye5hjT8sRxT/4Ya4dFGMFXVDA+0gIElxaE2s1b8dgKyCN +48THNnXLvsql7O+gM2CibIb1lmQ+H9alxJDg/hE58SI8lZcl9qdD+lvSydSEVXij +0VrEhwgPDWY/3AulIW1XMR0Hsiy9OKqsvVGlbUyNkr8ndm5DQVICFBeCdhaO1xt9 +CNTsBWJSnd1aw0Q/Yk4YME0YhUGY5Z/NzB2vMP/MUbeU8VXK21qSRgPWPZM1qY/T
[1/2] madlib git commit: DT/RF: Add function to report importance scores
Repository: madlib Updated Branches: refs/heads/master e2534e44e -> 186390f7c DT/RF: Add function to report importance scores JIRA: MADLIB-925 This commit adds a new MADlib function (get_var_importance) to report the importance scores in decision tree and random forest by unnesting the importance values along with corresponding features. Closes #295 Co-authored-by: Rahul Iyer Co-authored-by: Jingyi Mei Co-authored-by: Orhan Kislal Project: http://git-wip-us.apache.org/repos/asf/madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/1aac377f Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/1aac377f Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/1aac377f Branch: refs/heads/master Commit: 1aac377f68d20290374c004a3a8bb2da82ab1fcc Parents: e2534e4 Author: Nandish Jayaram Authored: Tue Jul 3 12:22:07 2018 -0700 Committer: Rahul Iyer Committed: Wed Aug 1 12:58:22 2018 -0700 -- .../recursive_partitioning/decision_tree.cpp| 11 +- .../recursive_partitioning/decision_tree.hpp| 2 +- .../recursive_partitioning/random_forest.cpp| 15 ++ .../recursive_partitioning/random_forest.hpp| 1 + .../recursive_partitioning/decision_tree.py_in | 10 +- .../recursive_partitioning/decision_tree.sql_in | 102 +++--- .../recursive_partitioning/random_forest.py_in | 187 ++- .../recursive_partitioning/random_forest.sql_in | 168 + .../test/decision_tree.ic.sql_in| 3 +- .../test/decision_tree.sql_in | 46 - .../test/random_forest.sql_in | 20 +- .../test/unit_tests/plpy_mock.py_in | 43 + .../test/unit_tests/test_random_forest.py_in| 173 + 13 files changed, 697 insertions(+), 84 deletions(-) -- http://git-wip-us.apache.org/repos/asf/madlib/blob/1aac377f/src/modules/recursive_partitioning/decision_tree.cpp -- diff --git a/src/modules/recursive_partitioning/decision_tree.cpp b/src/modules/recursive_partitioning/decision_tree.cpp index d249946..0a7f7a5 100644 --- a/src/modules/recursive_partitioning/decision_tree.cpp +++ 
b/src/modules/recursive_partitioning/decision_tree.cpp @@ -488,7 +488,7 @@ print_decision_tree::run(AnyType &args){ } AnyType -get_variable_importance::run(AnyType &args){ +compute_variable_importance::run(AnyType &args){ Tree dt = args[0].getAs(); const int n_cat_features = args[1].getAs(); const int n_con_features = args[2].getAs(); @@ -497,19 +497,12 @@ get_variable_importance::run(AnyType &args){ ColumnVector con_var_importance = ColumnVector::Zero(n_con_features); dt.computeVariableImportance(cat_var_importance, con_var_importance); -// Variable importance is scaled to represent a percentage. Even though -// the importance values are split between categorical and continuous, the -// percentages are relative to the combined set. ColumnVector combined_var_imp(n_cat_features + n_con_features); combined_var_imp << cat_var_importance, con_var_importance; - -// Avoid divide by zero by adding a small number -double total_var_imp = combined_var_imp.sum(); -double VAR_IMP_EPSILON = 1e-6; -combined_var_imp *= (100.0 / (total_var_imp + VAR_IMP_EPSILON)); return combined_var_imp; } + AnyType display_text_tree::run(AnyType &args){ Tree dt = args[0].getAs(); http://git-wip-us.apache.org/repos/asf/madlib/blob/1aac377f/src/modules/recursive_partitioning/decision_tree.hpp -- diff --git a/src/modules/recursive_partitioning/decision_tree.hpp b/src/modules/recursive_partitioning/decision_tree.hpp index ae62bfa..8cb6703 100644 --- a/src/modules/recursive_partitioning/decision_tree.hpp +++ b/src/modules/recursive_partitioning/decision_tree.hpp @@ -14,7 +14,7 @@ DECLARE_UDF(recursive_partitioning, compute_surr_stats_transition) DECLARE_UDF(recursive_partitioning, dt_surr_apply) DECLARE_UDF(recursive_partitioning, print_decision_tree) -DECLARE_UDF(recursive_partitioning, get_variable_importance) +DECLARE_UDF(recursive_partitioning, compute_variable_importance) DECLARE_UDF(recursive_partitioning, predict_dt_response) DECLARE_UDF(recursive_partitioning, predict_dt_prob)
http://git-wip-us.apache.org/repos/asf/madlib/blob/1aac377f/src/modules/recursive_partitioning/random_forest.cpp -- diff --git a/src/modules/recursive_partitioning/random_forest.cpp b/src/modules/recursive_partitioning/random_forest.cpp index 70ebbaa..a12f095 100644 --- a/src/modules/recursive_partitioning/random_forest.cpp +++ b/src/modules/recursive_partitioning/random_forest.cpp @@ -204,6 +204,21 @@ rf_con_imp_score::run(AnyType &args) { //
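The removed C++ lines show how the importance vector was scaled before this commit: categorical and continuous importance are concatenated, then multiplied by 100 / (sum + 1e-6) so the values read as percentages without a divide-by-zero. A minimal Python sketch of that normalization, paired with feature names roughly the way a display helper could present unnested values; normalized_importance and the feature names are hypothetical, and the names are assumed to be listed in the same categorical-then-continuous order:

```python
VAR_IMP_EPSILON = 1e-6  # same guard value as the removed C++ lines


def normalized_importance(feature_names, cat_importance, con_importance):
    """Return (feature, percent) pairs whose percents sum to (almost) 100."""
    # Categorical values first, then continuous, as in combined_var_imp
    combined = list(cat_importance) + list(con_importance)
    if len(combined) != len(feature_names):
        raise ValueError("expected one name per importance value")
    total = sum(combined)
    # The epsilon avoids dividing by zero when every importance is 0
    scale = 100.0 / (total + VAR_IMP_EPSILON)
    return [(name, value * scale) for name, value in zip(feature_names, combined)]


rows = normalized_importance(["cat_a", "con_b", "con_c"], [1.0], [1.0, 2.0])
```

Keeping the raw values in the model output and normalizing only for display, as the commit does, means the stored numbers stay in the units of the split criterion.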
[2/2] madlib git commit: DT/RF: Fix user doc examples
DT/RF: Fix user doc examples Project: http://git-wip-us.apache.org/repos/asf/madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/186390f7 Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/186390f7 Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/186390f7 Branch: refs/heads/master Commit: 186390f7c2af5ad886a4d5b77d0792b68cd3414d Parents: 1aac377 Author: Frank McQuillan Authored: Wed Aug 1 12:49:10 2018 -0700 Committer: Rahul Iyer Committed: Wed Aug 1 12:58:44 2018 -0700 -- .../recursive_partitioning/decision_tree.sql_in | 16 ++-- .../recursive_partitioning/random_forest.sql_in | 12 +++- 2 files changed, 17 insertions(+), 11 deletions(-) -- http://git-wip-us.apache.org/repos/asf/madlib/blob/186390f7/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in -- diff --git a/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in b/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in index 469f1b2..5926152 100644 --- a/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in +++ b/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in @@ -284,14 +284,17 @@ tree_train( impurity_var_importance DOUBLE PRECISION[]. Impurity importance of each variable. The order of the variables is the same as - that of 'independent_varnames' column in the summary table (see below). + that of the 'independent_varnames' column in the summary table (see below). The impurity importance of any feature is the decrease in impurity by a node containing the feature as a primary split, summed over the whole tree. If surrogates are used, then the importance value includes the impurity decrease scaled by the adjusted surrogate agreement. - Reported importance values are normalized to sum to 100 across - all variables. + Importance values are displayed as raw values as per the 'split_criterion' + parameter. 
+ To see importance values normalized to sum to 100 across + all variables, use the importance display helper function + described later on this page. Please refer to [1] for more information on variable importance. @@ -727,7 +730,7 @@ independent_var_types | text, boolean, double precision n_folds | 0 null_proxy | -View the impurity importance table using the helper function: +View the normalized impurity importance table using the helper function: \\x off DROP TABLE IF EXISTS imp_output; @@ -,10 +1114,11 @@ which shows ordering of levels of categorical variables 'vs' and 'cyl': SELECT pruning_cp, cat_levels_in_text, cat_n_levels, impurity_var_importance, tree_depth FROM train_output; +-[ RECORD 1 ]---+ pruning_cp | 0 cat_levels_in_text | {0,1,4,6,8} cat_n_levels| {2,3} -impurity_var_importance | {0,51.8593201959496,10.976977929129,5.31897402755374,31.8447278473677} +impurity_var_importance | {0,22.6309172500675,4.79024943310651,2.321153,13.8967382920111} tree_depth | 4 View the summary table: @@ -1147,7 +1151,7 @@ independent_var_types | integer, integer, double precision, double precisi n_folds | 0 null_proxy | -View the impurity importance table using the helper function: +View the normalized impurity importance table using the helper function: \\x off DROP TABLE IF EXISTS imp_output; http://git-wip-us.apache.org/repos/asf/madlib/blob/186390f7/src/ports/postgres/modules/recursive_partitioning/random_forest.sql_in -- diff --git a/src/ports/postgres/modules/recursive_partitioning/random_forest.sql_in b/src/ports/postgres/modules/recursive_partitioning/random_forest.sql_in index 39b6f5d..5b5a0f0 100644 --- a/src/ports/postgres/modules/recursive_partitioning/random_forest.sql_in +++ b/src/ports/postgres/modules/recursive_partitioning/random_forest.sql_in @@ -164,7 +164,9 @@ forest_train(training_table_name, Due to nature of permutation, the importance value can end up being negative if the number of levels for a categorical variable is small and is unbalanced. 
In such a scenario, the importance values are shifted to ensure -that the lowest importance value is 0. +that the lowest importance value is 0. To see importance values normalized +to sum to 100 across all variables, use the importance display helper function +described later on this page. @@ -758,7 +760,7 @@ the variables in 'independent_varnames'
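The random forest note above says permutation importance can come out negative and is then shifted so that the lowest value is 0. A minimal sketch of that step, assuming the shift is applied only when the minimum is below zero; shift_to_nonnegative is a hypothetical helper, not a MADlib function:

```python
def shift_to_nonnegative(importance):
    """Shift all values up so the lowest one is 0, if any value is negative."""
    lowest = min(importance)
    if lowest >= 0:
        return list(importance)
    return [value - lowest for value in importance]


print(shift_to_nonnegative([-2.0, 1.0, 3.0]))  # [0.0, 3.0, 5.0]
```

The shift preserves the ordering and the gaps between variables, so a later normalization to 100 still ranks them the same way.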
madlib git commit: DT/RF: Don't eliminate single-level cat variable
Repository: madlib Updated Branches: refs/heads/master 20f95b33b -> e2534e44e DT/RF: Don't eliminate single-level cat variable JIRA: MADLIB-1258 When DT/RF is run with grouping, a subset of the groups could eliminate a categorical variable leading to multiple issues downstream, including invalid importance values and incorrect prediction. This commit keeps all categorical variables (even if it contains just one level). The accumulator state would use additional space during tree_train for this categorical variable, even though the variable is never consumed by the tree. This inefficiency is still preferred since it yields clean code and error-free prediction/importance reporting. Additional changes: - get_expr_type (validate_args.py) has been updated to return type for multiple expressions at the same time. This prevents calling a separate query for each expression, thus saving time. - Cat features are not stored per tree (in the grouping case) anymore since the features are now consistent across trees. 
Closes #301 Co-authored-by: Nandish Jayaram Project: http://git-wip-us.apache.org/repos/asf/madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/e2534e44 Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/e2534e44 Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/e2534e44 Branch: refs/heads/master Commit: e2534e44ea36aedec843a3a7c48236d0e1104e2c Parents: 20f95b3 Author: Rahul Iyer Authored: Thu Jul 26 12:17:58 2018 -0700 Committer: Rahul Iyer Committed: Wed Aug 1 12:51:13 2018 -0700 -- src/modules/recursive_partitioning/DT_impl.hpp | 91 .../recursive_partitioning/decision_tree.cpp| 21 +- .../recursive_partitioning/decision_tree.py_in | 217 +-- .../recursive_partitioning/random_forest.py_in | 120 +- .../test/decision_tree.sql_in | 83 +++ .../test/random_forest.sql_in | 46 ++-- .../modules/utilities/validate_args.py_in | 49 +++-- 7 files changed, 319 insertions(+), 308 deletions(-) -- http://git-wip-us.apache.org/repos/asf/madlib/blob/e2534e44/src/modules/recursive_partitioning/DT_impl.hpp -- diff --git a/src/modules/recursive_partitioning/DT_impl.hpp b/src/modules/recursive_partitioning/DT_impl.hpp index 69bdc88..75e4ce4 100644 --- a/src/modules/recursive_partitioning/DT_impl.hpp +++ b/src/modules/recursive_partitioning/DT_impl.hpp @@ -518,6 +518,7 @@ DecisionTree::expand(const Accumulator &state, double gain = impurityGain( state.cat_stats.row(stats_i). segment(fv_index, sps * 2), sps); + if (gain > max_impurity_gain){ max_impurity_gain = gain; max_feat = f; @@ -665,21 +666,29 @@ DecisionTree::pickSurrogates( // 1.
Compute the max count and corresponding split threshold for // each categorical and continuous feature + ColumnVector cat_max_thres = ColumnVector::Zero(n_cats); ColumnVector cat_max_count = ColumnVector::Zero(n_cats); IntegerVector cat_max_is_reverse = IntegerVector::Zero(n_cats); Index prev_cum_levels = 0; for (Index each_cat=0; each_cat < n_cats; each_cat++){ Index n_levels = state.cat_levels_cumsum(each_cat) - prev_cum_levels; -Index max_label; -(cat_stats_counts.row(stats_i).segment( -prev_cum_levels * 2, n_levels * 2)).maxCoeff(&max_label); -cat_max_thres(each_cat) = static_cast<double>(max_label / 2); -cat_max_count(each_cat) = -cat_stats_counts(stats_i, prev_cum_levels*2 + max_label); -// every odd col is for reverse, hence i % 2 == 1 for reverse index i -cat_max_is_reverse(each_cat) = (max_label % 2 == 1) ? 1 : 0; -prev_cum_levels = state.cat_levels_cumsum(each_cat); +if (n_levels > 0){ +Index max_label; +(cat_stats_counts.row(stats_i).segment( +prev_cum_levels * 2, n_levels * 2)).maxCoeff(&max_label); + +// For each split, there are two stats => +// max_label / 2 gives the split index. A floor +// operation is unnecessary since the threshold will yield +// the same results for n and n+0.5. +cat_max_thres(each_cat) = static_cast<double>(max_label / 2); +cat_max_count(each_cat) = +cat_stats_counts(stats_i, prev_cum_levels*2 + max_label); +// every odd col is for reverse, hence i % 2 == 1 for reverse index i +
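The pickSurrogates loop above stores two counts per split for each categorical feature (even columns forward, odd columns reverse), so the position of the largest count encodes both the split index (max_label / 2) and the direction (max_label % 2). A Python sketch of the same decoding, including the new guard that skips features with no levels; best_surrogate_splits is a hypothetical name, and the flat list stands in for the Eigen stats row as an assumption:

```python
def best_surrogate_splits(count_row, cat_levels_cumsum):
    """Decode the best split per categorical feature from one stats row.

    count_row holds two counts per split: column 2*i is split i and
    column 2*i + 1 is its reverse. Returns (split_index, count, is_reverse).
    """
    results = []
    prev_cum_levels = 0
    for cum_levels in cat_levels_cumsum:
        n_levels = cum_levels - prev_cum_levels
        if n_levels > 0:
            start = prev_cum_levels * 2
            segment = count_row[start:start + n_levels * 2]
            # Position of the largest count (first one on ties)
            max_label = max(range(len(segment)), key=segment.__getitem__)
            results.append((max_label // 2, segment[max_label], max_label % 2 == 1))
        else:
            # Nothing to pick: keep the zero defaults, like the Zero() vectors
            results.append((0, 0, False))
        prev_cum_levels = cum_levels
    return results


print(best_surrogate_splits([5, 1, 2, 9, 4, 4], [2, 2, 3]))
# [(1, 9, True), (0, 0, False), (0, 4, False)]
```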
madlib git commit: Madpack: Fix missing test logs bug.
Repository: madlib Updated Branches: refs/heads/master 836759e69 -> a0cfcf8f7 Madpack: Fix missing test logs bug. Due to a recent commit, madpack cleaned log files of test operations as well as the atomic operations. As a result, log files are missing even after install/dev check fails. This commit fixes this issue. Closes #300 Co-authored-by: Jingyi Mei Project: http://git-wip-us.apache.org/repos/asf/madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/a0cfcf8f Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/a0cfcf8f Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/a0cfcf8f Branch: refs/heads/master Commit: a0cfcf8f7fc31179ce0b22b18ca77bad2e65a0e4 Parents: 836759e Author: Orhan Kislal Authored: Wed Jul 25 15:05:08 2018 -0700 Committer: Rahul Iyer Committed: Thu Jul 26 12:18:07 2018 -0700 -- src/madpack/madpack.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/madlib/blob/a0cfcf8f/src/madpack/madpack.py -- diff --git a/src/madpack/madpack.py b/src/madpack/madpack.py index 5382bd8..385ab36 100755 --- a/src/madpack/madpack.py +++ b/src/madpack/madpack.py @@ -712,8 +712,8 @@ def _process_py_sql_files_in_modules(modset, args_dict): cur_tmpdir) else: error_(this, "Something is wrong, shouldn't be here: %s" % src_file, True) -shutil.rmtree(cur_tmpdir) - +if calling_operation == DB_CREATE_OBJECTS: +shutil.rmtree(cur_tmpdir) # -- def _execute_per_module_db_create_obj_algo(schema, maddir_mod_py, module, sqlfile, algoname, cur_tmpdir,
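The fix makes the temporary-directory cleanup conditional on the calling operation, so a failed install-check or dev-check still leaves its logs behind. A minimal sketch of that control flow; process_module, write_log and the string value given to DB_CREATE_OBJECTS are hypothetical stand-ins for madpack internals, and error paths are not modeled:

```python
import os
import shutil
import tempfile

DB_CREATE_OBJECTS = "db_create_objects"  # hypothetical value for illustration


def write_log(tmpdir):
    """Stand-in for work that leaves a log file in the temp directory."""
    with open(os.path.join(tmpdir, "module.log"), "w") as f:
        f.write("output of the operation\n")


def process_module(calling_operation, work):
    """Run work() in a fresh temp dir; clean up only after object creation."""
    cur_tmpdir = tempfile.mkdtemp()
    work(cur_tmpdir)
    # Test operations (install-check, dev-check) keep their logs for debugging
    if calling_operation == DB_CREATE_OBJECTS:
        shutil.rmtree(cur_tmpdir)
    return cur_tmpdir


kept = process_module("install-check", write_log)
removed = process_module(DB_CREATE_OBJECTS, write_log)
kept_exists = os.path.exists(kept)        # True: logs survive for inspection
removed_exists = os.path.exists(removed)  # False: directory was deleted
shutil.rmtree(kept)  # tidy up the example directory
```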
madlib git commit: Multiple: Clean and update documentation
Repository: madlib Updated Branches: refs/heads/master 2aac41897 -> 836759e69 Multiple: Clean and update documentation Closes #298 Project: http://git-wip-us.apache.org/repos/asf/madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/836759e6 Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/836759e6 Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/836759e6 Branch: refs/heads/master Commit: 836759e69ae617ffc0cd7640cb7ca76b25e69c1d Parents: 2aac418 Author: Frank McQuillan Authored: Tue Jul 24 17:20:18 2018 -0700 Committer: Rahul Iyer Committed: Thu Jul 26 11:33:34 2018 -0700 -- doc/mainpage.dox.in | 174 +-- src/ports/postgres/modules/convex/mlp.sql_in| 3 +- .../modules/utilities/encode_categorical.sql_in | 3 +- .../postgres/modules/utilities/path.sql_in | 3 +- .../postgres/modules/utilities/pivot.sql_in | 3 +- .../modules/utilities/sessionize.sql_in | 3 +- .../postgres/modules/utilities/utilities.sql_in | 5 +- 7 files changed, 94 insertions(+), 100 deletions(-) -- http://git-wip-us.apache.org/repos/asf/madlib/blob/836759e6/doc/mainpage.dox.in -- diff --git a/doc/mainpage.dox.in b/doc/mainpage.dox.in index c2c9a7a..8f97491 100644 --- a/doc/mainpage.dox.in +++ b/doc/mainpage.dox.in @@ -2,13 +2,9 @@ @mainpage Apache MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of -mathematical, statistical and machine learning methods for structured +mathematical, statistical, graph and machine learning methods for structured and unstructured data. -The MADlib mission: to foster widespread development of scalable analytic -skills, by harnessing efforts from commercial practice, academic research, -and open-source development. 
- Useful links: http://madlib.apache.org;>MADlib web site @@ -21,32 +17,22 @@ Useful links: v1.13, v1.12, v1.11, -v1.10.0, -v1.9.1, -v1.9, -v1.8, -v1.7.1, -v1.7, -v1.6, -v1.5, -v1.4, -v1.3, -v1.2 +v1.10 Please refer to the https://github.com/apache/madlib/blob/master/README.md;>ReadMe file for information about incorporated third-party material. License information -regarding MADlib and included third-party libraries can be found inside the +regarding MADlib and included third-party libraries can be found in the https://github.com/apache/madlib/blob/master/LICENSE;> License directory. @defgroup grp_datatrans Data Types and Transformations -@{Data types and transformation operations @} +@details Data types and operations that transform and shape data. @defgroup grp_arraysmatrix Arrays and Matrices @ingroup grp_datatrans -@brief Mathematical operations for arrays and matrices +@brief Mathematical operations for arrays and matrices. @details These modules provide basic mathematical operations to be run on array and matrices. @@ -100,13 +86,14 @@ complete matrix stored as a distributed table. @defgroup grp_matrix Matrix Operations @defgroup grp_matrix_factorization Matrix Factorization -@brief Matrix Factorization methods including Singular Value Decomposition and Low-rank Matrix Factorization +@brief Linear algebra methods that factorize a matrix into a product of matrices. +@details Linear algebra methods that factorize a matrix into a product of matrices. @{ @defgroup grp_lmf Low-Rank Matrix Factorization @defgroup grp_svd Singular Value Decomposition @} -@defgroup grp_linalg Norms and Distance functions +@defgroup grp_linalg Norms and Distance Functions @defgroup grp_svec Sparse Vectors @} @@ -126,49 +113,58 @@ complete matrix stored as a distributed table. @ingroup grp_datatrans @defgroup grp_graph Graph -Contains graph algorithms. +@brief Graph algorithms and measures associated with graphs. +@details Graph algorithms and measures associated with graphs. 
@{ @defgroup grp_apsp All Pairs Shortest Path @defgroup grp_bfs Breadth-First Search @defgroup grp_hits HITS + @defgroup grp_graph_measures Measures -Graph Measures +@brief A collection of metrics computed on a graph. +@details A collection of metrics computed on a graph. @{ @defgroup grp_graph_avg_path_length Average Path Length @defgroup grp_graph_closeness Closeness @defgroup grp_graph_diameter Graph Diameter @defgroup grp_graph_vertex_degrees In-Out Degree @} + @defgroup grp_pagerank PageRank @defgroup grp_sssp Single Source Shortest Path @defgroup grp_wcc Weakly Connected Components @} @defgroup grp_mdl Model
madlib git commit: Cols2Vec: Add Apache License header
Repository: madlib Updated Branches: refs/heads/master 4349e7722 -> ebd453cbb Cols2Vec: Add Apache License header Project: http://git-wip-us.apache.org/repos/asf/madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/ebd453cb Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/ebd453cb Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/ebd453cb Branch: refs/heads/master Commit: ebd453cbbfaaed1f06308d8f10f108337da5a783 Parents: 4349e77 Author: Rahul Iyer Authored: Wed Jul 18 16:30:57 2018 -0700 Committer: Rahul Iyer Committed: Wed Jul 18 16:30:57 2018 -0700 -- .../postgres/modules/utilities/cols2vec.py_in| 19 +++ 1 file changed, 19 insertions(+) -- http://git-wip-us.apache.org/repos/asf/madlib/blob/ebd453cb/src/ports/postgres/modules/utilities/cols2vec.py_in -- diff --git a/src/ports/postgres/modules/utilities/cols2vec.py_in b/src/ports/postgres/modules/utilities/cols2vec.py_in index b38b3d6..4f2b1c9 100644 --- a/src/ports/postgres/modules/utilities/cols2vec.py_in +++ b/src/ports/postgres/modules/utilities/cols2vec.py_in @@ -1,3 +1,22 @@ +# coding=utf-8 +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + """ @file cols2vec.py_in
[1/2] madlib git commit: Utilities: Add cols2vec() to convert columns to array
Repository: madlib Updated Branches: refs/heads/master 3b527b82a -> 950114ccd Utilities: Add cols2vec() to convert columns to array JIRA: MADLIB-1239 This commit adds a new function called cols2vec that can be used to convert features from multiple columns of an input table into a feature array in a single column. Project: http://git-wip-us.apache.org/repos/asf/madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/2828d86a Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/2828d86a Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/2828d86a Branch: refs/heads/master Commit: 2828d86a64bedddb6849913eaaf7734042922e6e Parents: 3b527b8 Author: Himanshu Pandey Authored: Fri Jun 15 01:33:27 2018 -0700 Committer: Rahul Iyer Committed: Sun Jul 15 23:35:14 2018 -0700 -- doc/mainpage.dox.in | 3 + .../postgres/modules/utilities/cols2vec.py_in | 111 +++ .../postgres/modules/utilities/cols2vec.sql_in | 191 +++ .../modules/utilities/test/cols2vec.sql_in | 89 + 4 files changed, 394 insertions(+) -- http://git-wip-us.apache.org/repos/asf/madlib/blob/2828d86a/doc/mainpage.dox.in -- diff --git a/doc/mainpage.dox.in b/doc/mainpage.dox.in index e41e6c9..341f115 100644 --- a/doc/mainpage.dox.in +++ b/doc/mainpage.dox.in @@ -284,6 +284,9 @@ Contains graph algorithms. @defgroup @grp_utilities Utilities @ingroup grp_other_functions +@defgroup grp_cols2vec Columns to Vector +@ingroup grp_utility_functions + @defgroup grp_early_stage Early Stage Development @brief A collection of implementations which are in early stage of development. There may be some issues that will be addressed in a future version. 
http://git-wip-us.apache.org/repos/asf/madlib/blob/2828d86a/src/ports/postgres/modules/utilities/cols2vec.py_in -- diff --git a/src/ports/postgres/modules/utilities/cols2vec.py_in b/src/ports/postgres/modules/utilities/cols2vec.py_in new file mode 100644 index 000..ced53e9 --- /dev/null +++ b/src/ports/postgres/modules/utilities/cols2vec.py_in @@ -0,0 +1,111 @@ +""" +@file cols2vec.py_in + +@brief Utility to convert Columns to array + +""" + +import plpy +from utilities.control import MinWarning +from utilities.utilities import split_quoted_delimited_str +from utilities.utilities import _string_to_array +from utilities.utilities import _assert +from utilities.validate_args import columns_exist_in_table +from utilities.validate_args import is_var_valid +from utilities.validate_args import get_cols +from utilities.validate_args import quote_ident +from utilities.utilities import py_list_to_sql_string + + +m4_changequote(`') + + +def validate_cols2vec_args(source_table, output_table, + list_of_features, list_of_features_to_exclude, cols_to_output, **kwargs): +""" +Function to validate input parameters +""" +if list_of_features.strip() != '*': +if not (list_of_features and list_of_features.strip()): +plpy.error("Features to include is empty") +_assert( +columns_exist_in_table( +source_table, split_quoted_delimited_str(list_of_features)), +"Invalid columns to list of features {0}".format(list_of_features)) + +if cols_to_output and cols_to_output.strip() != '*': +_assert( +columns_exist_in_table( +source_table, _string_to_array(cols_to_output)), +"Invalid columns to output list {0}".format(cols_to_output)) + + +def cols2vec(schema_madlib, source_table, output_table, list_of_features, + list_of_features_to_exclude=None, cols_to_output=None, **kwargs): +""" +Args: +@param schema_madlib: Name of MADlib schema +@param model: Name of table containing the tree model +@param source_table:Name of table containing prediction data +@param output_table:Name of table to output the 
results +@param list_of_features:Comma-separated string of column names or +expressions to put into feature array. +Can also be a '*' implying all columns +are to be put into feature array. +@param list_of_features_to_exclude: Comma-separated string of column names +to exclude from the feature array +@param cols_to_output: Comma-separated string of column names +from the source table to keep in the output
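Conceptually, cols2vec rewrites several columns into one array column. The sketch below only builds the SQL text that such a utility could issue. Several details are illustrative assumptions, not taken from the module's source: the helper name, the `feature_vector` output column, and the premise that identifiers arrive already validated and quoted. PostgreSQL's `ARRAY[...]` constructor needs a common element type, so mixed-type columns may need explicit casts.

```python
def build_cols2vec_query(source_table, output_table, features, cols_to_output=None):
    """Build SQL that packs `features` into one array column (illustrative only).

    Identifiers are assumed to be validated and quoted by the caller.
    """
    if not features:
        raise ValueError("Features to include is empty")
    # Pass-through columns first, then the packed feature array.
    select_list = list(cols_to_output or [])
    select_list.append("ARRAY[{0}] AS feature_vector".format(", ".join(features)))
    return "CREATE TABLE {out} AS SELECT {cols} FROM {src}".format(
        out=output_table, cols=", ".join(select_list), src=source_table)


print(build_cols2vec_query("houses", "houses_vec", ["tax", "bath", "size"], ["id"]))
```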
[2/2] madlib git commit: Utilities: Refactor and clean cols2vec from 2828d86
Utilities: Refactor and clean cols2vec from 2828d86 JIRA: MADLIB-1239 Closes #288 Project: http://git-wip-us.apache.org/repos/asf/madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/950114cc Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/950114cc Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/950114cc Branch: refs/heads/master Commit: 950114ccdbbdd81750624a41390d5a35d11c008a Parents: 2828d86 Author: Rahul Iyer Authored: Thu Jul 12 16:44:57 2018 -0700 Committer: Rahul Iyer Committed: Sun Jul 15 23:36:01 2018 -0700 -- doc/mainpage.dox.in | 5 +- .../postgres/modules/utilities/cols2vec.py_in | 110 ++-- .../postgres/modules/utilities/cols2vec.sql_in | 173 ++- .../modules/utilities/test/cols2vec.sql_in | 54 +++--- .../postgres/modules/utilities/utilities.py_in | 6 +- 5 files changed, 183 insertions(+), 165 deletions(-) -- http://git-wip-us.apache.org/repos/asf/madlib/blob/950114cc/doc/mainpage.dox.in -- diff --git a/doc/mainpage.dox.in b/doc/mainpage.dox.in index 341f115..c2c9a7a 100644 --- a/doc/mainpage.dox.in +++ b/doc/mainpage.dox.in @@ -262,6 +262,9 @@ Contains graph algorithms. @defgroup grp_other_functions Other Functions +@defgroup grp_cols2vec Columns to Vector +@ingroup grp_other_functions + @defgroup grp_linear_solver Linear Solvers @ingroup grp_other_functions @{A collection of methods that implement solutions for systems of consistent linear equations. @} @@ -284,8 +287,6 @@ Contains graph algorithms. @defgroup @grp_utilities Utilities @ingroup grp_other_functions -@defgroup grp_cols2vec Columns to Vector -@ingroup grp_utility_functions @defgroup grp_early_stage Early Stage Development @brief A collection of implementations which are in early stage of development.
http://git-wip-us.apache.org/repos/asf/madlib/blob/950114cc/src/ports/postgres/modules/utilities/cols2vec.py_in -- diff --git a/src/ports/postgres/modules/utilities/cols2vec.py_in b/src/ports/postgres/modules/utilities/cols2vec.py_in index ced53e9..b38b3d6 100644 --- a/src/ports/postgres/modules/utilities/cols2vec.py_in +++ b/src/ports/postgres/modules/utilities/cols2vec.py_in @@ -6,15 +6,17 @@ """ import plpy -from utilities.control import MinWarning -from utilities.utilities import split_quoted_delimited_str -from utilities.utilities import _string_to_array -from utilities.utilities import _assert -from utilities.validate_args import columns_exist_in_table -from utilities.validate_args import is_var_valid -from utilities.validate_args import get_cols -from utilities.validate_args import quote_ident -from utilities.utilities import py_list_to_sql_string +from control import MinWarning +from internal.db_utils import quote_literal +from utilities import split_quoted_delimited_str +from utilities import _string_to_array +from utilities import _assert +from utilities import add_postfix +from validate_args import columns_exist_in_table +from validate_args import is_var_valid +from validate_args import get_cols +from validate_args import quote_ident +from utilities import py_list_to_sql_string m4_changequote(`') @@ -31,12 +33,12 @@ def validate_cols2vec_args(source_table, output_table, _assert( columns_exist_in_table( source_table, split_quoted_delimited_str(list_of_features)), -"Invalid columns to list of features {0}".format(list_of_features)) +"Invalid columns in list_of_features {0}".format(list_of_features)) if cols_to_output and cols_to_output.strip() != '*': _assert( columns_exist_in_table( -source_table, _string_to_array(cols_to_output)), +source_table, split_quoted_delimited_str(cols_to_output)), "Invalid columns to output list {0}".format(cols_to_output)) @@ -44,68 +46,64 @@ def cols2vec(schema_madlib, source_table, output_table, list_of_features, 
list_of_features_to_exclude=None, cols_to_output=None, **kwargs): """ Args: -@param schema_madlib: Name of MADlib schema -@param model: Name of table containing the tree model -@param source_table:Name of table containing prediction data -@param output_table:Name of table to output the results -@param list_of_features:Comma-separated string of column names or -expressions to put into feature array. -Can also be a '*' implying all columns -
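The refactor validates `cols_to_output` with `split_quoted_delimited_str` instead of `_string_to_array`. This matters when a quoted column name contains the delimiter. Below is a hypothetical re-implementation of that idea, not MADlib's function: a splitter that ignores commas inside double-quoted identifiers.

```python
def split_quoted_list(s, delimiter=","):
    """Split `s` on `delimiter`, ignoring delimiters inside double-quoted names.

    A doubled quote ("") inside an identifier toggles the state twice, so it
    leaves the quoted section intact. Empty items are dropped.
    """
    items, current, in_quotes = [], [], False
    for ch in s:
        if ch == '"':
            in_quotes = not in_quotes
            current.append(ch)
        elif ch == delimiter and not in_quotes:
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    items.append("".join(current).strip())
    return [item for item in items if item]


print(split_quoted_list('id, "price,usd", size'))  # ['id', '"price,usd"', 'size']
```

A naive `s.split(",")` would instead break `"price,usd"` into two bogus column names, which would then fail the `columns_exist_in_table` check.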
madlib git commit: Utils: Simplify proxy quote function
Repository: madlib Updated Branches: refs/heads/master 5e47c8e4c -> 3b527b82a Utils: Simplify proxy quote function Commit 5e47c8e added a wrapper quote_literal function that called plpy.quote_literal if available, else returned dollar-quoted string. We can use Python's introspection to switch between these two options at runtime instead of a compile-time preprocessor switch. Project: http://git-wip-us.apache.org/repos/asf/madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/3b527b82 Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/3b527b82 Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/3b527b82 Branch: refs/heads/master Commit: 3b527b82a316fc893ad0695e8805307387351634 Parents: 5e47c8e Author: Rahul Iyer Authored: Fri Jul 13 10:59:33 2018 -0700 Committer: Rahul Iyer Committed: Fri Jul 13 10:59:33 2018 -0700 -- src/ports/greenplum/cmake/GreenplumUtils.cmake | 3 +-- src/ports/postgres/cmake/PostgreSQLUtils.cmake | 4 src/ports/postgres/modules/internal/db_utils.py_in | 8 3 files changed, 5 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/madlib/blob/3b527b82/src/ports/greenplum/cmake/GreenplumUtils.cmake -- diff --git a/src/ports/greenplum/cmake/GreenplumUtils.cmake b/src/ports/greenplum/cmake/GreenplumUtils.cmake index 5ec271e..0fc1637 100644 --- a/src/ports/greenplum/cmake/GreenplumUtils.cmake +++ b/src/ports/greenplum/cmake/GreenplumUtils.cmake @@ -9,9 +9,8 @@ function(define_greenplum_features IN_VERSION OUT_FEATURES) list(APPEND ${OUT_FEATURES} __HAS_FUNCTION_PROPERTIES__) endif() -if(NOT ${IN_VERSION} VERSION_LESS "6.0") +if(${IN_VERSION} VERSION_GREATER "4.3") list(APPEND ${OUT_FEATURES} __HAS_BOOL_TO_TEXT_CAST__) -list(APPEND ${OUT_FEATURES} __HAS_PLPY_QUOTE_FUNCTIONS__) endif() # Pass values to caller http://git-wip-us.apache.org/repos/asf/madlib/blob/3b527b82/src/ports/postgres/cmake/PostgreSQLUtils.cmake -- diff --git a/src/ports/postgres/cmake/PostgreSQLUtils.cmake 
b/src/ports/postgres/cmake/PostgreSQLUtils.cmake index e08effe..0139015 100644 --- a/src/ports/postgres/cmake/PostgreSQLUtils.cmake +++ b/src/ports/postgres/cmake/PostgreSQLUtils.cmake @@ -6,10 +6,6 @@ function(define_postgresql_features IN_VERSION OUT_FEATURES) list(APPEND ${OUT_FEATURES} __HAS_BOOL_TO_TEXT_CAST__) endif() -if(NOT ${IN_VERSION} VERSION_LESS "9.1") -list(APPEND ${OUT_FEATURES} __HAS_PLPY_QUOTE_FUNCTIONS__) -endif() - # Pass values to caller set(${OUT_FEATURES} "${${OUT_FEATURES}}" PARENT_SCOPE) endfunction(define_postgresql_features) http://git-wip-us.apache.org/repos/asf/madlib/blob/3b527b82/src/ports/postgres/modules/internal/db_utils.py_in -- diff --git a/src/ports/postgres/modules/internal/db_utils.py_in b/src/ports/postgres/modules/internal/db_utils.py_in index 4c41515..c75babf 100644 --- a/src/ports/postgres/modules/internal/db_utils.py_in +++ b/src/ports/postgres/modules/internal/db_utils.py_in @@ -24,8 +24,6 @@ from utilities.validate_args import get_expr_type m4_changequote(`') QUOTE_DELIMITER="$__madlib__$" -HAS_PLPY_QUOTE_FUNCTIONS = m4_ifdef(, -, ); def get_distinct_col_levels(source_table, col_name, col_type=None): @@ -73,9 +71,11 @@ def quote_literal(input_str): provided as a proxy for that platform. For all other platforms this function, forwards the argument to plpy.quote_literal. """ -if HAS_PLPY_QUOTE_FUNCTIONS: +try: return plpy.quote_literal(input_str) -else: +except AttributeError: +# plpy.quote_literal is not supported, we work around by returning +# dollar-quoted string with obscure tag return "{qd}{input_str}{qd}".format(qd=QUOTE_DELIMITER, input_str=input_str) # --
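The runtime switch can be shown with stand-in objects for `plpy`. In this sketch the module is passed in as an argument so that it runs outside the database. That is a deviation from the real code, where `plpy` is a global. The fake objects are hypothetical.

```python
import types

QUOTE_DELIMITER = "$__madlib__$"


def quote_literal(input_str, plpy_module):
    """Use plpy_module.quote_literal if present, else dollar-quote the string."""
    try:
        return plpy_module.quote_literal(input_str)
    except AttributeError:
        # Older platforms: fall back to a dollar-quoted string with an obscure tag.
        return "{qd}{s}{qd}".format(qd=QUOTE_DELIMITER, s=input_str)


# Hypothetical stand-ins for plpy on a newer and an older platform.
new_plpy = types.SimpleNamespace(
    quote_literal=lambda s: "'" + s.replace("'", "''") + "'")
old_plpy = types.SimpleNamespace()

print(quote_literal("O'Hara", new_plpy))  # 'O''Hara'
print(quote_literal("O'Hara", old_plpy))  # $__madlib__$O'Hara$__madlib__$
```

Wrapping the call in `try/except AttributeError` also hides any `AttributeError` raised inside `quote_literal` itself. Checking `hasattr(plpy_module, "quote_literal")` first is a narrower alternative.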
madlib git commit: Utils: Add a Python quote_literal for GP platforms
Repository: madlib Updated Branches: refs/heads/master e64dba4eb -> 5e47c8e4c Utils: Add a Python quote_literal for GP platforms Versions prior to GPDB 6 or PostgreSQL 9.1 do not provide plpy.quote_literal, which is necessary for building a SQL text array from a Python list of strings. We work around this limitation by creating our own quote_literal function that just returns plpy.quote_literal output for platforms that provide the function. For other platforms, we compromise by using dollar-quoting (with an obscure tag between the dollars). Project: http://git-wip-us.apache.org/repos/asf/madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/5e47c8e4 Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/5e47c8e4 Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/5e47c8e4 Branch: refs/heads/master Commit: 5e47c8e4cce205c5ecfda5e2e1d6bdc0a7330603 Parents: e64dba4 Author: Rahul Iyer Authored: Thu Jul 12 22:46:07 2018 -0700 Committer: Rahul Iyer Committed: Fri Jul 13 00:40:41 2018 -0700 -- src/ports/greenplum/cmake/GreenplumUtils.cmake | 3 +- src/ports/postgres/cmake/PostgreSQLUtils.cmake | 4 + src/ports/postgres/modules/convex/mlp_igd.py_in | 4 +- .../postgres/modules/internal/db_utils.py_in| 77 +++- 4 files changed, 50 insertions(+), 38 deletions(-) -- http://git-wip-us.apache.org/repos/asf/madlib/blob/5e47c8e4/src/ports/greenplum/cmake/GreenplumUtils.cmake -- diff --git a/src/ports/greenplum/cmake/GreenplumUtils.cmake b/src/ports/greenplum/cmake/GreenplumUtils.cmake index 0fc1637..5ec271e 100644 --- a/src/ports/greenplum/cmake/GreenplumUtils.cmake +++ b/src/ports/greenplum/cmake/GreenplumUtils.cmake @@ -9,8 +9,9 @@ function(define_greenplum_features IN_VERSION OUT_FEATURES) list(APPEND ${OUT_FEATURES} __HAS_FUNCTION_PROPERTIES__) endif() -if(${IN_VERSION} VERSION_GREATER "4.3") +if(NOT ${IN_VERSION} VERSION_LESS "6.0") list(APPEND ${OUT_FEATURES} __HAS_BOOL_TO_TEXT_CAST__) +list(APPEND ${OUT_FEATURES} __HAS_PLPY_QUOTE_FUNCTIONS__) endif() # 
Pass values to caller http://git-wip-us.apache.org/repos/asf/madlib/blob/5e47c8e4/src/ports/postgres/cmake/PostgreSQLUtils.cmake -- diff --git a/src/ports/postgres/cmake/PostgreSQLUtils.cmake b/src/ports/postgres/cmake/PostgreSQLUtils.cmake index 0139015..e08effe 100644 --- a/src/ports/postgres/cmake/PostgreSQLUtils.cmake +++ b/src/ports/postgres/cmake/PostgreSQLUtils.cmake @@ -6,6 +6,10 @@ function(define_postgresql_features IN_VERSION OUT_FEATURES) list(APPEND ${OUT_FEATURES} __HAS_BOOL_TO_TEXT_CAST__) endif() +if(NOT ${IN_VERSION} VERSION_LESS "9.1") +list(APPEND ${OUT_FEATURES} __HAS_PLPY_QUOTE_FUNCTIONS__) +endif() + # Pass values to caller set(${OUT_FEATURES} "${${OUT_FEATURES}}" PARENT_SCOPE) endfunction(define_postgresql_features) http://git-wip-us.apache.org/repos/asf/madlib/blob/5e47c8e4/src/ports/postgres/modules/convex/mlp_igd.py_in -- diff --git a/src/ports/postgres/modules/convex/mlp_igd.py_in b/src/ports/postgres/modules/convex/mlp_igd.py_in index 3ab7f45..7df44ec 100644 --- a/src/ports/postgres/modules/convex/mlp_igd.py_in +++ b/src/ports/postgres/modules/convex/mlp_igd.py_in @@ -33,7 +33,7 @@ from convex.utils_regularization import __utils_normalize_data_grouping from internal.db_utils import get_distinct_col_levels from internal.db_utils import get_one_hot_encoded_expr -from internal.db_utils import quote_literal_python_list +from internal.db_utils import quote_literal from utilities.control import MinWarning from utilities.in_mem_group_control import GroupIterationController from utilities.utilities import _array_to_string @@ -145,7 +145,7 @@ def mlp(schema_madlib, source_table, output_table, independent_varname, dim=2) if is_classification: if pp_summary_dict["class_values"]: -classes = quote_literal_python_list(pp_summary_dict["class_values"]) +classes = [quote_literal(c) for c in pp_summary_dict["class_values"]] num_output_nodes = len(classes) else: # Assume that the dependent variable is already one-hot-encoded 
http://git-wip-us.apache.org/repos/asf/madlib/blob/5e47c8e4/src/ports/postgres/modules/internal/db_utils.py_in -- diff --git a/src/ports/postgres/modules/internal/db_utils.py_in b/src/ports/postgres/modules/internal/db_utils.py_in index e82ba91..4c41515 100644 --- a/src/ports/postgres/modules/internal/db_utils.py_in
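The motivating use case is turning a Python list of class labels into a SQL text array. Below is a minimal sketch. It assumes standard SQL string quoting (single quotes doubled, `standard_conforming_strings` on) rather than either `plpy.quote_literal` or the dollar-quote fallback. Both function names are hypothetical.

```python
def sql_quote(value):
    """Standard SQL string literal: single quotes, embedded quotes doubled.

    Assumes standard_conforming_strings is on, so backslashes need no escaping.
    """
    return "'" + str(value).replace("'", "''") + "'"


def to_sql_text_array(values):
    """Render a Python list as a PostgreSQL TEXT[] expression."""
    return "ARRAY[{0}]::TEXT[]".format(", ".join(sql_quote(v) for v in values))


print(to_sql_text_array(["yes", "no", "it's"]))  # ARRAY['yes', 'no', 'it''s']::TEXT[]
```

The dollar-quote fallback avoids escaping entirely. However, it would break for a value that itself contains the `$__madlib__$` tag, which is presumably why an obscure tag was chosen.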
madlib git commit: Madpack: Fix glob expansion for dev-check
Repository: madlib Updated Branches: refs/heads/master a47cd1ff5 -> e64dba4eb Madpack: Fix glob expansion for dev-check Project: http://git-wip-us.apache.org/repos/asf/madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/e64dba4e Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/e64dba4e Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/e64dba4e Branch: refs/heads/master Commit: e64dba4ebe2c2918a1c3a54cb83e55e1875a7261 Parents: a47cd1f Author: Rahul Iyer Authored: Thu Jul 12 17:11:54 2018 -0700 Committer: Rahul Iyer Committed: Thu Jul 12 22:18:02 2018 -0700 -- src/madpack/madpack.py | 13 - 1 file changed, 8 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/madlib/blob/e64dba4e/src/madpack/madpack.py -- diff --git a/src/madpack/madpack.py b/src/madpack/madpack.py index f21f2c0..1444c26 100755 --- a/src/madpack/madpack.py +++ b/src/madpack/madpack.py @@ -329,10 +329,11 @@ def _parse_result_logfile(retval, logfile, sql_abspath, "|Time: %d milliseconds" % (milliseconds) if result == 'FAIL': -info_(this, "Failed executing %s" % sql_abspath, True) -info_(this, "Check the log at %s" % logfile, True) +error_(this, "Failed executing %s" % sql_abspath, stop=False) +error_(this, "Check the log at %s" % logfile, stop=False) return result + def _check_db_port(portid): """ Make sure we are connected to the expected DB platform @@ -888,11 +889,13 @@ def run_install_check(args, testcase, madpack_cmd): % (test_user, test_schema, schema) # Loop through all test SQL files for this module +ic_sql_files = set(glob.glob(maddir_mod_sql + '/' + module + '/test/*.ic.sql_in')) if is_install_check: -sql_files = maddir_mod_sql + '/' + module + '/test/*.ic.sql_in' +sql_files = ic_sql_files else: -sql_files = maddir_mod_sql + '/' + module + '/test/*[!ic].sql_in' -for sqlfile in sorted(glob.glob(sql_files), reverse=True): +all_sql_files = set(glob.glob(maddir_mod_sql + '/' + module + '/test/*.sql_in')) +sql_files = all_sql_files - 
ic_sql_files +for sqlfile in sorted(sql_files): algoname = os.path.basename(sqlfile).split('.')[0] # run only algo specified if (modset and modset[module] and
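The old dev-check pattern `*[!ic].sql_in` does not mean "not ending in .ic". `[!ic]` is a single-character class, so it only examines the one character before `.sql_in`. Python's `glob` uses `fnmatch` for its shell-style matching, so the difference can be checked on an in-memory list of file names. The names below are made up for illustration.

```python
import fnmatch

files = ["svm.sql_in", "svm.ic.sql_in", "logistic.sql_in", "elastic_net.ic.sql_in"]

# Old pattern: [!ic] matches exactly one character that is neither 'i' nor 'c',
# so any regular test file whose stem ends in 'c' (or 'i') is silently skipped.
old_dev_check = sorted(fnmatch.filter(files, "*[!ic].sql_in"))

# Fixed approach: select install-check files explicitly, then subtract them.
ic_files = set(fnmatch.filter(files, "*.ic.sql_in"))
all_files = set(fnmatch.filter(files, "*.sql_in"))
new_dev_check = sorted(all_files - ic_files)

print(old_dev_check)  # ['svm.sql_in']
print(new_dev_check)  # ['logistic.sql_in', 'svm.sql_in']
```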
[2/5] madlib git commit: SVM: Compute average loss per row instead of total loss
SVM: Compute average loss per row instead of total loss Project: http://git-wip-us.apache.org/repos/asf/madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/ceab57f3 Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/ceab57f3 Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/ceab57f3 Branch: refs/heads/master Commit: ceab57f31ddf15a1de8621a22633e052ba0028ff Parents: ac4a51f Author: Rahul Iyer Authored: Tue Jul 10 13:47:39 2018 -0700 Committer: Rahul Iyer Committed: Thu Jul 12 13:31:22 2018 -0700 -- src/modules/convex/linear_svm_igd.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/madlib/blob/ceab57f3/src/modules/convex/linear_svm_igd.cpp -- diff --git a/src/modules/convex/linear_svm_igd.cpp b/src/modules/convex/linear_svm_igd.cpp index f396250..79dc496 100644 --- a/src/modules/convex/linear_svm_igd.cpp +++ b/src/modules/convex/linear_svm_igd.cpp @@ -192,7 +192,7 @@ internal_linear_svm_igd_result::run(AnyType ) { AnyType tuple; tuple << state.task.model -<< static_cast(state.algo.loss) +<< static_cast(state.algo.loss / state.algo.numRows) << state.algo.gradient.norm() << static_cast(state.algo.numRows);
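Reporting the mean instead of the sum makes the loss comparable across tables of different sizes. The sketch assumes a hinge loss on labels in {-1, +1} purely for illustration. It is not the C++ implementation.

```python
def hinge_losses(scores, labels):
    """Per-row hinge loss max(0, 1 - y * f(x)) for labels y in {-1, +1}."""
    return [max(0.0, 1.0 - y * s) for s, y in zip(scores, labels)]


def summarize_loss(scores, labels):
    """Return (total_loss, average_loss_per_row)."""
    losses = hinge_losses(scores, labels)
    total = sum(losses)
    return total, total / len(losses)


scores, labels = [2.0, 0.5, -1.0], [1, 1, 1]
print(summarize_loss(scores, labels))          # total 2.5, average about 0.833
print(summarize_loss(scores * 2, labels * 2))  # total doubles, average unchanged
```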
[4/5] madlib git commit: Multiple: Update docs related to CV
Multiple: Update docs related to CV JIRA: MADLIB-1250 This commit updates documentation to reflect latest changes in cross validation. An additional minor change is made to MLP docs to use 'AVG' instead of 'SUM/COUNT'. Project: http://git-wip-us.apache.org/repos/asf/madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/11ecdc7e Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/11ecdc7e Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/11ecdc7e Branch: refs/heads/master Commit: 11ecdc7e6309c6ebdb070ffeda6ac2cbaafa18c2 Parents: 834f543 Author: Frank McQuillan Authored: Wed Jul 11 11:02:02 2018 -0700 Committer: Rahul Iyer Committed: Thu Jul 12 13:39:31 2018 -0700 -- src/ports/postgres/modules/convex/mlp.sql_in| 4 +- .../modules/elastic_net/elastic_net.sql_in | 34 +- src/ports/postgres/modules/svm/svm.sql_in | 622 +-- 3 files changed, 453 insertions(+), 207 deletions(-) -- http://git-wip-us.apache.org/repos/asf/madlib/blob/11ecdc7e/src/ports/postgres/modules/convex/mlp.sql_in -- diff --git a/src/ports/postgres/modules/convex/mlp.sql_in b/src/ports/postgres/modules/convex/mlp.sql_in index 13ae4a0..9fba404 100644 --- a/src/ports/postgres/modules/convex/mlp.sql_in +++ b/src/ports/postgres/modules/convex/mlp.sql_in @@ -1164,7 +1164,7 @@ SELECT * FROM lin_housing JOIN mlp_regress_prediction USING (id) ORDER BY id; RMS error: -SELECT SQRT(SUM(ABS(y-estimated_y))/COUNT(y)) as rms_error FROM lin_housing +SELECT SQRT(AVG((y-estimated_y)*(y-estimated_y))) as rms_error FROM lin_housing JOIN mlp_regress_prediction USING (id); @@ -1256,7 +1256,7 @@ SELECT *, ABS(y-estimated_y) as abs_diff FROM lin_housing JOIN mlp_regress_predi RMS error: -SELECT SQRT(SUM(ABS(y-estimated_y))/COUNT(y)) as rms_error FROM lin_housing +SELECT SQRT(AVG((y-estimated_y)*(y-estimated_y))) as rms_error FROM lin_housing JOIN mlp_regress_prediction USING (id); http://git-wip-us.apache.org/repos/asf/madlib/blob/11ecdc7e/src/ports/postgres/modules/elastic_net/elastic_net.sql_in 
-- diff --git a/src/ports/postgres/modules/elastic_net/elastic_net.sql_in b/src/ports/postgres/modules/elastic_net/elastic_net.sql_in index 5ea2efb..838a6bd 100644 --- a/src/ports/postgres/modules/elastic_net/elastic_net.sql_in +++ b/src/ports/postgres/modules/elastic_net/elastic_net.sql_in @@ -231,8 +231,14 @@ cross validation is used. Also, cross validation is not supported if grouping i Hyperparameter optimization can be carried out using the built-in cross validation mechanism, which is activated by assigning a value greater than 1 to -the parameter \e n_folds. Negative misclassification error is used -for classification and negative root mean squared error is used for regression. +the parameter \e n_folds. + +The cross validation scores are the mean and standard deviation +of the accuracy when predicted on the validation fold, +averaged over all folds and all rows. For classification, the accuracy +metric used is the ratio of correct classifications. For regression, the +accuracy metric used is the negative of mean squared error (negative to +make it a concave problem, thus selecting \e max means the highest accuracy). The values of a parameter to cross validate should be provided in a list. 
For example, to regularize with the L1 norm and use a lambda value @@ -775,20 +781,20 @@ iteration_run | 1 -# Details of the cross validation: -SELECT * FROM houses_en3_cv ORDER BY lambda_value DESC, alpha ASC; +SELECT * FROM houses_en3_cv ORDER BY mean_neg_loss DESC; - alpha | lambda_value |mean_neg_loss | std_neg_loss +--++--- - 0.0 | 10.0 | -1.617365261170+55 | 1.26711815498+55 - 0.0 |100.0 | -63555.0502789 |3973.78527042 - 0.0 | 0.1 | -37136.5397256 |9022.78236248 - 0.1 | 10.0 | -3.260479720340+53 | 9.10745448826+53 - 0.1 |100.0 | -63445.8310011 |3965.83900962 - 0.1 | 0.1 | -37192.0390897 |9058.79757772 - 1.0 | 10.0 | -64569.8882099 | 4051.1856361 - 1.0 |100.0 | -38121.9154268 |9332.65800111 - 1.0 | 0.1 | -38117.5477067 |9384.36765881 + alpha | lambda_value | mean_neg_loss | std_neg_loss +---+--+--+ + 0.0 | 0.1 | -36094.4685768 | 10524.4473253 + 0.1 | 0.1 | -36136.2448004 | 10682.4136993 + 1.0 |100.0 | -37007.9496501 | 12679.3781975 + 1.0 | 0.1 | -37018.1019927 | 12716.7438015 + 0.1 |100.0 |
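The old documentation query took the square root of the mean absolute error, which is not a root mean squared error. The sketch contains three functions with illustrative names:

* The first corresponds to the corrected `SQRT(AVG((y-estimated_y)*(y-estimated_y)))`.
* The second reproduces the old expression.
* The third is the negated mean squared error that the elastic net docs describe as the regression cross-validation score.

```python
import math


def rms_error(y, y_hat):
    """sqrt(mean((y - y_hat)^2)), the corrected SQRT(AVG(d*d)) query."""
    diffs = [a - b for a, b in zip(y, y_hat)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def old_doc_metric(y, y_hat):
    """sqrt(mean(|y - y_hat|)), what SQRT(SUM(ABS(d))/COUNT(y)) computed."""
    diffs = [a - b for a, b in zip(y, y_hat)]
    return math.sqrt(sum(abs(d) for d in diffs) / len(diffs))


def neg_mse_score(y, y_hat):
    """Negated mean squared error: larger is better, so CV can take the max."""
    diffs = [a - b for a, b in zip(y, y_hat)]
    return -sum(d * d for d in diffs) / len(diffs)


y, y_hat = [3.0, 5.0, 10.0], [1.0, 5.0, 6.0]
print(rms_error(y, y_hat), old_doc_metric(y, y_hat), neg_mse_score(y, y_hat))
```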
[3/5] madlib git commit: CV: Fix incorrect dict index + change output columns
CV: Fix incorrect dict index + change output columns JIRA: MADLIB-1250 Cross validation had a minor bug that didn't fully index into a two-level nested dictionary. This led to a KeyError while writing CV results to an output table. This has been fixed in this commit. Additionally, the CV output table columns are called 'mean_score' and 'std_dev_score', instead of 'mean_neg_loss' and 'std_neg_loss' to not confuse with the loss function used in the primary modeling technique. Closes #287 Project: http://git-wip-us.apache.org/repos/asf/madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/834f543e Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/834f543e Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/834f543e Branch: refs/heads/master Commit: 834f543eefdbf321c2ce014d64f909138559c357 Parents: ceab57f Author: Rahul Iyer Authored: Tue Jul 3 14:28:21 2018 -0700 Committer: Rahul Iyer Committed: Thu Jul 12 13:39:31 2018 -0700 -- src/ports/postgres/modules/svm/test/svm.sql_in | 4 +++- .../validation/internal/cross_validation.py_in | 16 2 files changed, 11 insertions(+), 9 deletions(-) -- http://git-wip-us.apache.org/repos/asf/madlib/blob/834f543e/src/ports/postgres/modules/svm/test/svm.sql_in -- diff --git a/src/ports/postgres/modules/svm/test/svm.sql_in b/src/ports/postgres/modules/svm/test/svm.sql_in index ad4b9ac..cba370c 100644 --- a/src/ports/postgres/modules/svm/test/svm.sql_in +++ b/src/ports/postgres/modules/svm/test/svm.sql_in @@ -581,7 +581,9 @@ SELECT svm_classification( 'gaussian', 'n_components=3, fit_intercept=true', NULL, - 'max_iter=2, n_folds=3, lambda=[0.01, 0.1, 0.5]'); + 'init_stepsize=[0.01, 0.1], max_iter=2, n_folds=3, lambda=[0.01, 0.1, 0.5], validation_result=m7_cv'); + +SELECT * FROM m7_cv; SELECT svm_predict('m7','svm_test_data', 'id', 'svm_test_7'); SELECT http://git-wip-us.apache.org/repos/asf/madlib/blob/834f543e/src/ports/postgres/modules/validation/internal/cross_validation.py_in -- diff --git 
a/src/ports/postgres/modules/validation/internal/cross_validation.py_in b/src/ports/postgres/modules/validation/internal/cross_validation.py_in index 84e52e9..c173533 100644 --- a/src/ports/postgres/modules/validation/internal/cross_validation.py_in +++ b/src/ports/postgres/modules/validation/internal/cross_validation.py_in @@ -67,8 +67,8 @@ class ValidationResult(object): List of dictionaries. Each dictionary contains the following three keys: - - mean_neg_loss: float, average of scores using sub_args - - std_neg_loss: float, standard deviation of scores using sub_args + - mean_score: float, average of scores using sub_args + - std_dev_score: float, standard deviation of scores using sub_args - sub_args: dict, the values of arguments being validated """ def __init__(self, cv_history=None): @@ -98,12 +98,12 @@ class ValidationResult(object): def add_one(self, mean, std, sub_args): """Add one record to the history""" -record = dict(mean_neg_loss=mean, std_neg_loss=std, sub_args=sub_args) +record = dict(mean_score=mean, std_dev_score=std, sub_args=sub_args) self._cv_history.append(record) def sorted(self): """Sort the history w.r.t. mean value and return a new ValidationResult object""" -ch = sorted(self._cv_history, reverse=True, key=itemgetter('mean_neg_loss')) +ch = sorted(self._cv_history, reverse=True, key=itemgetter('mean_score')) return ValidationResult(ch) def first(self, attr=None): @@ -112,7 +112,7 @@ class ValidationResult(object): Parameters == attr : string, optional - Any string in {'mean_neg_loss', 'std_neg_loss', 'sub_args'} or None + Any string in {'mean_score', 'std_dev_score', 'sub_args'} or None Returns === @@ -133,13 +133,14 @@ class ValidationResult(object): def output_tbl(self, tbl_name): """Create a table tbl_name that contains the history -The columns of tbl_name are mean_neg_loss, std_neg_loss and the leaf keys in sub_args. +The columns of tbl_name are mean_score, std_dev_score and the leaf keys in sub_args. 
All column types are assumed to be double precision. """ if not tbl_name or not str(tbl_name).strip(): return -header = self._cv_history[0]['sub_args'].keys() + ['mean_neg_loss', 'std_neg_loss'] +header = (self._cv_history[0]['sub_args']['params_dict'].keys() + +
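The renamed record keys and the two-level index into `sub_args` can be shown with a small standalone class. This is a hypothetical, simplified sketch, not MADlib's `ValidationResult`. The class name `CVHistory`, the `header()` method, and the assumption that the validated arguments live under `sub_args['params_dict']` (inferred from the truncated `output_tbl` change above) are all illustrative.

```python
from operator import itemgetter


class CVHistory(object):
    """Hypothetical, simplified stand-in for ValidationResult.

    Each record holds the mean and standard deviation of the fold scores and
    the argument values being validated. The argument values are assumed to
    be nested one level down, under sub_args['params_dict'].
    """

    def __init__(self, cv_history=None):
        self._cv_history = list(cv_history) if cv_history else []

    def add_one(self, mean, std, sub_args):
        """Add one record using the renamed keys."""
        self._cv_history.append(
            dict(mean_score=mean, std_dev_score=std, sub_args=sub_args))

    def sorted(self):
        """Return a new history ordered from best to worst mean_score."""
        # Inside a method, the bare name `sorted` is the builtin, not this method
        ch = sorted(self._cv_history, reverse=True,
                    key=itemgetter('mean_score'))
        return CVHistory(ch)

    def first(self, attr=None):
        """Return the first record, or a single attribute of it."""
        record = self._cv_history[0]
        return record if attr is None else record[attr]

    def header(self):
        """Output-table columns: parameter names, then the two score columns.

        Both dictionary levels are indexed; stopping at sub_args would yield
        the wrapper key 'params_dict' instead of the parameter names.
        """
        params = self._cv_history[0]['sub_args']['params_dict']
        return sorted(params) + ['mean_score', 'std_dev_score']
```

The parameter names are sorted here only to make the column order deterministic; the diff itself uses the dictionary's key order.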
[5/5] madlib git commit: Utilities: Add check for any array type
Utilities: Add check for any array type

Co-authored-by: Nikhil Kak

Closes #293

Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/a47cd1ff
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/a47cd1ff
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/a47cd1ff

Branch: refs/heads/master
Commit: a47cd1ff533a271e32470074986872e7bd278cbe
Parents: 11ecdc7
Author: Arvind Sridhar
Authored: Mon Jul 9 16:14:48 2018 -0700
Committer: Rahul Iyer
Committed: Thu Jul 12 14:03:08 2018 -0700

--
 .../test/unit_tests/test_utilities.py_in       |  3 +++
 .../postgres/modules/utilities/utilities.py_in | 22 +---
 2 files changed, 17 insertions(+), 8 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/madlib/blob/a47cd1ff/src/ports/postgres/modules/utilities/test/unit_tests/test_utilities.py_in
--
diff --git a/src/ports/postgres/modules/utilities/test/unit_tests/test_utilities.py_in b/src/ports/postgres/modules/utilities/test/unit_tests/test_utilities.py_in
index 407a3c0..2d2c481 100644
--- a/src/ports/postgres/modules/utilities/test/unit_tests/test_utilities.py_in
+++ b/src/ports/postgres/modules/utilities/test/unit_tests/test_utilities.py_in
@@ -243,6 +243,9 @@ class UtilitiesTestCase(unittest.TestCase):
         self.assertFalse(s.is_valid_psql_type('boolean[]', s.INCLUDE_ARRAY | s.ONLY_ARRAY))
         self.assertFalse(s.is_valid_psql_type('boolean', s.ONLY_ARRAY))
         self.assertFalse(s.is_valid_psql_type('boolean[]', s.ONLY_ARRAY))
+        self.assertTrue(s.is_valid_psql_type('boolean[]', s.ANY_ARRAY))
+        self.assertTrue(s.is_valid_psql_type('boolean[]', s.INTEGER | s.ANY_ARRAY))
+        self.assertFalse(s.is_valid_psql_type('boolean', s.ANY_ARRAY))
 
 if __name__ == '__main__':
     unittest.main()

http://git-wip-us.apache.org/repos/asf/madlib/blob/a47cd1ff/src/ports/postgres/modules/utilities/utilities.py_in
--
diff --git a/src/ports/postgres/modules/utilities/utilities.py_in b/src/ports/postgres/modules/utilities/utilities.py_in
index 55b6983..d571b40 100644
--- a/src/ports/postgres/modules/utilities/utilities.py_in
+++ b/src/ports/postgres/modules/utilities/utilities.py_in
@@ -175,34 +175,40 @@ TEXT = set(['text', 'varchar', 'character varying', 'char', 'character'])
 BOOLEAN = set(['boolean'])
 INCLUDE_ARRAY = set([unique_string('__include_array__')])
 ONLY_ARRAY = set([unique_string('__only_array__')])
+ANY_ARRAY = set([unique_string('__any_array__')])
+
 
 def is_valid_psql_type(arg, valid_types):
     """ Verify if argument is a valid type
     Args:
         @param arg: str. Name of the Postgres type to validate
-        @param valid_types: set. Set of type names to look into.
-                            This is typically created using the global types
-                            created in this module.
-                            Two non-type flags are provided:
+        @param valid_types: set. Set of valid type names to search.
+                            This is typically created using the global names
+                            in this module.
+                            Three non-type flags are provided
+                            (in descending order of precedence):
+                                - ANY_ARRAY: check if arg is any array type
                                 - ONLY_ARRAY: indicates that only array forms
                                     of the valid types should be checked
                                 - INCLUDE_ARRAY: indicates that array and scalar
                                     forms of the valid types should be checked
-                            If both ONLY_ARRAY and INCLUDE_ARRAY are present,
-                            then ONLY_ARRAY takes precedence
-
     Examples:
         1. valid_types = BOOLEAN | INTEGER | TEXT
         2. valid_types = BOOLEAN | INTEGER | ONLY_ARRAY
         3. valid_types = NUMERIC | INCLUDE_ARRAY
     """
+    if not arg or not valid_types:
+        return False
+    if ANY_ARRAY <= valid_types:
+        return arg.rstrip().endswith('[]')
     if ONLY_ARRAY <= valid_types:
-        return ('[]' in arg and arg.rstrip('[]') in valid_types)
+        return (arg.rstrip().endswith('[]') and arg.rstrip('[] ') in valid_types)
     if INCLUDE_ARRAY <= valid_types:
         # Remove the [] from end of the arg type
         # The single space is needed to ensure trailing white space is stripped
         arg = arg.rstrip('[] ')
     return (arg in valid_types)
+# --
 
 
 def is_psql_numeric_type(arg, exclude=None):
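To exercise the new flag precedence outside the database, the logic from the diff can be lifted into a standalone function. This is a sketch under one stated assumption: the flag sentinels are fixed strings here, whereas MADlib builds them with `unique_string()`, and the function is renamed `is_valid_type` to make clear it is a copy, not the module itself.

```python
# Stand-in type sets and flag sentinels (MADlib uses unique_string() for flags)
INTEGER = {'smallint', 'integer', 'bigint'}
BOOLEAN = {'boolean'}
INCLUDE_ARRAY = {'__include_array__'}
ONLY_ARRAY = {'__only_array__'}
ANY_ARRAY = {'__any_array__'}


def is_valid_type(arg, valid_types):
    """Return True if the type name `arg` is accepted by `valid_types`.

    Flag precedence, highest first: ANY_ARRAY, ONLY_ARRAY, INCLUDE_ARRAY.
    """
    if not arg or not valid_types:
        return False
    if ANY_ARRAY <= valid_types:
        # Any array type qualifies, whatever its element type
        return arg.rstrip().endswith('[]')
    if ONLY_ARRAY <= valid_types:
        # Must be an array AND its element type must be listed
        return arg.rstrip().endswith('[]') and arg.rstrip('[] ') in valid_types
    if INCLUDE_ARRAY <= valid_types:
        # Strip trailing brackets/whitespace so scalar and array forms both match
        arg = arg.rstrip('[] ')
    return arg in valid_types
```

Because ANY_ARRAY is checked first, `INTEGER | ANY_ARRAY` accepts `boolean[]`: the listed element types are ignored once that flag is present, which is what the new unit test asserts.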
[1/5] madlib git commit: Build: Remove symlinks during rpm uninstall
Repository: madlib Updated Branches: refs/heads/master 5f80ba978 -> a47cd1ff5 Build: Remove symlinks during rpm uninstall JIRA: MADLIB-1175 `rpm --install` creates three symlinks to `Versions/`, `.../bin`, and `.../doc`. These symlinks should be deleted during `rpm --erase`. Additionally, we also delete `Versions/` if it is empty after the erase. Closes #286 Co-Authored-by: Arvind Sridhar Project: http://git-wip-us.apache.org/repos/asf/madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/ac4a51f0 Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/ac4a51f0 Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/ac4a51f0 Branch: refs/heads/master Commit: ac4a51f0a8aacd72f884f6cebb27f43b19948ccd Parents: 5f80ba9 Author: Rahul Iyer Authored: Tue Jul 3 12:02:55 2018 -0700 Committer: Rahul Iyer Committed: Thu Jul 12 13:28:54 2018 -0700 -- deploy/CMakeLists.txt| 1 + deploy/rpm_post_uninstall.sh | 26 ++ 2 files changed, 27 insertions(+) -- http://git-wip-us.apache.org/repos/asf/madlib/blob/ac4a51f0/deploy/CMakeLists.txt -- diff --git a/deploy/CMakeLists.txt b/deploy/CMakeLists.txt index 32023bd..f8000df 100644 --- a/deploy/CMakeLists.txt +++ b/deploy/CMakeLists.txt @@ -54,6 +54,7 @@ add_subdirectory(gppkg) # -- Finally do the packaging! 
 set(CPACK_RPM_POST_INSTALL_SCRIPT_FILE "${CMAKE_CURRENT_SOURCE_DIR}/rpm_post.sh")
+set(CPACK_RPM_POST_UNINSTALL_SCRIPT_FILE "${CMAKE_CURRENT_SOURCE_DIR}/rpm_post_uninstall.sh")
 set(CPACK_PREFLIGHT_SCRIPT ${CMAKE_CURRENT_SOURCE_DIR}/preflight.sh)
 set(CPACK_POSTFLIGHT_SCRIPT ${CMAKE_CURRENT_SOURCE_DIR}/postflight.sh)
 set(CPACK_MONOLITHIC_INSTALL 1)

http://git-wip-us.apache.org/repos/asf/madlib/blob/ac4a51f0/deploy/rpm_post_uninstall.sh
--
diff --git a/deploy/rpm_post_uninstall.sh b/deploy/rpm_post_uninstall.sh
new file mode 100755
index 000..c67fd34
--- /dev/null
+++ b/deploy/rpm_post_uninstall.sh
@@ -0,0 +1,26 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# remove symlinks created during rpm install
+find $RPM_INSTALL_PREFIX/madlib/Current -depth -type l -exec rm {} \; 2>/dev/null
+find $RPM_INSTALL_PREFIX/madlib/bin -depth -type l -exec rm {} \; 2>/dev/null
+find $RPM_INSTALL_PREFIX/madlib/doc -depth -type l -exec rm {} \; 2>/dev/null
+
+# remove "Versions" directory if it's empty
+rmdir $RPM_INSTALL_PREFIX/madlib/Versions 2>/dev/null
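The two steps of `rpm_post_uninstall.sh` can be mirrored in Python to spell out their semantics. This is an illustrative sketch only: the packaged hook is the shell script above, and the function names `remove_symlinks` and `remove_dir_if_empty` are hypothetical.

```python
import os


def remove_symlinks(paths):
    """Remove symlinks the way the script's find commands do.

    A path that is itself a symlink is removed (find does not follow a
    symlinked starting point by default). A real directory is walked
    bottom-up and every symlink inside it is removed. Missing paths are
    skipped silently, like the 2>/dev/null redirection.
    """
    removed = []
    for path in paths:
        if os.path.islink(path):
            os.remove(path)
            removed.append(path)
        elif os.path.isdir(path):
            for root, dirs, files in os.walk(path, topdown=False):
                # Symlinked dirs appear in `dirs` but are not descended into
                for name in dirs + files:
                    full = os.path.join(root, name)
                    if os.path.islink(full):
                        os.remove(full)
                        removed.append(full)
    return removed


def remove_dir_if_empty(path):
    """Mimic `rmdir path 2>/dev/null`: succeed only if path is an empty dir."""
    try:
        os.rmdir(path)
        return True
    except OSError:
        return False
```

As with `rmdir`, a non-empty `Versions/` (for example, one still holding another installed release) is left in place rather than removed recursively.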
madlib git commit: Utilities: Add CTAS while dropping some columns
Repository: madlib Updated Branches: refs/heads/master 59ad96a04 -> 5f80ba978 Utilites: Add CTAS while dropping some columns JIRA: MADLIB-1241 This commit adds function to create a new table from existing table while dropping some of the columns of the original table. Closes #282 Project: http://git-wip-us.apache.org/repos/asf/madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/5f80ba97 Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/5f80ba97 Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/5f80ba97 Branch: refs/heads/master Commit: 5f80ba9781efb3526e422a0867ae1d34b49c7ac8 Parents: 59ad96a Author: Rahul Iyer Authored: Thu Jul 12 10:08:52 2018 -0700 Committer: Rahul Iyer Committed: Thu Jul 12 10:08:52 2018 -0700 -- doc/mainpage.dox.in | 16 ++-- .../utilities/test/drop_madlib_temp.ic.sql_in | 23 -- .../utilities/test/drop_madlib_temp.sql_in | 16 .../modules/utilities/test/utilities.ic.sql_in | 58 + .../modules/utilities/test/utilities.sql_in | 87 .../postgres/modules/utilities/utilities.py_in | 37 - .../postgres/modules/utilities/utilities.sql_in | 43 +- 7 files changed, 227 insertions(+), 53 deletions(-) -- http://git-wip-us.apache.org/repos/asf/madlib/blob/5f80ba97/doc/mainpage.dox.in -- diff --git a/doc/mainpage.dox.in b/doc/mainpage.dox.in index 8681eb4..e41e6c9 100644 --- a/doc/mainpage.dox.in +++ b/doc/mainpage.dox.in @@ -261,12 +261,9 @@ Contains graph algorithms. @ingroup grp_topic_modelling -@defgroup grp_utility_functions Utility Functions -@defgroup @grp_utilities Developer Database Functions -@ingroup grp_utility_functions - +@defgroup grp_other_functions Other Functions @defgroup grp_linear_solver Linear Solvers -@ingroup grp_utility_functions +@ingroup grp_other_functions @{A collection of methods that implement solutions for systems of consistent linear equations. @} @defgroup grp_dense_linear_solver Dense Linear Systems @@ -276,13 +273,16 @@ Contains graph algorithms. 
@ingroup grp_linear_solver @defgroup grp_minibatch_preprocessing Mini-Batch Preprocessor -@ingroup grp_utility_functions +@ingroup grp_other_functions @defgroup grp_pmml PMML Export -@ingroup grp_utility_functions +@ingroup grp_other_functions @defgroup grp_text_utilities Term Frequency -@ingroup grp_utility_functions +@ingroup grp_other_functions + +@defgroup @grp_utilities Utilities +@ingroup grp_other_functions @defgroup grp_early_stage Early Stage Development @brief A collection of implementations which are in early stage of development. http://git-wip-us.apache.org/repos/asf/madlib/blob/5f80ba97/src/ports/postgres/modules/utilities/test/drop_madlib_temp.ic.sql_in -- diff --git a/src/ports/postgres/modules/utilities/test/drop_madlib_temp.ic.sql_in b/src/ports/postgres/modules/utilities/test/drop_madlib_temp.ic.sql_in deleted file mode 100644 index 7879385..000 --- a/src/ports/postgres/modules/utilities/test/drop_madlib_temp.ic.sql_in +++ /dev/null @@ -1,23 +0,0 @@ -/* --- *//** - * - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, - * software distributed under the License is distributed on an - * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY - * KIND, either express or implied. See the License for the - * specific language governing permissions and limitations - * under the License. 
- * - *//* --- */ - --- cleanup -SELECT cleanup_madlib_temp_tables(quote_ident(current_schema())); http://git-wip-us.apache.org/repos/asf/madlib/blob/5f80ba97/src/ports/postgres/modules/utilities/test/drop_madlib_temp.sql_in -- diff --git a/src/ports/postgres/modules/utilities/test/drop_madlib_temp.sql_in b/src/ports/postgres/modules/utilities/test/drop_madlib_temp.sql_in deleted file mode 100644 index f902361..000 --- a/src/ports/postgres/modules/utilities/test/drop_madlib_temp.sql_in +++
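The idea behind this commit, creating a new table from an existing one while leaving out some columns, reduces to building a `CREATE TABLE ... AS SELECT` over the remaining columns. The excerpt does not show the new function's name or signature, so the helper below is hypothetical. It also assumes the column list is passed in explicitly (a real implementation would read it from the catalog) and does not handle quoted identifiers.

```python
def ctas_drop_columns_sql(source_table, out_table, all_columns, drop_columns):
    """Build CREATE TABLE ... AS SELECT keeping all but drop_columns.

    Hypothetical helper: names are compared case-insensitively, which mimics
    PostgreSQL's folding of unquoted identifiers to lower case.
    """
    drop = set(c.strip().lower() for c in drop_columns)
    known = set(c.lower() for c in all_columns)
    unknown = drop - known
    if unknown:
        raise ValueError("columns not in {0}: {1}".format(
            source_table, ", ".join(sorted(unknown))))
    # Preserve the source table's column order for the kept columns
    keep = [c for c in all_columns if c.lower() not in drop]
    if not keep:
        raise ValueError("cannot drop every column of {0}".format(source_table))
    return "CREATE TABLE {0} AS SELECT {1} FROM {2}".format(
        out_table, ", ".join(keep), source_table)
```

For example, dropping `bath` and `price` from a table with columns `id, tax, bath, price` yields `CREATE TABLE houses_small AS SELECT id, tax FROM houses`.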
[1/2] madlib git commit: Upgrade: Fix multiple bugs
Repository: madlib
Updated Branches:
  refs/heads/master 8e34f68d7 -> b88f60464

Upgrade: Fix multiple bugs

1. Appended schema_madlib to the mlp_igd_final return type. The missing schema name caused the upgrade to fail from 1.12 to 1.x if there was a dependency on mlp_igd_final.
2. A new changelist was created for changes from v1.14 to 1.15-dev. We will rename this at the 1.15 release from 1.14_1.15-dev.yaml to 1.14_1.15.yaml.
3. Commit 8e34f68 added a new function called `_write_to_file` that takes 2 arguments. Some of the calls to this function were not passing the first file handle argument.

Closes #278

Co-authored-by: Orhan Kislal

Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/89bcdb78
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/89bcdb78
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/89bcdb78

Branch: refs/heads/master
Commit: 89bcdb785816716f2ef6c5e4599edbf95584595d
Parents: 8e34f68
Author: Nikhil Kak
Authored: Fri Jun 15 08:18:34 2018 -0700
Committer: Rahul Iyer
Committed: Fri Jun 15 08:22:34 2018 -0700

--
 src/madpack/changelist_1.12_1.13.yaml     |  2 +-
 src/madpack/changelist_1.14_1.15-dev.yaml | 58 ++
 src/madpack/upgrade_util.py               | 14 +++
 3 files changed, 66 insertions(+), 8 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/madlib/blob/89bcdb78/src/madpack/changelist_1.12_1.13.yaml
--
diff --git a/src/madpack/changelist_1.12_1.13.yaml b/src/madpack/changelist_1.12_1.13.yaml
index 0e6c3df..49169c3 100644
--- a/src/madpack/changelist_1.12_1.13.yaml
+++ b/src/madpack/changelist_1.12_1.13.yaml
@@ -49,7 +49,7 @@ udf:
         rettype: void
         argument: character varying, character varying, character varying, character varying
     - mlp_igd_final:
-        rettype: mlp_step_result
+        rettype: schema_madlib.mlp_step_result
         argument: double precision[]
     - mlp_igd_transition:
         rettype: double precision[]
http://git-wip-us.apache.org/repos/asf/madlib/blob/89bcdb78/src/madpack/changelist_1.14_1.15-dev.yaml -- diff --git a/src/madpack/changelist_1.14_1.15-dev.yaml b/src/madpack/changelist_1.14_1.15-dev.yaml new file mode 100644 index 000..88bb886 --- /dev/null +++ b/src/madpack/changelist_1.14_1.15-dev.yaml @@ -0,0 +1,58 @@ +# -- +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# -- + +# Changelist for MADlib version 1.14 to 1.15 + +# This file contains all changes that were introduced in a new version of +# MADlib. This changelist is used by the upgrade script to detect what objects +# should be upgraded (while retaining all other objects from the previous version) + +# New modules (actually .sql_in files) added in upgrade version +# For these files the sql_in code is retained as is with the functions in the +# file installed on the upgrade version. All other files (that don't have +# updates), are cleaned up to remove object replacements +new module: +# - Changes from 1.14 to 1.15 + + +# Changes in the types (UDT) including removal and modification +udt: + +# List of the UDF changes that affect the user externally. This includes change +# in function name, return type, argument order or types, or removal of +# the function. 
In each case, the original function is as good as removed and a +# new function is created. In such cases, we should abort the upgrade if there +# are user views dependent on this function, since the original function will +# not be present in the upgraded version. +udf: +# - Changes from 1.14 to 1.15 -- + + +# Changes to aggregates (UDA) including removal and modification +# Overloaded functions should be mentioned separately +uda: + +# Casts (UDC) updated/removed +udc:
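In the changelist, `schema_madlib` is a placeholder for the schema MADlib is installed in. The sketch below is hypothetical and is not madpack's code: `qualify_type` and `rettype_matches` are invented names, and it assumes a user-defined return type is compared in schema-qualified form. Under that assumption, it shows why an unqualified `mlp_step_result` entry could fail to match while built-in types such as `double precision[]` need no prefix.

```python
PLACEHOLDER = 'schema_madlib.'


def qualify_type(type_name, schema):
    """Replace the schema_madlib placeholder with the actual install schema.

    Built-in types such as 'double precision[]' carry no placeholder and
    are returned unchanged.
    """
    if type_name.startswith(PLACEHOLDER):
        return schema + '.' + type_name[len(PLACEHOLDER):]
    return type_name


def rettype_matches(changelist_rettype, installed_rettype, schema):
    """Compare a changelist return type with a schema-qualified installed one."""
    return qualify_type(changelist_rettype, schema) == installed_rettype
```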
[1/3] madlib git commit: DT: Don't use NULL value to get dep_var type
Repository: madlib
Updated Branches:
  refs/heads/master ccc3a1832 -> abef95ec9

DT: Don't use NULL value to get dep_var type

JIRA: MADLIB-1233

Function `_is_dep_categorical` is used to obtain the type of the dependent variable expression. This function fetches an arbitrary value using `LIMIT 1` and checks the type of that value in Python. Since NULL values are not filtered out, the `LIMIT 1` query can return a NULL (`None` in Python), leading to incorrect results.

This commit updates the type extraction to check the type in the database instead of in Python, and also filters out NULL values. Additionally, it checks that at least one non-NULL value is obtained, and otherwise raises an appropriate error.

Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/26f61e91
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/26f61e91
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/26f61e91

Branch: refs/heads/master
Commit: 26f61e9110f12804c76ca707f52f1774d8844a7c
Parents: ccc3a18
Author: Rahul Iyer
Authored: Tue May 1 14:24:34 2018 -0700
Committer: Rahul Iyer
Committed: Thu May 31 17:03:30 2018 -0700

--
 .../recursive_partitioning/decision_tree.py_in  |  18 +-
 .../recursive_partitioning/decision_tree.sql_in | 206 +--
 .../modules/utilities/validate_args.py_in       |  27 ++-
 3 files changed, 135 insertions(+), 116 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/madlib/blob/26f61e91/src/ports/postgres/modules/recursive_partitioning/decision_tree.py_in
--
diff --git a/src/ports/postgres/modules/recursive_partitioning/decision_tree.py_in b/src/ports/postgres/modules/recursive_partitioning/decision_tree.py_in
index 6f64234..48b8fab 100644
--- a/src/ports/postgres/modules/recursive_partitioning/decision_tree.py_in
+++ b/src/ports/postgres/modules/recursive_partitioning/decision_tree.py_in
@@ -31,7 +31,7 @@ from utilities.utilities import _assert
 from
utilities.utilities import extract_keyvalue_params from utilities.utilities import unique_string from utilities.utilities import add_postfix -from utilities.utilities import is_psql_numeric_type +from utilities.utilities import is_psql_numeric_type, is_psql_boolean_type from utilities.utilities import split_quoted_delimited_str from utilities.utilities import py_list_to_sql_string # @@ -56,6 +56,11 @@ def _tree_validate_args( "Decision tree error: Invalid data table.") _assert(table_exists(training_table_name), "Decision tree error: Data table is missing.") +_assert(not table_is_empty(training_table_name, + _get_filter_str(dependent_variable, grouping_cols)), +"Decision tree error: Data table ({0}) is empty " +"(after filtering invalid tuples)". +format(training_table_name)) _assert(not table_exists(output_table_name, only_first_schema=True), "Decision tree error: Output table already exists.") @@ -567,10 +572,13 @@ def _is_dep_categorical(training_table_name, dependent_variable): @brief Sample the dependent variable to check whether it is a categorical variable. 
""" -sample_dep = plpy.execute("SELECT " + dependent_variable + - " AS dep FROM " + - training_table_name + " LIMIT 1")[0]['dep'] -return (not isinstance(sample_dep, float), isinstance(sample_dep, bool)) +sample_dep = get_expr_type(dependent_variable, training_table_name) +is_dep_numeric = is_psql_numeric_type(sample_dep, + exclude=['smallint', + 'integer', + 'bigint']) +is_dep_bool = is_psql_boolean_type(sample_dep) +return (not is_dep_numeric, is_dep_bool) # http://git-wip-us.apache.org/repos/asf/madlib/blob/26f61e91/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in -- diff --git a/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in b/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in index a3c4963..8e69d9b 100644 --- a/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in +++ b/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in @@ -25,7 +25,7 @@ m4_include(`SQLCommon.m4') @brief -Decision trees are tree-based supervised learning methods +Decision trees are tree-based supervised learning methods
[2/3] madlib git commit: DT: Ensure summary table has correct features
DT: Ensure summary table has correct features JIRA: MADLIB-1236 If a cat_feature is dropped (due to just a single level), that feature should not be included in the summary table list, since tree_predict uses the features in summary table while reading source table. This commit ensures the right features are populated in the summary table. Closes #268 Project: http://git-wip-us.apache.org/repos/asf/madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/ef52d871 Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/ef52d871 Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/ef52d871 Branch: refs/heads/master Commit: ef52d87198d73db272ef033f5c7c0f26b2956a0b Parents: 26f61e9 Author: Rahul Iyer Authored: Thu May 3 11:38:27 2018 -0700 Committer: Rahul Iyer Committed: Thu May 31 17:03:37 2018 -0700 -- .../recursive_partitioning/decision_tree.py_in | 51 1 file changed, 30 insertions(+), 21 deletions(-) -- http://git-wip-us.apache.org/repos/asf/madlib/blob/ef52d871/src/ports/postgres/modules/recursive_partitioning/decision_tree.py_in -- diff --git a/src/ports/postgres/modules/recursive_partitioning/decision_tree.py_in b/src/ports/postgres/modules/recursive_partitioning/decision_tree.py_in index 48b8fab..04fde7e 100644 --- a/src/ports/postgres/modules/recursive_partitioning/decision_tree.py_in +++ b/src/ports/postgres/modules/recursive_partitioning/decision_tree.py_in @@ -56,11 +56,6 @@ def _tree_validate_args( "Decision tree error: Invalid data table.") _assert(table_exists(training_table_name), "Decision tree error: Data table is missing.") -_assert(not table_is_empty(training_table_name, - _get_filter_str(dependent_variable, grouping_cols)), -"Decision tree error: Data table ({0}) is empty " -"(after filtering invalid tuples)". 
-format(training_table_name)) _assert(not table_exists(output_table_name, only_first_schema=True), "Decision tree error: Output table already exists.") @@ -95,6 +90,12 @@ def _tree_validate_args( _assert(max_depth >= 0 and max_depth < 100, "Decision tree error: maximum tree depth must be positive and less than 100.") +_assert(not table_is_empty(training_table_name, + _get_filter_str(dependent_variable, grouping_cols)), +"Decision tree error: Data table ({0}) is empty " +"(after filtering invalid tuples)". +format(training_table_name)) + _assert(cp >= 0, "Decision tree error: cp must be non-negative.") _assert(min_split > 0, "Decision tree error: min_split must be positive.") _assert(min_bucket > 0, "Decision tree error: min_bucket must be positive.") @@ -510,8 +511,7 @@ def tree_train(schema_madlib, training_table_name, output_table_name, def _create_output_tables(schema_madlib, training_table_name, output_table_name, - tree_states, bins, - split_criterion, cat_features, con_features, + tree_states, bins, split_criterion, id_col_name, dependent_variable, list_of_features, is_classification, n_all_rows, n_rows, dep_list, cp, all_cols_types, grouping_cols=None, @@ -519,19 +519,19 @@ def _create_output_tables(schema_madlib, training_table_name, output_table_name, n_folds=0, null_proxy=None, **kwargs): if not grouping_cols: _create_result_table(schema_madlib, tree_states[0], - bins['cat_origin'], bins['cat_n'], cat_features, - con_features, output_table_name, + bins['cat_origin'], bins['cat_n'], bins['cat_features'], + bins['con_features'], output_table_name, use_existing_tables, running_cv, n_folds) else: _create_grp_result_table( -schema_madlib, tree_states, bins, cat_features, -con_features, output_table_name, grouping_cols, training_table_name, -use_existing_tables, running_cv, n_folds) +schema_madlib, tree_states, bins, bins['cat_features'], +bins['con_features'], output_table_name, grouping_cols, +training_table_name, use_existing_tables, running_cv, n_folds) 
failed_groups = sum(row['finished'] != 1 for row in tree_states) _create_summary_table( schema_madlib, split_criterion, training_table_name, -output_table_name, id_col_name, cat_features, con_features, +output_table_name, id_col_name, bins['cat_features'], bins['con_features'],
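The fix takes the feature lists written to the summary table from `bins` (the features that survived binning) rather than from the caller's original lists. The sketch below illustrates that rule under stated assumptions. Both function names are hypothetical, the "single distinct level" test stands in for however MADlib actually detects the dropped features, and the comma-separated output format is assumed.

```python
def select_features(cat_levels, con_features):
    """Return the feature lists that should appear in the summary table.

    cat_levels maps each categorical feature to the values observed for it.
    A feature with a single distinct level cannot split any node, so it is
    dropped. Insertion order of cat_levels is preserved (Python 3.7+).
    """
    cat_features = [name for name, values in cat_levels.items()
                    if len(set(values)) > 1]
    return {'cat_features': cat_features, 'con_features': list(con_features)}


def summary_feature_list(bins):
    """Comma-separated feature list built from the surviving features only."""
    return ','.join(bins['cat_features'] + bins['con_features'])
```

Since `tree_predict` reads the source table using the features recorded in the summary table, listing a dropped feature there would make prediction reference a feature the trained tree never used.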
[3/3] madlib git commit: Logregr: Report error if output table is empty
Logregr: Report error if output table is empty JIRA MADLIB-1172 When the model cannot be generated due to ill-conditioned input data, the output table doesn't get populated. In this case, we report back an error instead of creating the empty table. Closes #270 Project: http://git-wip-us.apache.org/repos/asf/madlib/repo Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/abef95ec Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/abef95ec Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/abef95ec Branch: refs/heads/master Commit: abef95ec99d2797fa7a51c8d4548d88a656d7364 Parents: ef52d87 Author: Himanshu Pandey Authored: Thu May 31 18:44:41 2018 -0700 Committer: Rahul Iyer Committed: Thu May 31 18:47:32 2018 -0700 -- .../postgres/modules/regress/logistic.py_in | 157 +-- .../modules/regress/test/logistic.sql_in| 69 ++-- 2 files changed, 92 insertions(+), 134 deletions(-) -- http://git-wip-us.apache.org/repos/asf/madlib/blob/abef95ec/src/ports/postgres/modules/regress/logistic.py_in -- diff --git a/src/ports/postgres/modules/regress/logistic.py_in b/src/ports/postgres/modules/regress/logistic.py_in index 76cbb6a..77ea465 100644 --- a/src/ports/postgres/modules/regress/logistic.py_in +++ b/src/ports/postgres/modules/regress/logistic.py_in @@ -153,7 +153,8 @@ def __logregr_validate_args(schema_madlib, tbl_source, tbl_output, dep_col, plpy.error("Logregr error: Invalid output table name!") if (table_exists(tbl_output, only_first_schema=True)): -plpy.error("Output table name already exists. Drop the table before calling the function.") +plpy.error("Output table name already exists. 
Drop the table before " + "calling the function.") if not dep_col or dep_col.strip().lower() in ('null', ''): plpy.error("Logregr error: Invalid dependent column name!") @@ -164,7 +165,6 @@ def __logregr_validate_args(schema_madlib, tbl_source, tbl_output, dep_col, if not ind_col or ind_col.lower() in ('null', ''): plpy.error("Logregr error: Invalid independent column name!") - if grouping_col is not None: if grouping_col == '': plpy.error("Logregr error: Invalid grouping columns name!") @@ -173,14 +173,14 @@ def __logregr_validate_args(schema_madlib, tbl_source, tbl_output, dep_col, plpy.error("Logregr error: Grouping column does not exist!") intersect = frozenset(_string_to_array(grouping_col)).intersection( -frozenset(('coef', 'log_likelihood', 'std_err', 'z_stats', - 'p_values', 'odds_ratios', 'condition_no', - 'num_processed', 'num_missing_rows_skipped', - 'variance_covariance'))) +frozenset(('coef', 'log_likelihood', 'std_err', 'z_stats', + 'p_values', 'odds_ratios', 'condition_no', + 'num_processed', 'num_missing_rows_skipped', + 'variance_covariance'))) if len(intersect) > 0: plpy.error("Logregr error: Conflicted grouping column name.\n" "Predefined name(s) {0} are not allow!".format( -', '.join(intersect))) + ', '.join(intersect))) if max_iter <= 0: plpy.error("Logregr error: Maximum number of iterations must be positive!") @@ -231,12 +231,12 @@ def __logregr_train_compute(schema_madlib, tbl_source, tbl_output, dep_col, 'cg': "__logregr_cg_result", 'igd': "__logregr_igd_result"} -plpy.execute("select {schema_madlib}.create_schema_pg_temp()".format(**args)) -plpy.execute( -""" -drop table if exists pg_temp.{tbl_logregr_args}; -create table pg_temp.{tbl_logregr_args} as -select +plpy.execute("SELECT {schema_madlib}.create_schema_pg_temp()". 
+ format(**args)) +plpy.execute(""" +DROP TABLE IF EXISTS pg_temp.{tbl_logregr_args}; +CREATE TABLE pg_temp.{tbl_logregr_args} as +SELECT {max_iter} as max_iter, {tolerance} as tolerance """.format(**args)) @@ -257,7 +257,8 @@ def __logregr_train_compute(schema_madlib, tbl_source, tbl_output, dep_col, dep_col, ind_col, optimizer, grouping_col=grouping_col, grouping_str=grouping_str, - col_grp_iteration=args["col_grp_iteration"], + col_grp_iteration=args[ + "col_grp_iteration"],
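The behavioural change, failing loudly instead of leaving an empty output table, can be sketched as a guard placed before the table is materialized. This is a hypothetical illustration. `write_model` and its `create_table` callback are invented names, the error text is not MADlib's, and a plain `RuntimeError` stands in for `plpy.error`, which is only available inside the database.

```python
def write_model(rows, output_table, create_table):
    """Create output_table from the model rows, or fail if there are none.

    create_table is a caller-supplied callable standing in for the SQL that
    materializes the table; it is never called when training produced no rows.
    """
    if not rows:
        raise RuntimeError(
            "Logregr error: no model was produced (the input may be "
            "ill-conditioned); output table {0} was not created".format(
                output_table))
    create_table(output_table, rows)
    return len(rows)
```

Checking before creating, rather than creating and then dropping, means a failed run leaves no partial output table behind.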
[3/3] madlib git commit: Multiple: Remove support for HAWQ from all modules
Multiple: Remove support for HAWQ from all modules

With HAWQ support removed for the past few versions, we can eliminate all the code that was specifically written for that port. This includes madpack changes for upgrade and reinstall, workarounds in multiple modules for table updates, and special consideration in Iteration Controllers.

Closes #267

Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/34ca6188
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/34ca6188
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/34ca6188

Branch: refs/heads/master
Commit: 34ca6188ecb51577d5994699636a231d2615c548
Parents: 0b8507d
Author: Rahul Iyer
Authored: Sun Apr 29 21:18:35 2018 -0700
Committer: Rahul Iyer
Committed: Fri May 11 10:30:36 2018 -0700

--
 HAWQ_Install.txt | 70
 RELEASE_NOTES | 21 +-
 ReadMe_Build.txt | 22 +-
 deploy/hawq_install.sh | 199
 methods/array_ops/src/pg_gp/array_ops.sql_in | 2 -
 .../svec_util/src/pg_gp/sql/svec_test.sql_in | 5 -
 pom.xml | 13 -
 src/config/Ports.yml | 5 +-
 src/madpack/madpack.py | 153 +++--
 src/madpack/upgrade_util.py | 21 +-
 src/madpack/utilities.py | 2 -
 .../elastic_net/elastic_net_optimizer_fista.hpp | 8 +-
 src/modules/linalg/metric.cpp | 94 --
 src/modules/linalg/metric.hpp | 7 -
 src/ports/CMakeLists.txt | 1 -
 src/ports/hawq/1.2/CMakeLists.txt | 5 -
 src/ports/hawq/1.2/config/CMakeLists.txt | 19 --
 src/ports/hawq/1.2/config/Modules.yml | 47 ---
 src/ports/hawq/1.3/CMakeLists.txt | 5 -
 src/ports/hawq/1.3/config/CMakeLists.txt | 19 --
 src/ports/hawq/1.3/config/Modules.yml | 46 ---
 src/ports/hawq/2/CMakeLists.txt | 19 --
 src/ports/hawq/CMakeLists.txt | 309 --
 src/ports/hawq/cmake/FindHAWQ.cmake | 26 --
 src/ports/hawq/cmake/FindHAWQ_1_2.cmake | 2 -
 src/ports/hawq/cmake/FindHAWQ_1_3.cmake | 2 -
 src/ports/hawq/cmake/FindHAWQ_2.cmake | 21 --
 src/ports/hawq/cmake/HAWQUtils.cmake | 16 -
 src/ports/postgres/cmake/PostgreSQLUtils.cmake | 6 -
 .../modules/assoc_rules/assoc_rules.sql_in | 18 +-
 src/ports/postgres/modules/bayes/bayes.py_in | 15 +-
 src/ports/postgres/modules/convex/lmf.sql_in | 5 -
 src/ports/postgres/modules/convex/lmf_igd.py_in | 4 +-
 src/ports/postgres/modules/crf/crf.py_in | 3 +-
 .../postgres/modules/crf/crf_feature_gen.py_in | 2 -
 .../modules/elastic_net/elastic_net.sql_in | 57 ++--
 src/ports/postgres/modules/graph/apsp.py_in | 234 --
 src/ports/postgres/modules/graph/pagerank.py_in | 2 +-
 src/ports/postgres/modules/graph/sssp.py_in | 91 +-
 src/ports/postgres/modules/graph/wcc.py_in | 55 +---
 src/ports/postgres/modules/kmeans/kmeans.py_in | 74 ++---
 src/ports/postgres/modules/kmeans/kmeans.sql_in | 8 +-
 .../postgres/modules/kmeans/test/kmeans.sql_in | 4 +-
 src/ports/postgres/modules/lda/lda.py_in | 2 +-
 src/ports/postgres/modules/linalg/linalg.sql_in | 27 --
 src/ports/postgres/modules/linalg/svd.py_in | 19 +-
 src/ports/postgres/modules/pca/pca.py_in | 20 +-
 .../modules/regress/clustered_variance.py_in | 76 +
 .../postgres/modules/regress/marginal.py_in | 39 ---
 .../modules/regress/multilogistic.py_in | 84 +
 .../postgres/modules/regress/test/linear.sql_in | 6 -
 .../modules/regress/test/logistic.sql_in | 6 +-
 .../modules/regress/test/marginal.sql_in | 2 -
 .../modules/regress/test/multilogistic.sql_in | 6 +-
 .../postgres/modules/regress/test/robust.sql_in | 4 -
 .../modules/stats/cox_prop_hazards.py_in | 70 +---
 .../modules/stats/cox_prop_hazards.sql_in | 10 -
 .../modules/stats/test/cox_prop_hazards.sql_in | 4 +-
 src/ports/postgres/modules/tsa/arima.py_in | 6 +-
 .../postgres/modules/utilities/control.py_in | 234 --
 .../modules/utilities/control_composite.py_in | 127 ++--
 .../modules/utilities/group_control.py_in | 321 +--
 .../postgres/modules/utilities/utilities.py_in | 30 +-
 .../modules/utilities/validate_args.py_in | 2 +-
 .../validation/test/cross_validation.sql_in | 3 -
 65 files changed, 460 insertions(+), 2375 deletions(-)
--
[2/3] madlib git commit: Multiple: Remove support for HAWQ from all modules
http://git-wip-us.apache.org/repos/asf/madlib/blob/34ca6188/src/ports/postgres/modules/convex/lmf_igd.py_in -- diff --git a/src/ports/postgres/modules/convex/lmf_igd.py_in b/src/ports/postgres/modules/convex/lmf_igd.py_in index a2d42f6..300b6d6 100644 --- a/src/ports/postgres/modules/convex/lmf_igd.py_in +++ b/src/ports/postgres/modules/convex/lmf_igd.py_in @@ -58,9 +58,8 @@ def compute_lmf_igd(schema_madlib, rel_args, rel_state, rel_source, (_src.{col_row})::integer, (_src.{col_column})::integer, (_src.{col_value})::integer, -m4_ifdef(`__HAWQ__', `{{__state__}}', ` (SELECT _state FROM {rel_state} -WHERE _iteration = {iteration})'), +WHERE _iteration = {iteration}), (_args.row_dim)::integer, (_args.column_dim)::integer, (_args.max_rank)::integer, @@ -75,4 +74,3 @@ def compute_lmf_igd(schema_madlib, rel_args, rel_state, rel_source, """): break return iterationCtrl.iteration - http://git-wip-us.apache.org/repos/asf/madlib/blob/34ca6188/src/ports/postgres/modules/crf/crf.py_in -- diff --git a/src/ports/postgres/modules/crf/crf.py_in b/src/ports/postgres/modules/crf/crf.py_in index dfd4754..2eaa1e8 100644 --- a/src/ports/postgres/modules/crf/crf.py_in +++ b/src/ports/postgres/modules/crf/crf.py_in @@ -80,8 +80,7 @@ def __runIterativeAlg(stateType, initialState, source, updateExpr, SET client_min_messages = error; DROP TABLE IF EXISTS _madlib_iterative_alg; CREATE TEMPORARY TABLE _madlib_iterative_alg ( -_madlib_iteration INTEGER -m4_ifdef(`__HAWQ__', `', ` PRIMARY KEY'), +_madlib_iteration INTEGER PRIMARY KEY, _madlib_state {stateType} ) m4_ifdef(`__POSTGRESQL__', `', `DISTRIBUTED BY (_madlib_iteration)'); http://git-wip-us.apache.org/repos/asf/madlib/blob/34ca6188/src/ports/postgres/modules/crf/crf_feature_gen.py_in -- diff --git a/src/ports/postgres/modules/crf/crf_feature_gen.py_in b/src/ports/postgres/modules/crf/crf_feature_gen.py_in index 3fa6d40..f55037f 100644 --- a/src/ports/postgres/modules/crf/crf_feature_gen.py_in +++ 
b/src/ports/postgres/modules/crf/crf_feature_gen.py_in @@ -238,12 +238,10 @@ def generate_test_features(schema_madlib, test_segment_tbl, rtbl_name_idx = add_postfix(rtbl_name, "_idx") -m4_ifdef(`__HAWQ__', `', ` plpy.execute(""" CREATE INDEX {rtbl_name_idx} ON {viterbi_rtbl} (seg_text) """.format(rtbl_name_idx = rtbl_name_idx, viterbi_rtbl = viterbi_rtbl)) -') origClientMinMessages = plpy.execute("""SELECT setting AS setting FROM pg_settings WHERE name = \'client_min_messages\';""") http://git-wip-us.apache.org/repos/asf/madlib/blob/34ca6188/src/ports/postgres/modules/elastic_net/elastic_net.sql_in -- diff --git a/src/ports/postgres/modules/elastic_net/elastic_net.sql_in b/src/ports/postgres/modules/elastic_net/elastic_net.sql_in index f367774..5ea2efb 100644 --- a/src/ports/postgres/modules/elastic_net/elastic_net.sql_in +++ b/src/ports/postgres/modules/elastic_net/elastic_net.sql_in @@ -129,7 +129,7 @@ A value of 1 means L1 regularization, and a value of 0 means L2 regularization.< FLOAT8. Regularization parameter (must be positive). standardize (optional) -BOOLEAN, default: TRUE. Whether to normalize the data or not. +BOOLEAN, default: TRUE. Whether to normalize the data or not. Setting to TRUE usually yields better results and faster convergence. grouping_col (optional) @@ -141,14 +141,14 @@ a single model is generated for all data. @note Expressions are not currently supported for 'grouping_col'. optimizer (optional) -TEXT, default: 'fista'. Name of optimizer, either 'fista' or 'igd'. -FISTA [2] is an algorithm with a fast global rate of convergence for +TEXT, default: 'fista'. Name of optimizer, either 'fista' or 'igd'. +FISTA [2] is an algorithm with a fast global rate of convergence for solving linear inverse problems. Incremental gradient descent (IGD) is a stochastic approach to minimizing an objective function [4]. optimizer_params (optional) -TEXT, default: NULL. Optimizer parameters, delimited with commas. 
-These parameters differ depending on the value of \e optimizer parameter. +TEXT, default: NULL. Optimizer parameters, delimited with commas. +These parameters differ depending on the value of \e optimizer parameter. See the
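Editor's note: the elastic net documentation above describes a mixing parameter (1 means L1 regularization, 0 means L2), a positive regularization parameter, and the FISTA optimizer. The following is a minimal pure-Python sketch of that idea, not MADlib's C++ implementation. It assumes the common glmnet-style objective (1/(2n))·||Xβ − y||² + λ(α||β||₁ + (1 − α)/2·||β||₂²); MADlib's exact scaling of the loss and penalty may differ. The helper names (`elastic_net_penalty`, `elastic_net_prox`, `fista`) are hypothetical.

```python
import math


def elastic_net_penalty(beta, lam, alpha):
    """Assumed penalty form: lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2)."""
    l1 = sum(abs(b) for b in beta)
    l2_sq = sum(b * b for b in beta)
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_sq)


def soft_threshold(v, thresh):
    """Shrink v toward zero by thresh (the proximal operator of thresh * |v|)."""
    if v > thresh:
        return v - thresh
    if v < -thresh:
        return v + thresh
    return 0.0


def elastic_net_prox(v, step, lam, alpha):
    """Coordinate-wise proximal operator of step * elastic_net_penalty."""
    shrink = 1.0 + step * lam * (1.0 - alpha)
    return [soft_threshold(x, step * lam * alpha) / shrink for x in v]


def fista(X, y, lam, alpha, step, n_iter=200):
    """Minimize (1/(2n)) * ||X beta - y||^2 + penalty with FISTA.

    step should not exceed 1/L, where L is the largest eigenvalue of X^T X / n.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    z = beta[:]
    t = 1.0
    for _ in range(n_iter):
        # Gradient of the smooth least-squares term at the extrapolated point z.
        resid = [sum(X[i][j] * z[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        # Proximal (shrinkage) step handles the non-smooth L1 part of the penalty.
        beta_next = elastic_net_prox([z[j] - step * grad[j] for j in range(p)], step, lam, alpha)
        # Momentum update; this extrapolation is what distinguishes FISTA from plain ISTA.
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = [b + ((t - 1.0) / t_next) * (b - b_old) for b, b_old in zip(beta_next, beta)]
        beta, t = beta_next, t_next
    return beta
```

With alpha = 1 and an identity design, the solution is the soft-thresholded target; for example, `fista([[1.0, 0.0], [0.0, 1.0]], [1.0, 2.0], 0.25, 1.0, 2.0)` should return `[0.5, 1.5]`. With lam = 0 the sketch reduces to ordinary least squares.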
[42/51] [partial] madlib-site git commit: Doc: Add v1.14 documentation
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/dense__linear__systems_8sql__in.html -- diff --git a/docs/v1.14/dense__linear__systems_8sql__in.html b/docs/v1.14/dense__linear__systems_8sql__in.html new file mode 100644 index 000..c47dffe --- /dev/null +++ b/docs/v1.14/dense__linear__systems_8sql__in.html @@ -0,0 +1,640 @@ + +http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;> +http://www.w3.org/1999/xhtml;> + + + + + +MADlib: dense_linear_systems.sql_in File Reference + + + + + + + + + $(document).ready(initResizable); + + + + + + $(document).ready(function() { init_search(); }); + + + MathJax.Hub.Config({ +extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"], +jax: ["input/TeX","output/HTML-CSS"], +}); +http://cdn.mathjax.org/mathjax/latest/MathJax.js"> + + + + + + + (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ + (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), + m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) + })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); + ga('create', 'UA-45382226-1', 'madlib.apache.org'); + ga('send', 'pageview'); + + + + + + + + + http://madlib.apache.org;> + + + 1.14 + + User Documentation for Apache MADlib + + + + + + + + + + + + + + + + + +var searchBox = new SearchBox("searchBox", "search",false,'Search'); + + + + + + + + + + + + +$(document).ready(function(){initNavTree('dense__linear__systems_8sql__in.html','');}); + + + + + + + + + + + + + + +Functions + +dense_linear_systems.sql_in File Reference + + + +SQL functions for linear systems. +More... 
+ + +Functions +bytea8 dense_residual_norm_transition (bytea8 state, float8[] a, float8 b, float8[] x) + +bytea8 dense_residual_norm_merge_states (bytea8 state1, bytea8 state2) + +residual_norm_result dense_residual_norm_final (bytea8 state) + +aggregate residual_norm_result dense_residual_norm (float8[] left_hand_side, float8 right_hand_side, float8[] solution) +Compute the residual after solving the dense linear systems. More... + +float8 [] dense_direct_linear_system_transition (float8[] state, integer row_id, float8[] a, float8 b, integer num_rows, integer algorithm) + +float8 [] dense_direct_linear_system_merge_states (float8[] state1, float8[] state2) + +dense_linear_solver_result dense_direct_linear_system_final (float8[] state) + +aggregate dense_linear_solver_result dense_direct_linear_system (integer row_id, float8[] left_hand_side, float8 right_hand_side, integer numEquations, integer algorithm) +Solve a system of linear equations using the direct method. More... + +varchar linear_solver_dense (varchar input_string) +Help function, to print out the supported families. More... + +varchar linear_solver_dense () + +void linear_solver_dense (varchar source_table, varchar out_table, varchar row_id, varchar left_hand_side, varchar right_hand_side, varchar grouping_cols, varchar optimizer, varchar optimizer_options) +A wrapper function for the various marginal linear_systemsion analyzes. More... + +void linear_solver_dense (varchar source_table, varchar out_table, varchar row_id, varchar left_hand_side, varchar right_hand_side) +Marginal effects with default variables. More...
+ + +Detailed Description +DateJuly 2013 +See alsoComputes the solution of a consistent linear system, for more details see the module description at Dense Linear Systems +Function Documentation + +dense_direct_linear_system() + + + + + + aggregate dense_linear_solver_result dense_direct_linear_system + ( + integer + row_id, + + + + + float8 [] + left_hand_side, + + + + + float8 + right_hand_side, + + + + + integer + numEquations, + + + + + integer + algorithm + + + + ) + + + + +Parameters + +row_idColumn containing the row_id +left_hand_sideColumn containing the left hand side of the system +right_hand_sideColumn containing the right hand side of the system +numEquationsNumber of equations +algorithmAlgorithm used for the dense linear solver + + + +ReturnsA composite value: +solution FLOAT8[] - Array of marginal effects +residual_norm FLOAT8 - Norm of the residual +iters INTEGER - Iterations taken + + +Usage +Get all the diagnostic statistics: + SELECT linear_system_dense(row_id, + left_hand_side, +
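Editor's note: the file reference above lists `dense_residual_norm` as an aggregate built from a transition function, a merge function, and a final function. The sketch below mimics that transition/merge/final pattern in plain Python to show how a residual norm ||Ax − b|| can be accumulated row by row on separate segments and then combined. It is an illustration under the assumption that the state is simply a running sum of squared residuals; the real `bytea8` state in MADlib is opaque and may carry more (for example, a relative norm). All function names here are hypothetical.

```python
import math
from functools import reduce


def residual_transition(state, a, b, x):
    """Accumulate the squared residual (a . x - b)^2 of one row into the state."""
    r = sum(ai * xi for ai, xi in zip(a, x)) - b
    return state + r * r


def residual_merge(state1, state2):
    """Combine partial states computed on different segments."""
    return state1 + state2


def residual_final(state):
    """Turn the accumulated sum of squares into the Euclidean norm ||Ax - b||."""
    return math.sqrt(state)


def residual_norm(rows, x, n_segments=2):
    """Simulate a segmented aggregate over rows of (a, b) pairs."""
    # Deal rows round-robin to segments, as a distributed database might.
    partitions = [rows[i::n_segments] for i in range(n_segments)]
    partials = [
        reduce(lambda s, row: residual_transition(s, row[0], row[1], x), part, 0.0)
        for part in partitions
    ]
    return residual_final(reduce(residual_merge, partials, 0.0))
```

Because the merge step is just addition, the result does not depend on how rows are split across segments, which is the property a parallel aggregate needs.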
[47/51] [partial] madlib-site git commit: Doc: Add v1.14 documentation
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/clustered__variance__coxph_8sql__in.html -- diff --git a/docs/v1.14/clustered__variance__coxph_8sql__in.html b/docs/v1.14/clustered__variance__coxph_8sql__in.html new file mode 100644 index 000..3cf0e4f --- /dev/null +++ b/docs/v1.14/clustered__variance__coxph_8sql__in.html @@ -0,0 +1,489 @@ + +http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;> +http://www.w3.org/1999/xhtml;> + + + + + +MADlib: clustered_variance_coxph.sql_in File Reference + + + + + + + + + $(document).ready(initResizable); + + + + + + $(document).ready(function() { init_search(); }); + + + MathJax.Hub.Config({ +extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"], +jax: ["input/TeX","output/HTML-CSS"], +}); +http://cdn.mathjax.org/mathjax/latest/MathJax.js"> + + + + + + + (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ + (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), + m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) + })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); + ga('create', 'UA-45382226-1', 'madlib.apache.org'); + ga('send', 'pageview'); + + + + + + + + + http://madlib.apache.org;> + + + 1.14 + + User Documentation for Apache MADlib + + + + + + + + + + + + + + + + + +var searchBox = new SearchBox("searchBox", "search",false,'Search'); + + + + + + + + + + + + +$(document).ready(function(){initNavTree('clustered__variance__coxph_8sql__in.html','');}); + + + + + + + + + + + + + + +Functions + +clustered_variance_coxph.sql_in File Reference + + + +SQL functions for clustered robust cox proportional hazards regression. +More... 
+ + +Functions +varcharclustered_variance_coxph () + +varcharclustered_variance_coxph (varchar message) + +voidclustered_variance_coxph (text model_table, text output_table, text clustervar) + +float8 []coxph_a_b_transition (float8[], integer, boolean, float8[], float8) + +float8 []coxph_a_b_merge (float8[], float8[]) + +__coxph_a_b_resultcoxph_a_b_final (float8[]) + +aggregate __coxph_a_b_resultcoxph_a_b (integer, boolean, float8[], float8) + +float8 []coxph_compute_w (float8[] x, boolean status, float8[] coef, float8[] h, float8 s, float8 a, float8[] b) + +__coxph_cl_var_resultcoxph_compute_clustered_stats (float8[] coef, float8[] hessian, float8[] a) + +voidrobust_variance_coxph (varchar model_table, varchar output_table, varchar clustervar) + + +Detailed Description +DateOct 2013 +See alsoFor a brief introduction to clustered robust cox regression, see the module description Clustered Variance +Function Documentation + +clustered_variance_coxph() [1/3] + + + + + + varchar clustered_variance_coxph + ( + ) + + + + + + + + +clustered_variance_coxph() [2/3] + + + + + + varchar clustered_variance_coxph + ( + varchar + message) + + + + + + + + +clustered_variance_coxph() [3/3] + + + + + + void clustered_variance_coxph + ( + text + model_table, + + + + + text + output_table, + + + + + text + clustervar + + + + ) + + + + + + + + +coxph_a_b() + + + + + + aggregate __coxph_a_b_result coxph_a_b + ( + integer + , + + + + + boolean + , + + + + + float8 + [], + + + + + float8 + + + + + ) + + + + + + + + +coxph_a_b_final() + + + + + + __coxph_a_b_result coxph_a_b_final + ( + float8 + []) + + + + + + + + +coxph_a_b_merge() + + + + + + float8 [] coxph_a_b_merge + ( + float8 + [], + + + + + float8 + [] + + + + ) + + + + + + + + +coxph_a_b_transition() + + + + + + float8 [] coxph_a_b_transition + ( + float8 + [], + + + + + integer + , + + + + + boolean + , + + + + + float8 + [], + +
[31/51] [partial] madlib-site git commit: Doc: Add v1.14 documentation
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/group__grp__bayes.html -- diff --git a/docs/v1.14/group__grp__bayes.html b/docs/v1.14/group__grp__bayes.html new file mode 100644 index 000..5cfa013 --- /dev/null +++ b/docs/v1.14/group__grp__bayes.html @@ -0,0 +1,488 @@ + +http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;> +http://www.w3.org/1999/xhtml;> + + + + + +MADlib: Naive Bayes Classification + + + + + + + + + $(document).ready(initResizable); + + + + + + $(document).ready(function() { init_search(); }); + + + MathJax.Hub.Config({ +extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"], +jax: ["input/TeX","output/HTML-CSS"], +}); +http://cdn.mathjax.org/mathjax/latest/MathJax.js"> + + + + + + + (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ + (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), + m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) + })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); + ga('create', 'UA-45382226-1', 'madlib.apache.org'); + ga('send', 'pageview'); + + + + + + + + + http://madlib.apache.org;> + + + 1.14 + + User Documentation for Apache MADlib + + + + + + + + + + + + + + + + + +var searchBox = new SearchBox("searchBox", "search",false,'Search'); + + + + + + + + + + + + +$(document).ready(function(){initNavTree('group__grp__bayes.html','');}); + + + + + + + + + + + + + + +Naive Bayes ClassificationEarly Stage Development + + +Contents + +Training Function(s) + +Classify Function(s) + +Probabilities Function(s) + +Ad Hoc Computation + +Implementation Notes + +Examples + +Technical Background + +Related Topics + +Warning This MADlib method is still in early stage development. There may be some issues that will be addressed in a future version. Interface and implementation is subject to change. 
+Naive Bayes refers to a stochastic model where all independent variables \( a_1, \dots, a_n \) (often referred to as attributes in this context) independently contribute to the probability that a data point belongs to a certain class \( c \). +Naive Bayes classification estimates feature probabilities and class priors using maximum likelihood or Laplacian smoothing. For numeric attributes, Gaussian smoothing can be used to estimate the feature probabilities. These parameters are then used to classify new data. +Training Function(s) +For data with only categorical attributes, precompute feature probabilities and class priors using the following function: + +create_nb_prepared_data_tables ( trainingSource, + trainingClassColumn, + trainingAttrColumn, + numAttrs, + featureProbsName, + classPriorsName + ) +For data containing both categorical and numeric attributes, use the following form to precompute the Gaussian parameters (mean and variance) for numeric attributes alongside the feature probabilities for categorical attributes and class priors. + +create_nb_prepared_data_tables ( trainingSource, + trainingClassColumn, + trainingAttrColumn, + numericAttrsColumnIndices, + numAttrs, + featureProbsName, + numericAttrParamsName, + classPriorsName + ) +The trainingSource is expected to be of the following form: {TABLE|VIEW} trainingSource ( +... +trainingClassColumn INTEGER, +trainingAttrColumn INTEGER[] OR NUMERIC[] OR FLOAT8[], +... +) numericAttrsColumnIndices should be of type TEXT, specified as an array of indices (starting from 1) in the trainingAttrColumn attributes-array that correspond to numeric attributes. +The two output tables are: +featureProbsName stores feature probabilities +classPriorsName stores the class priors + +In addition to the above, if the function specifying numeric attributes is used, an additional table numericAttrParamsName is created which stores the Gaussian parameters for the numeric attributes.
+Classify Function(s) +Perform Naive Bayes classification: +create_nb_classify_view ( featureProbsName, + classPriorsName, + classifySource, + classifyKeyColumn, + classifyAttrColumn, + numAttrs, + destName +) +For data with numeric
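Editor's note: the Naive Bayes page above describes precomputing feature probabilities and class priors (with Laplacian smoothing) and then using them to classify new data. The following in-memory Python sketch shows that two-phase flow for categorical attributes only; it does not cover the Gaussian handling of numeric attributes. It assumes add-one smoothing of the form (count(c, i, v) + 1) / (count(c) + number of distinct values of attribute i), which is a standard choice but may not match MADlib's exact formula. `train_nb` and `classify_nb` are hypothetical names, not MADlib functions.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows):
    """rows: list of (class_label, attrs) where attrs is a tuple of categorical values."""
    class_counts = Counter(c for c, _ in rows)
    feat_counts = Counter()     # (class, attr_index, value) -> count
    values = defaultdict(set)   # attr_index -> distinct values seen in training
    for c, attrs in rows:
        for i, v in enumerate(attrs):
            feat_counts[(c, i, v)] += 1
            values[i].add(v)
    n = len(rows)
    priors = {c: k / n for c, k in class_counts.items()}
    return priors, class_counts, feat_counts, values


def classify_nb(model, attrs):
    """Return the class with the largest log posterior (up to a constant)."""
    priors, class_counts, feat_counts, values = model
    best, best_score = None, -math.inf
    for c in sorted(priors):
        score = math.log(priors[c])
        for i, v in enumerate(attrs):
            # Laplace (add-one) smoothing keeps unseen values from zeroing the product.
            p = (feat_counts[(c, i, v)] + 1) / (class_counts[c] + len(values[i]))
            score += math.log(p)
        if score > best_score:
            best, best_score = c, score
    return best
```

Summing logarithms instead of multiplying probabilities avoids numeric underflow when there are many attributes; the argmax is unchanged.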
[44/51] [partial] madlib-site git commit: Doc: Add v1.14 documentation
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/cross__validation_8sql__in.html -- diff --git a/docs/v1.14/cross__validation_8sql__in.html b/docs/v1.14/cross__validation_8sql__in.html new file mode 100644 index 000..5d0c8b1 --- /dev/null +++ b/docs/v1.14/cross__validation_8sql__in.html @@ -0,0 +1,710 @@ + +http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;> +http://www.w3.org/1999/xhtml;> + + + + + +MADlib: cross_validation.sql_in File Reference + + + + + + + + + $(document).ready(initResizable); + + + + + + $(document).ready(function() { init_search(); }); + + + MathJax.Hub.Config({ +extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"], +jax: ["input/TeX","output/HTML-CSS"], +}); +http://cdn.mathjax.org/mathjax/latest/MathJax.js"> + + + + + + + (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ + (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), + m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) + })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); + ga('create', 'UA-45382226-1', 'madlib.apache.org'); + ga('send', 'pageview'); + + + + + + + + + http://madlib.apache.org;> + + + 1.14 + + User Documentation for Apache MADlib + + + + + + + + + + + + + + + + + +var searchBox = new SearchBox("searchBox", "search",false,'Search'); + + + + + + + + + + + + +$(document).ready(function(){initNavTree('cross__validation_8sql__in.html','');}); + + + + + + + + + + + + + + +Functions + +cross_validation.sql_in File Reference + + + +SQL functions for cross validation. +More... 
+ + +Functions +void cross_validation_general (varchar modelling_func, varchar[] modelling_params, varchar[] modelling_params_type, varchar param_explored, varchar[] explore_values, varchar predict_func, varchar[] predict_params, varchar[] predict_params_type, varchar metric_func, varchar[] metric_params, varchar[] metric_params_type, varchar data_tbl, varchar data_id, boolean id_is_random, varchar validation_result, varchar[] data_cols, integer n_folds) + +void cross_validation_general (varchar modelling_func, varchar[] modelling_params, varchar[] modelling_params_type, varchar param_explored, varchar[] explore_values, varchar predict_func, varchar[] predict_params, varchar[] predict_params_type, varchar metric_func, varchar[] metric_params, varchar[] metric_params_type, varchar data_tbl, varchar data_id, boolean id_is_random, varchar validation_result, varchar[] data_cols) + +void cv_linregr_train (varchar tbl_source, varchar col_ind_var, varchar col_dep_var, varchar tbl_result) +A wrapper for linear regression. More... + +void cv_linregr_predict (varchar tbl_model, varchar tbl_newdata, varchar col_ind_var, varchar col_id, varchar tbl_predict) +A wrapper for linear regression prediction. More... + +void mse_error (varchar tbl_prediction, varchar tbl_actual, varchar id_actual, varchar values_actual, varchar tbl_error) + +void misclassification_avg (varchar tbl_prediction, varchar tbl_actual, varchar id_actual, varchar values_actual, varchar tbl_error) + +void cv_logregr_predict (varchar tbl_model, varchar tbl_newdata, varchar col_ind_var, varchar col_id, varchar tbl_predict) +A prediction function for logistic regression. The result is stored in the table tbl_predict. More... + +integer logregr_accuracy (float8[] coef, float8[] col_ind, boolean col_dep) +Metric function for logistic regression. More... + +void cv_logregr_accuracy (varchar tbl_predict, varchar tbl_source, varchar col_id, varchar col_dep_var, varchar tbl_accuracy) +Metric function for logistic regression.
More... + + +Detailed Description +DateJanuary 2011 +See alsoFor a brief introduction to the usage of cross validation, see the module description Cross Validation. +Function Documentation + +cross_validation_general() [1/2] + + + + + + void cross_validation_general + ( + varchar + modelling_func, + + + + + varchar [] + modelling_params, + + + + + varchar [] + modelling_params_type, + + + + + varchar + param_explored, + + + + + varchar [] + explore_values, + + + + + varchar + predict_func, + + + + + varchar [] + predict_params, + + + + + varchar [] + predict_params_type, + + + + + varchar +
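Editor's note: `cross_validation_general` above takes a modelling function, a prediction function, a metric function (such as `mse_error` or `misclassification_avg`), and `n_folds`. The sketch below shows the same k-fold structure in memory: split rows into folds, train on all but one fold, predict the held-out fold, score it, and average. It assumes a simple round-robin fold assignment (row i goes to fold i mod n_folds); how MADlib actually assigns rows to folds (for example, when `id_is_random` is false) is not shown here. All names are hypothetical.

```python
def k_fold_indices(n_rows, n_folds):
    """Yield (train_idx, test_idx) per fold; fold f holds rows i with i % n_folds == f."""
    for f in range(n_folds):
        test = [i for i in range(n_rows) if i % n_folds == f]
        train = [i for i in range(n_rows) if i % n_folds != f]
        yield train, test


def mse(pred, actual):
    """Mean squared error, the role played by mse_error."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)


def misclassification_rate(pred, actual):
    """Fraction of wrong labels, the role played by misclassification_avg."""
    return sum(1 for p, a in zip(pred, actual) if p != a) / len(actual)


def cross_validate(xs, ys, n_folds, fit, predict, metric):
    """Average the metric over folds for a user-supplied fit/predict pair."""
    scores = []
    for train, test in k_fold_indices(len(xs), n_folds):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        preds = [predict(model, xs[i]) for i in test]
        scores.append(metric(preds, [ys[i] for i in test]))
    return sum(scores) / len(scores)
```

For example, with a trivial "predict the training mean" model, `cross_validate([0, 0, 0, 0], [1.0, 2.0, 3.0, 4.0], 2, lambda xs, ys: sum(ys) / len(ys), lambda m, x: m, mse)` averages an MSE of 2.0 on each of the two folds.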
[40/51] [partial] madlib-site git commit: Doc: Add v1.14 documentation
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/dir_71a41f8b7207fbbc465a4e4d95589314.html -- diff --git a/docs/v1.14/dir_71a41f8b7207fbbc465a4e4d95589314.html b/docs/v1.14/dir_71a41f8b7207fbbc465a4e4d95589314.html new file mode 100644 index 000..22fd7cd --- /dev/null +++ b/docs/v1.14/dir_71a41f8b7207fbbc465a4e4d95589314.html @@ -0,0 +1,135 @@ + +http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;> +http://www.w3.org/1999/xhtml;> + + + + + +MADlib: ports Directory Reference + + + + + + + + + $(document).ready(initResizable); + + + + + + $(document).ready(function() { init_search(); }); + + + MathJax.Hub.Config({ +extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"], +jax: ["input/TeX","output/HTML-CSS"], +}); +http://cdn.mathjax.org/mathjax/latest/MathJax.js"> + + + + + + + (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ + (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), + m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) + })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); + ga('create', 'UA-45382226-1', 'madlib.apache.org'); + ga('send', 'pageview'); + + + + + + + + + http://madlib.apache.org;> + + + 1.14 + + User Documentation for Apache MADlib + + + + + + + + + + + + + + + + + +var searchBox = new SearchBox("searchBox", "search",false,'Search'); + + + + + + + + + + + + +$(document).ready(function(){initNavTree('dir_71a41f8b7207fbbc465a4e4d95589314.html','');}); + + + + + + + + + + + + + + +ports Directory Reference + + + + +Directories +directory postgres + + + + + + + +incubator-madlibsrcports +Generated on Wed May 2 2018 13:00:12 for MADlib by +http://www.doxygen.org/index.html;> + 1.8.13 + + + + http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/dir_745a5b6eaaef3a7f811e3c789eb52f97.html -- diff --git a/docs/v1.14/dir_745a5b6eaaef3a7f811e3c789eb52f97.html 
b/docs/v1.14/dir_745a5b6eaaef3a7f811e3c789eb52f97.html new file mode 100644 index 000..51697ac --- /dev/null +++ b/docs/v1.14/dir_745a5b6eaaef3a7f811e3c789eb52f97.html @@ -0,0 +1,135 @@ + +http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;> +http://www.w3.org/1999/xhtml;> + + + + + +MADlib: src Directory Reference + + + + + + + + + $(document).ready(initResizable); + + + + + + $(document).ready(function() { init_search(); }); + + + MathJax.Hub.Config({ +extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"], +jax: ["input/TeX","output/HTML-CSS"], +}); +http://cdn.mathjax.org/mathjax/latest/MathJax.js"> + + + + + + + (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ + (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), + m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) + })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); + ga('create', 'UA-45382226-1', 'madlib.apache.org'); + ga('send', 'pageview'); + + + + + + + + + http://madlib.apache.org;> + + + 1.14 + + User Documentation for Apache MADlib + + + + + + + + + + + + + + + + + +var searchBox = new SearchBox("searchBox", "search",false,'Search'); + + + + + + + + + + + + +$(document).ready(function(){initNavTree('dir_745a5b6eaaef3a7f811e3c789eb52f97.html','');}); + + + + + + + + + + + + + + +src Directory Reference + + + + +Directories +directory pg_gp + + + + + + + +incubator-madlibmethodssvecsrc +Generated on Wed May 2 2018 13:00:12 for MADlib by +http://www.doxygen.org/index.html;> + 1.8.13 + + + + http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/dir_7826d1d18040ad5cc29c8c0a0584577d.html -- diff --git a/docs/v1.14/dir_7826d1d18040ad5cc29c8c0a0584577d.html b/docs/v1.14/dir_7826d1d18040ad5cc29c8c0a0584577d.html new file mode 100644 index 000..01ecb91 --- /dev/null +++ b/docs/v1.14/dir_7826d1d18040ad5cc29c8c0a0584577d.html @@ -0,0 +1,135 @@ + 
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;> +http://www.w3.org/1999/xhtml;> + + + + + +MADlib: src Directory Reference + + + + + + + + + $(document).ready(initResizable); + + + + + + $(document).ready(function() { init_search(); }); + + + MathJax.Hub.Config({ +extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"], +jax: ["input/TeX","output/HTML-CSS"], +}); +http://cdn.mathjax.org/mathjax/latest/MathJax.js"> + + + + + + +
[37/51] [partial] madlib-site git commit: Doc: Add v1.14 documentation
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/eigen_navtree_hacks.js -- diff --git a/docs/v1.14/eigen_navtree_hacks.js b/docs/v1.14/eigen_navtree_hacks.js new file mode 100644 index 000..ee72246 --- /dev/null +++ b/docs/v1.14/eigen_navtree_hacks.js @@ -0,0 +1,236 @@ +var arrowRight = ''; + +// generate a table of contents in the side-nav based on the h1/h2 tags of the current page. +function generate_autotoc() { + var headers = $("h1, h2"); + if(headers.length > 1) { +var toc = $("#side-nav").append('Table of contents'); +toc = $("#nav-toc"); +var footerHeight = footer.height(); +toc = toc.append(''); +toc = toc.find('ul'); +var indices = new Array(); +indices[0] = 0; +indices[1] = 0; + +var h1counts = $("h1").length; +headers.each(function(i) { + var current = $(this); + var levelTag = current[0].tagName.charAt(1); + if(h1counts==0) +levelTag--; + var cur_id = current.attr("id"); + + indices[levelTag-1]+=1; + var prefix = indices[0]; + if (levelTag >1) { +prefix+="."+indices[1]; + } + + // Uncomment to add number prefixes + // current.html(prefix + " " + current.html()); + for(var l = levelTag; l < 2; ++l){ + indices[l] = 0; + } + + if(cur_id == undefined) { +current.attr('id', 'title' + i); +current.addClass('anchor'); +toc.append("" + current.text() + ""); + } else { +toc.append("" + current.text() + ""); + } +}); +resizeHeight(); + } +} + + +var global_navtree_object; + +// Overloaded to remove links to sections/subsections +function getNode(o, po) +{ + po.childrenVisited = true; + var l = po.childrenData.length-1; + for (var i in po.childrenData) { +var nodeData = po.childrenData[i]; +if((!nodeData[1]) || (nodeData[1].indexOf('#')==-1)) // <- we added this line + po.children[i] = newNode(o, po, nodeData[0], nodeData[1], nodeData[2], i==l); + } +} + +// Overloaded to adjust the size of the navtree wrt the toc +function resizeHeight() +{ + var toc = $("#nav-toc"); + var tocHeight = toc.height(); // <- we added this line + var 
headerHeight = header.height(); + var footerHeight = footer.height(); + var windowHeight = $(window).height() - headerHeight - footerHeight; + content.css({height:windowHeight + "px"}); + navtree.css({height:(windowHeight-tocHeight) + "px"}); // <- we modified this line + sidenav.css({height:(windowHeight) + "px",top: headerHeight+"px"}); +} + +// Overloaded to save the root node into global_navtree_object +function initNavTree(toroot,relpath) +{ + var o = new Object(); + global_navtree_object = o; // <- we added this line + o.toroot = toroot; + o.node = new Object(); + o.node.li = document.getElementById("nav-tree-contents"); + o.node.childrenData = NAVTREE; + o.node.children = new Array(); + o.node.childrenUL = document.createElement("ul"); + o.node.getChildrenUL = function() { return o.node.childrenUL; }; + o.node.li.appendChild(o.node.childrenUL); + o.node.depth = 0; + o.node.relpath = relpath; + o.node.expanded = false; + o.node.isLast = true; + o.node.plus_img = document.createElement("span"); + o.node.plus_img.className = 'arrow'; + o.node.plus_img.innerHTML = arrowRight; + + if (localStorageSupported()) { +var navSync = $('#nav-sync'); +if (cachedLink()) { + showSyncOff(navSync,relpath); + navSync.removeClass('sync'); +} else { + showSyncOn(navSync,relpath); +} +navSync.click(function(){ toggleSyncButton(relpath); }); + } + + navTo(o,toroot,window.location.hash,relpath); + + $(window).bind('hashchange', function(){ + if (window.location.hash && window.location.hash.length>1){ + var a; + if ($(location).attr('hash')){ + var clslink=stripPath($(location).attr('pathname'))+':'+ + $(location).attr('hash').substring(1); + a=$('.item a[class$="'+clslink+'"]'); + } + if (a==null || !$(a).parent().parent().hasClass('selected')){ + $('.item').removeClass('selected'); + $('.item').removeAttr('id'); + } + var link=stripPath2($(location).attr('pathname')); + navTo(o,link,$(location).attr('hash'),relpath); + } else if (!animationInProgress) { + 
$('#doc-content').scrollTop(0); + $('.item').removeClass('selected'); + $('.item').removeAttr('id'); + navTo(o,toroot,window.location.hash,relpath); + } + }) + + $(window).load(showRoot); +} + +// return false if the node has no children at all, or has only section/subsection children +function checkChildrenData(node) { + if (!(typeof(node.childrenData)==='string')) { +for (var i in node.childrenData) { + var url = node.childrenData[i][1]; + if(url.indexOf("#")==-1) +return true; +} +return false; +
[34/51] [partial] madlib-site git commit: Doc: Add v1.14 documentation
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/glm_8sql__in.html -- diff --git a/docs/v1.14/glm_8sql__in.html b/docs/v1.14/glm_8sql__in.html new file mode 100644 index 000..c9194c6 --- /dev/null +++ b/docs/v1.14/glm_8sql__in.html @@ -0,0 +1,1919 @@ + +http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;> +http://www.w3.org/1999/xhtml;> + + + + + +MADlib: glm.sql_in File Reference + + + + + + + + + $(document).ready(initResizable); + + + + + + $(document).ready(function() { init_search(); }); + + + MathJax.Hub.Config({ +extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"], +jax: ["input/TeX","output/HTML-CSS"], +}); +http://cdn.mathjax.org/mathjax/latest/MathJax.js"> + + + + + + + (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){ + (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o), + m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m) + })(window,document,'script','//www.google-analytics.com/analytics.js','ga'); + ga('create', 'UA-45382226-1', 'madlib.apache.org'); + ga('send', 'pageview'); + + + + + + + + + http://madlib.apache.org;> + + + 1.14 + + User Documentation for Apache MADlib + + + + + + + + + + + + + + + + + +var searchBox = new SearchBox("searchBox", "search",false,'Search'); + + + + + + + + + + + + +$(document).ready(function(){initNavTree('glm_8sql__in.html','');}); + + + + + + + + + + + + + + +Functions + +glm.sql_in File Reference + + + +SQL functions for GLM (Poisson) +More... 
+Functions
+bytea8 __glm_merge_states (bytea8 state1, bytea8 state2)
+bytea8 __glm_final (bytea8 state)
+bytea8 __glm_poisson_log_transition (bytea8, float8, float8[], bytea8)
+aggregate bytea8 __glm_poisson_log_agg (float8 y, float8[] x, bytea8 previous_state)
+bytea8 __glm_poisson_identity_transition (bytea8, float8, float8[], bytea8)
+aggregate bytea8 __glm_poisson_identity_agg (float8 y, float8[] x, bytea8 previous_state)
+bytea8 __glm_poisson_sqrt_transition (bytea8, float8, float8[], bytea8)
+aggregate bytea8 __glm_poisson_sqrt_agg (float8 y, float8[] x, bytea8 previous_state)
+bytea8 __glm_gaussian_identity_transition (bytea8, float8, float8[], bytea8)
+aggregate bytea8 __glm_gaussian_identity_agg (float8 y, float8[] x, bytea8 previous_state)
+bytea8 __glm_gaussian_log_transition (bytea8, float8, float8[], bytea8)
+aggregate bytea8 __glm_gaussian_log_agg (float8 y, float8[] x, bytea8 previous_state)
+bytea8 __glm_gaussian_inverse_transition (bytea8, float8, float8[], bytea8)
+aggregate bytea8 __glm_gaussian_inverse_agg (float8 y, float8[] x, bytea8 previous_state)
+bytea8 __glm_gamma_log_transition (bytea8, float8, float8[], bytea8)
+aggregate bytea8 __glm_gamma_log_agg (float8 y, float8[] x, bytea8 previous_state)
+bytea8 __glm_gamma_inverse_transition (bytea8, float8, float8[], bytea8)
+aggregate bytea8 __glm_gamma_inverse_agg (float8 y, float8[] x, bytea8 previous_state)
+bytea8 __glm_gamma_identity_transition (bytea8, float8, float8[], bytea8)
+aggregate bytea8 __glm_gamma_identity_agg (float8 y, float8[] x, bytea8 previous_state)
+bytea8 __glm_binomial_probit_transition (bytea8, float8, float8[], bytea8)
+aggregate bytea8 __glm_binomial_probit_agg (float8 y, float8[] x, bytea8 previous_state)
+bytea8 __glm_inverse_gaussian_identity_transition (bytea8, float8, float8[], bytea8)
+bytea8 __glm_binomial_logit_transition (bytea8, float8, float8[], bytea8)
+aggregate bytea8 __glm_binomial_logit_agg (float8 y, float8[] x, bytea8 previous_state)
+__glm_result_type __glm_result_z_stats (bytea8 state)
+aggregate bytea8 __glm_inverse_gaussian_identity_agg (float8 y, float8[] x, bytea8 previous_state)
+bytea8 __glm_inverse_gaussian_log_transition (bytea8, float8, float8[], bytea8)
+aggregate bytea8 __glm_inverse_gaussian_log_agg (float8 y, float8[] x, bytea8 previous_state)
+bytea8 __glm_inverse_gaussian_inverse_transition (bytea8, float8, float8[], bytea8)
+aggregate bytea8 __glm_inverse_gaussian_inverse_agg (float8 y, float8[] x, bytea8 previous_state)
+bytea8 __glm_inverse_gaussian_sqr_inverse_transition (bytea8, float8, float8[], bytea8)
+aggregate bytea8 __glm_inverse_gaussian_sqr_inverse_agg (float8 y, float8[] x, bytea8 previous_state)
+__glm_result_type __glm_result_t_stats (bytea8 state)
+float8 __glm_loglik_diff (bytea8 state1, bytea8 state2)
+void glm (varchar source_table, varchar model_table, varchar dependent_varname, varchar independent_varname, varchar family_params, varchar grouping_col, varchar optim_params, boolean verbose)
+void glm (varchar source_table, varchar model_table, varchar dependent_varname, varchar independent_varname, varchar family_params, varchar
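The '*_transition' functions and '*_agg' aggregates listed above, together with '__glm_merge_states' and '__glm_final', appear to follow the usual pattern for parallel aggregates: each segment folds rows into a state, partial states are merged, and a final step turns the state into a result. The sketch below illustrates that pattern only, under stated assumptions: the state is a plain JavaScript object rather than MADlib's 'bytea8' buffer, only the Gaussian family with identity link is shown (for which an IRLS step reduces to ordinary least squares), and all function names are hypothetical.

```javascript
// Sketch of a transition / merge / final aggregate (hypothetical names).
// Assumptions: plain-object state instead of a bytea8 buffer; Gaussian
// family with identity link, where one IRLS step equals ordinary least
// squares, so accumulating X^T X and X^T y is sufficient.

function initState(p) {
  return {
    n: 0,
    xtx: Array.from({ length: p }, () => new Array(p).fill(0)),
    xty: new Array(p).fill(0),
  };
}

// Transition: fold one row (y, x) into the running state.
function transition(state, y, x) {
  for (let i = 0; i < x.length; i++) {
    state.xty[i] += x[i] * y;
    for (let j = 0; j < x.length; j++) state.xtx[i][j] += x[i] * x[j];
  }
  state.n += 1;
  return state;
}

// Merge: partial states from different segments combine by addition.
function mergeStates(a, b) {
  const p = a.xty.length;
  const out = initState(p);
  out.n = a.n + b.n;
  for (let i = 0; i < p; i++) {
    out.xty[i] = a.xty[i] + b.xty[i];
    for (let j = 0; j < p; j++) out.xtx[i][j] = a.xtx[i][j] + b.xtx[i][j];
  }
  return out;
}

// Final: solve (X^T X) beta = X^T y by Gaussian elimination with partial pivoting.
function finalizeState(state) {
  const p = state.xty.length;
  const m = state.xtx.map((row, i) => [...row, state.xty[i]]); // augmented matrix
  for (let col = 0; col < p; col++) {
    let pivot = col;
    for (let r = col + 1; r < p; r++) {
      if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) pivot = r;
    }
    [m[col], m[pivot]] = [m[pivot], m[col]];
    for (let r = col + 1; r < p; r++) {
      const f = m[r][col] / m[col][col];
      for (let c = col; c <= p; c++) m[r][c] -= f * m[col][c];
    }
  }
  const beta = new Array(p).fill(0);
  for (let r = p - 1; r >= 0; r--) {
    let s = m[r][p];
    for (let c = r + 1; c < p; c++) s -= m[r][c] * beta[c];
    beta[r] = s / m[r][r];
  }
  return beta;
}

// Usage: y = 1 + 2t with rows x = [1, t], split across two hypothetical segments.
const seg1 = initState(2);
const seg2 = initState(2);
for (const t of [0, 1, 2]) transition(seg1, 1 + 2 * t, [1, t]);
for (const t of [3, 4, 5]) transition(seg2, 1 + 2 * t, [1, t]);
const beta = finalizeState(mergeStates(seg1, seg2));
console.log(beta); // [ 1, 2 ]  (intercept, slope)
```

Because merging is plain addition, the result does not depend on how rows are split across segments, which is what allows the transition step to run in parallel.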
[33/51] [partial] madlib-site git commit: Doc: Add v1.14 documentation
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/graph_legend.html
--
diff --git a/docs/v1.14/graph_legend.html b/docs/v1.14/graph_legend.html
new file mode 100644
index 000..65dbfe7
--- /dev/null
+++ b/docs/v1.14/graph_legend.html
@@ -0,0 +1,154 @@
+MADlib: Graph Legend
+Graph Legend
+This page explains how to interpret the graphs that are generated by doxygen.
+Consider the following example:
+/*! Invisible class because of truncation */
+class Invisible { };
+/*! Truncated class, inheritance relation is hidden */
+class Truncated : public Invisible { };
+/* Class not documented with doxygen comments */
+class Undocumented { };
+/*! Class that is inherited using public inheritance */
+class PublicBase : public Truncated { };
+/*! A template class */
+template<class T> class Templ { };
+/*! Class that is inherited using protected inheritance */
+class ProtectedBase { };
+/*! Class that is inherited using private inheritance */
+class PrivateBase { };
+/*! Class that is used by the Inherited class */
+class Used { };
+/*! Super class that inherits a number of other classes */
+class Inherited : public PublicBase,
+                  protected ProtectedBase,
+                  private PrivateBase,
+                  public Undocumented,
+                  public Templ<int>
+{
+  private:
+    Used *m_usedClass;
+};
+This will result in the following graph:
+[Figure: graph generated for the example above]
+The boxes in the above graph have the following meaning:
+A filled gray box represents the struct or class for which the graph is generated.
+A box with a black border denotes a documented struct or class.
+A box with a gray border denotes an undocumented struct or class.
+A box with a red border denotes a documented struct or class for which not all inheritance/containment relations are shown. A graph is truncated if it does not fit within the specified boundaries.
+The arrows have the following meaning:
+A dark blue arrow is used to visualize a public inheritance relation between two classes.
+A dark green arrow is used for protected inheritance.
+A dark red arrow is used for private inheritance.
+A purple dashed arrow is used if a class is contained or used by another class. The arrow is labelled with the variable(s) through which the pointed class or struct is accessible.
+A yellow dashed arrow denotes a relation between a template instance and the template class it was instantiated from. The arrow is labelled with the template parameters of the instance.
+Generated on Wed May 2 2018 13:00:12 for MADlib by doxygen 1.8.13 (http://www.doxygen.org/index.html)
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/graph_legend.md5
--
diff --git a/docs/v1.14/graph_legend.md5 b/docs/v1.14/graph_legend.md5
new file mode 100644
index 000..a06ed05
--- /dev/null
+++ b/docs/v1.14/graph_legend.md5
@@ -0,0 +1 @@
+387ff8eb65306fa251338d3c9bd7bfff
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/graph_legend.svg
--
diff --git a/docs/v1.14/graph_legend.svg b/docs/v1.14/graph_legend.svg
new file mode 100644
index 000..273f5fd
--- /dev/null
+++ b/docs/v1.14/graph_legend.svg
@@ -0,0 +1,138 @@
+Graph Legend
+Node9
[36/51] [partial] madlib-site git commit: Doc: Add v1.14 documentation
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/elastic__net_8sql__in.html
--
diff --git a/docs/v1.14/elastic__net_8sql__in.html b/docs/v1.14/elastic__net_8sql__in.html
new file mode 100644
index 000..168da54
--- /dev/null
+++ b/docs/v1.14/elastic__net_8sql__in.html
@@ -0,0 +1,2476 @@
+MADlib: elastic_net.sql_in File Reference
+elastic_net.sql_in File Reference
+SQL functions for elastic net regularization.
+Functions
+void elastic_net_train (text tbl_source, text tbl_result, text col_dep_var, text col_ind_var, text regress_family, float8 alpha, float8 lambda_value, boolean standardize, text grouping_col, text optimizer, text optimizer_params, text excluded, integer max_iter, float8 tolerance)
+Interface for elastic net.
+void elastic_net_train (text tbl_source, text tbl_result, text col_ind_var, text col_dep_var, text regress_family, float8 alpha, float8 lambda_value, boolean standardization, text grouping_columns, text optimizer, text optimizer_params, text excluded, integer max_iter)
+void elastic_net_train (text tbl_source, text tbl_result, text col_ind_var, text col_dep_var, text regress_family, float8 alpha, float8 lambda_value, boolean standardization, text grouping_columns, text optimizer, text optimizer_params, text excluded)
+void elastic_net_train (text tbl_source, text tbl_result, text col_ind_var, text col_dep_var, text regress_family, float8 alpha, float8 lambda_value, boolean standardization, text grouping_columns, text optimizer, text optimizer_params)
+void elastic_net_train (text tbl_source, text tbl_result, text col_ind_var, text col_dep_var, text regress_family, float8 alpha, float8 lambda_value, boolean standardization, text grouping_columns, text optimizer)
+void elastic_net_train (text tbl_source, text tbl_result, text col_ind_var, text col_dep_var, text regress_family, float8 alpha, float8 lambda_value, boolean standardization, text grouping_columns)
+void elastic_net_train (text tbl_source, text tbl_result, text col_ind_var, text col_dep_var, text regress_family, float8 alpha, float8 lambda_value, boolean standardization)
+void elastic_net_train (text tbl_source, text tbl_result, text col_ind_var, text col_dep_var, text regress_family, float8 alpha, float8 lambda_value)
+text elastic_net_train ()
+Help function to print out the supported families.
+text elastic_net_train (text family_or_optimizer)
+Help function to print out the supported optimizers for a family, or the parameter list for an optimizer.
+void elastic_net_predict (text tbl_model, text tbl_new_source, text col_id, text tbl_predict)
+Predict and store the results in a table; can be used together with General-CV.
+float8 elastic_net_predict (text regress_family, float8[] coefficients, float8 intercept, float8[] ind_var)
+Prediction using learned coefficients for a given example.
+float8 elastic_net_gaussian_predict (float8[] coefficients, float8 intercept, float8[] ind_var)
+Prediction for linear models using learned coefficients for a given example.
+boolean elastic_net_binomial_predict (float8[] coefficients, float8 intercept, float8[] ind_var)
+Prediction for logistic models using learned coefficients for a given example.
+float8 elastic_net_binomial_prob (float8[] coefficients, float8 intercept, float8[] ind_var)
+Compute the probability of belonging to the True class for a given observation.
+float8 __elastic_net_binomial_loglikelihood (float8[] coefficients, float8
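The per-row prediction helpers listed above can be summarized in a few lines. The sketch below is a hedged illustration, not MADlib's implementation. It assumes the linear predictor is intercept + coefficients · ind_var, that 'elastic_net_binomial_prob' applies the logistic function to that predictor, and that the boolean class prediction is "probability > 0.5"; how a probability of exactly 0.5 is classified is likewise an assumption. The JavaScript function names are hypothetical.

```javascript
// Hypothetical analogues of the elastic_net_*_predict helpers.
// Assumption: linear predictor = intercept + dot(coefficients, indVar).

function dot(a, b) {
  if (a.length !== b.length) throw new Error('coefficients and ind_var differ in length');
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

// Analogue of elastic_net_gaussian_predict: the linear predictor itself.
function gaussianPredict(coefficients, intercept, indVar) {
  return intercept + dot(coefficients, indVar);
}

// Analogue of elastic_net_binomial_prob: logistic function of the predictor.
function binomialProb(coefficients, intercept, indVar) {
  const eta = intercept + dot(coefficients, indVar);
  return 1 / (1 + Math.exp(-eta));
}

// Analogue of elastic_net_binomial_predict (assumed threshold: p > 0.5).
function binomialPredict(coefficients, intercept, indVar) {
  return binomialProb(coefficients, intercept, indVar) > 0.5;
}

console.log(gaussianPredict([0.5, -1], 2, [4, 1])); // 3
console.log(binomialProb([1], 0, [0])); // 0.5
console.log(binomialPredict([1], 0, [2])); // true
```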
[38/51] [partial] madlib-site git commit: Doc: Add v1.14 documentation
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/dir_d0ff1bc8be395d65672549993d82a3c0.html
--
diff --git a/docs/v1.14/dir_d0ff1bc8be395d65672549993d82a3c0.html b/docs/v1.14/dir_d0ff1bc8be395d65672549993d82a3c0.html
new file mode 100644
index 000..161bc3c
--- /dev/null
+++ b/docs/v1.14/dir_d0ff1bc8be395d65672549993d82a3c0.html
@@ -0,0 +1,136 @@
+MADlib: pg_gp Directory Reference
+pg_gp Directory Reference
+Files
+file porter_stemmer.sql_in
+implementation of porter stemmer operations in SQL
+incubator-madlib/methods/stemmer/src/pg_gp
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/dir_d79f036e19ca50f1361675a4687317bc.html
--
diff --git a/docs/v1.14/dir_d79f036e19ca50f1361675a4687317bc.html b/docs/v1.14/dir_d79f036e19ca50f1361675a4687317bc.html
new file mode 100644
index 000..dce28a4
--- /dev/null
+++ b/docs/v1.14/dir_d79f036e19ca50f1361675a4687317bc.html
@@ -0,0 +1,139 @@
+MADlib: linear_systems Directory Reference
+linear_systems Directory Reference
+Files
+file dense_linear_systems.sql_in
+SQL functions for linear systems.
+file sparse_linear_systems.sql_in
+SQL functions for linear systems.
+incubator-madlib/src/ports/postgres/modules/linear_systems
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/dir_e2aaed6e1ab0079c9b997d45b783e833.html
--
diff --git a/docs/v1.14/dir_e2aaed6e1ab0079c9b997d45b783e833.html b/docs/v1.14/dir_e2aaed6e1ab0079c9b997d45b783e833.html
new file mode 100644
index 000..bcdfcf3
--- /dev/null
+++ b/docs/v1.14/dir_e2aaed6e1ab0079c9b997d45b783e833.html
@@ -0,0 +1,136 @@
+MADlib: pg_gp Directory Reference
[25/51] [partial] madlib-site git commit: Doc: Add v1.14 documentation
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/group__grp__graph__vertex__degrees.html
--
diff --git a/docs/v1.14/group__grp__graph__vertex__degrees.html b/docs/v1.14/group__grp__graph__vertex__degrees.html
new file mode 100644
index 000..4f4a14a
--- /dev/null
+++ b/docs/v1.14/group__grp__graph__vertex__degrees.html
@@ -0,0 +1,266 @@
+MADlib: In-Out Degree
+In-Out Degree (Graph Measures)
+Contents
+In-out degrees
+Examples
+This function computes the degree of each node. The node degree is the number of edges incident to that node. The node in-degree is the number of edges pointing into the node, and the node out-degree is the number of edges pointing out of the node.
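To make these definitions concrete, the sketch below counts in-degrees and out-degrees from an edge list. It is an illustration under stated assumptions, not MADlib's implementation: edges are [src, dest] pairs, edge weights are ignored because a degree counts edges, and every vertex in the vertex list gets an output entry, with zero counts if it has no edges. The sample data is the eight-vertex, twelve-edge graph from this page's example.

```javascript
// Minimal sketch: in/out degree per vertex from a directed edge list.
// Assumptions: edges are [src, dest] pairs; weights are ignored; every
// listed vertex is reported, even with zero degree.
function vertexDegrees(vertexIds, edges) {
  const degrees = new Map();
  for (const id of vertexIds) degrees.set(id, { indegree: 0, outdegree: 0 });
  const entry = (id) => {
    const d = degrees.get(id);
    if (d === undefined) throw new Error(`edge refers to unknown vertex ${id}`);
    return d;
  };
  for (const [src, dest] of edges) {
    entry(src).outdegree += 1; // the edge leaves src
    entry(dest).indegree += 1; // the edge enters dest
  }
  return degrees;
}

// Vertices and (src, dest) pairs from the example on this page.
const vertices = [0, 1, 2, 3, 4, 5, 6, 7];
const edges = [
  [0, 1], [0, 2], [0, 4], [1, 2], [1, 3], [2, 3],
  [2, 5], [2, 6], [3, 0], [4, 0], [5, 6], [6, 7],
];
const result = vertexDegrees(vertices, edges);
console.log(result.get(0)); // { indegree: 2, outdegree: 3 }
console.log(result.get(7)); // { indegree: 1, outdegree: 0 }
```

Every edge adds exactly one to some in-degree and one to some out-degree, so both columns sum to the number of edges (12 here), which is a quick sanity check on any output table.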
+In-out degrees
+graph_vertex_degrees(
+    vertex_table,
+    vertex_id,
+    edge_table,
+    edge_args,
+    out_table,
+    grouping_cols
+)
+Arguments
+vertex_table
+TEXT. Name of the table containing the vertex data for the graph. Must contain the column specified in the 'vertex_id' parameter below.
+vertex_id
+TEXT, default = 'id'. Name of the column in 'vertex_table' containing vertex ids. The vertex ids are of type INTEGER with no duplicates. They do not need to be contiguous.
+edge_table
+TEXT. Name of the table containing the edge data. The edge table must contain columns for source vertex, destination vertex and edge weight. Column naming convention is described below in the 'edge_args' parameter.
+edge_args
+TEXT. A comma-delimited string containing multiple named arguments of the form "name=value". The following parameters are supported for this string argument:
+src (INTEGER): Name of the column containing the source vertex ids in the edge table. Default column name is 'src'.
+dest (INTEGER): Name of the column containing the destination vertex ids in the edge table. Default column name is 'dest'.
+weight (FLOAT8): Name of the column containing the edge weights in the edge table. Default column name is 'weight'.
+out_table
+TEXT. Name of the table to store the result. It contains a row for every vertex of every group and has the following columns (in addition to the grouping columns):
+vertex: The vertex id. The output column takes its name from the input vertex id column (for example, 'id').
+indegree: Number of incoming edges to the vertex.
+outdegree: Number of outgoing edges from the vertex.
+grouping_cols
+TEXT, default = NULL. List of columns used to group the input into discrete subgraphs. These columns must exist in the edge table. When this value is NULL, no grouping is used and a single result is generated.
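The 'edge_args' string packs several optional column names into one "name=value" list. The sketch below shows one way such a string can be parsed with the documented defaults ('src', 'dest', 'weight'). It is a hypothetical illustration rather than MADlib's parser: the function name, the whitespace trimming, and the decision to reject unknown names, empty values, and malformed pairs are all assumptions.

```javascript
// Hypothetical edge_args parser (not MADlib's implementation).
// Documented defaults: src='src', dest='dest', weight='weight'.
const EDGE_ARG_DEFAULTS = Object.freeze({ src: 'src', dest: 'dest', weight: 'weight' });

function parseEdgeArgs(edgeArgs) {
  const result = { ...EDGE_ARG_DEFAULTS };
  if (edgeArgs == null || edgeArgs.trim() === '') return result; // all defaults
  for (const part of edgeArgs.split(',')) {
    const eq = part.indexOf('=');
    if (eq === -1) throw new Error(`malformed pair: "${part.trim()}"`);
    const name = part.slice(0, eq).trim();
    const value = part.slice(eq + 1).trim();
    if (!Object.prototype.hasOwnProperty.call(EDGE_ARG_DEFAULTS, name)) {
      throw new Error(`unknown edge_args parameter: ${name}`);
    }
    if (value === '') throw new Error(`empty value for ${name}`);
    result[name] = value; // override the default column name
  }
  return result;
}

console.log(parseEdgeArgs('src=src_id, dest=dest_id, weight=edge_weight'));
// { src: 'src_id', dest: 'dest_id', weight: 'edge_weight' }
console.log(parseEdgeArgs('dest=to_id'));
// { src: 'src', dest: 'to_id', weight: 'weight' }
```

Using a hasOwnProperty check against the defaults, rather than the 'in' operator, keeps inherited names such as 'toString' from being accepted as parameters.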
+Examples
+Create vertex and edge tables to represent the graph:
+DROP TABLE IF EXISTS vertex, edge;
+CREATE TABLE vertex(
+    id INTEGER,
+    name TEXT
+);
+CREATE TABLE edge(
+    src_id INTEGER,
+    dest_id INTEGER,
+    edge_weight FLOAT8
+);
+INSERT INTO vertex VALUES
+(0, 'A'),
+(1, 'B'),
+(2, 'C'),
+(3, 'D'),
+(4, 'E'),
+(5, 'F'),
+(6, 'G'),
+(7, 'H');
+INSERT INTO edge VALUES
+(0, 1, 1.0),
+(0, 2, 1.0),
+(0, 4, 10.0),
+(1, 2, 2.0),
+(1, 3, 10.0),
+(2, 3, 1.0),
+(2, 5, 1.0),
+(2, 6, 3.0),
+(3, 0, 1.0),
+(4, 0, -2.0),
+(5, 6, 1.0),
+(6, 7, 1.0);
+Calculate the in-out degrees for each node:
+DROP TABLE IF EXISTS degrees;
+SELECT madlib.graph_vertex_degrees(
+    'vertex',      -- Vertex table
+    'id',          -- Vertex id column (NULL means use default naming)
+    'edge',        -- Edge table
+    'src=src_id, dest=dest_id, weight=edge_weight',
+    'degrees');    -- Output table of vertex degrees
+SELECT * FROM