[madlib] branch master updated: K-NN: Add kd-tree method for approximate knn

2019-02-20 Thread riyer
This is an automated email from the ASF dual-hosted git repository.

riyer pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/madlib.git


The following commit(s) were added to refs/heads/master by this push:
 new 5e601fb  K-NN: Add kd-tree method for approximate knn
5e601fb is described below

commit 5e601fbdb4c6423c148f8bdfead0a9988f31800d
Author: Orhan Kislal 
AuthorDate: Wed Feb 20 16:33:46 2019 -0800

K-NN: Add kd-tree method for approximate knn

JIRA: MADLIB-1061

This commit adds a kd-tree option to the 'knn' function. A kd-tree is
used to reduce the search space to find nearest neighbors. The method
implemented here does not produce the complete kd-tree, instead it
allows the user to specify a maximum depth for the binary tree.
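
The depth-limited idea can be sketched in Python as follows. This is only an
illustration, not the MADlib implementation (which is in knn.py_in and runs as
SQL). The names build_kdtree, approx_knn and max_depth are hypothetical, and
searching only the single leaf that the query falls into is an assumption made
to keep the example short.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_kdtree(points, max_depth, depth=0):
    """Split on the median of one coordinate per level, stopping at max_depth.

    Leaves keep every point that falls into their region, so the tree is
    intentionally incomplete when max_depth is small.
    """
    if depth == max_depth or len(points) <= 1:
        return {"leaf": list(points)}
    dim = depth % len(points[0])          # cycle through the dimensions
    ordered = sorted(points, key=lambda p: p[dim])
    mid = len(ordered) // 2
    return {
        "dim": dim,
        "cut": ordered[mid][dim],
        "left": build_kdtree(ordered[:mid], max_depth, depth + 1),
        "right": build_kdtree(ordered[mid:], max_depth, depth + 1),
    }


def approx_knn(tree, query, k):
    """Descend to one leaf and rank only the points stored in that leaf."""
    node = tree
    while "leaf" not in node:
        side = "left" if query[node["dim"]] < node["cut"] else "right"
        node = node[side]
    return sorted(node["leaf"], key=lambda p: squared_distance(p, query))[:k]


points = [(1, 1), (2, 2), (8, 8), (9, 9)]
tree = build_kdtree(points, max_depth=1)
print(approx_knn(tree, (1.5, 1.2), k=1))  # [(1, 1)]
print(approx_knn(tree, (7, 7), k=2))      # [(2, 2), (1, 1)]
```

The second query shows why the result is approximate: (7, 7) lands in the
left region, so the truly closest point (8, 8) in the right region is never
examined. A deeper tree reduces the search space further at the cost of more
such misses near region boundaries.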

Additional changes:
- Add function to clean madlib views
- Move k-nn out of 'Early Stage Development'

Closes #352

Co-authored-by: Rahul Iyer 
Co-authored-by: Frank McQuillan 
---
 doc/design/design.tex  |   1 +
 doc/design/figures/2d_kdtree.pdf   | Bin 0 -> 10652 bytes
 doc/design/modules/knn.tex | 146 +++
 doc/literature.bib |  11 +
 doc/mainpage.dox.in|   2 +-
 src/ports/postgres/modules/knn/knn.py_in   | 480 +
 src/ports/postgres/modules/knn/knn.sql_in  | 249 +--
 src/ports/postgres/modules/knn/test/knn.sql_in | 287 +---
 src/ports/postgres/modules/utilities/admin.py_in   |  22 +
 .../postgres/modules/utilities/utilities.py_in |   1 -
 .../postgres/modules/utilities/utilities.sql_in|   8 +
 11 files changed, 1033 insertions(+), 174 deletions(-)

diff --git a/doc/design/design.tex b/doc/design/design.tex
index e9ed7b8..6772f89 100644
--- a/doc/design/design.tex
+++ b/doc/design/design.tex
@@ -231,6 +231,7 @@
 \input{modules/SVM}
 \input{modules/graph}
 \input{modules/neural-network}
+\input{modules/knn}
 \printbibliography
 
 \end{document}
diff --git a/doc/design/figures/2d_kdtree.pdf b/doc/design/figures/2d_kdtree.pdf
new file mode 100644
index 000..062ae23
Binary files /dev/null and b/doc/design/figures/2d_kdtree.pdf differ
diff --git a/doc/design/modules/knn.tex b/doc/design/modules/knn.tex
new file mode 100644
index 000..71af411
--- /dev/null
+++ b/doc/design/modules/knn.tex
@@ -0,0 +1,146 @@
+% Licensed to the Apache Software Foundation (ASF) under one
+% or more contributor license agreements.  See the NOTICE file
+% distributed with this work for additional information
+% regarding copyright ownership.  The ASF licenses this file
+% to you under the Apache License, Version 2.0 (the
+% "License"); you may not use this file except in compliance
+% with the License.  You may obtain a copy of the License at
+
+%   http://www.apache.org/licenses/LICENSE-2.0
+
+% Unless required by applicable law or agreed to in writing,
+% software distributed under the License is distributed on an
+% "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+% KIND, either express or implied.  See the License for the
+% specific language governing permissions and limitations
+% under the License.
+
+!TEX root = ../design.tex
+
+
+\chapter[k Nearest Neighbors]{k Nearest Neighbors}
+
+\begin{moduleinfo}
+\item[Authors] \href{mailto:okis...@pivotal.io}{Orhan Kislal}
+
+\item[History]
+   \begin{modulehistory}
+   \item[v0.1] Initial version: knn and kd-tree.
+   \end{modulehistory}
+\end{moduleinfo}
+
+
+% Abstract. What is the problem we want to solve?
+\section{Introduction} % (fold)
+\label{sec:knn_introduction}
+
+\emph{Some notes and figures in this section are borrowed from \cite{medium_knn} and \cite{point_knn}}.
+
+K-nearest neighbors (KNN) is one of the most commonly used learning
+algorithms. The goal of KNN is to find the k training data points
+closest to the test point. These neighbors can be used to predict labels via
+classification or regression.
+
+Unlike most learning techniques, KNN does not have a training phase. It
+does not create a model to generalize the data; instead, the algorithm uses the
+whole training dataset (or a specific subset of it).
+
+When KNN is used for classification, the output is a class membership (a
+discrete value). An object is classified by a majority vote of its neighbors,
+with the object being assigned to the class most common among its k nearest
+neighbors. When it is used for regression, the output is a continuous value
+for the object: the average (or median) of the values of its k nearest
+neighbors.
+
+\section{Implementation Details}
+
+The basic KNN implementation depends on the table join between the training dataset and the test dataset.
+
+\begin{sql}
+   (SELECT test_id,
+train_id,
+fn_dist(train_

[madlib] branch mini-batch-dl-v1 deleted (was 4671a0a)

2019-02-13 Thread riyer

riyer pushed a change to branch mini-batch-dl-v1
in repository https://gitbox.apache.org/repos/asf/madlib.git.


 was 4671a0a  mini-batch preprocessor for image user doc improvements

This change permanently discards the following revisions:

 discard 4671a0a  mini-batch preprocessor for image user doc improvements



[madlib] branch master updated: Encode categorical: Add BIGINT as valid categorical type

2019-01-29 Thread riyer

riyer pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/madlib.git


The following commit(s) were added to refs/heads/master by this push:
 new 7c3c1a3  Encode categorical: Add BIGINT as valid categorical type
7c3c1a3 is described below

commit 7c3c1a35ab921f2401df4684ab6d48a14fa51b2d
Author: Rahul Iyer 
AuthorDate: Fri Jan 18 14:52:28 2019 -0800

Encode categorical: Add BIGINT as valid categorical type

JIRA: MADLIB-1295
---
 src/ports/postgres/modules/utilities/encode_categorical.py_in   | 2 +-
 src/ports/postgres/modules/utilities/test/encode_categorical.sql_in | 5 -
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/ports/postgres/modules/utilities/encode_categorical.py_in b/src/ports/postgres/modules/utilities/encode_categorical.py_in
index cd08012..8695a73 100644
--- a/src/ports/postgres/modules/utilities/encode_categorical.py_in
+++ b/src/ports/postgres/modules/utilities/encode_categorical.py_in
@@ -396,7 +396,7 @@ class CategoricalEncoder(object):
 self._all_cols_types = get_cols_and_types(self.source_table)
 
 # any column belonging to the following types are considered categorical
-int_types = ['integer', 'smallint']
+int_types = ['integer', 'smallint', 'bigint']
 text_types = ['text', 'varchar', 'character varying', 'char', 'character']
 boolean_types = ['boolean']
 self._cat_types = set(int_types + text_types + boolean_types)
diff --git a/src/ports/postgres/modules/utilities/test/encode_categorical.sql_in b/src/ports/postgres/modules/utilities/test/encode_categorical.sql_in
index 7dc6169..f7addc8 100644
--- a/src/ports/postgres/modules/utilities/test/encode_categorical.sql_in
+++ b/src/ports/postgres/modules/utilities/test/encode_categorical.sql_in
@@ -109,7 +109,7 @@ CREATE TABLE abalone_special_char (
 "len$$'%*()gth" double precision,
 diameter double precision,
 height double precision,
-"ClaЖss" integer
+"ClaЖss" bigint
 );
 COPY abalone_special_char ("se$$''x", "len$$'%*()gth", diameter, height, "ClaЖss") FROM stdin WITH DELIMITER '|' NULL as '@';
 F"F|0.475|0.37|0.125|2
@@ -121,6 +121,9 @@ M,M|0.47|0.355|0.100|1
 'F'F'|0.55|0.44|0.15|0
 \.
 
+select encode_categorical_variables('abalone_special_char', 'abalone_special_char_out0', '*');
+select * from abalone_special_char_out0;
+
 select encode_categorical_variables('abalone_special_char', 'abalone_special_char_out1', '"se$$x", "len$$''%*()gth"');
 select * from abalone_special_char_out1;
 



madlib git commit: Allocator: Remove 16-byte alignment in GPDB 6

2018-09-18 Thread riyer
Repository: madlib
Updated Branches:
  refs/heads/master 3540a5603 -> d62e5516b


Allocator: Remove 16-byte alignment in GPDB 6

Findings:
1. MADlib performs a 16-byte alignment for pointers returned by palloc.
2. Postgres prepends a small header (usually 16 bytes) before every
   pointer, which includes
   a. the memory context and
   b. the size of the memory allocation.
3. Greenplum 6+ tweaks that scheme a little: instead of the memory context,
   the header tracks a "shared header" which points to another struct with
   richer information (aside from the memory context).
4. Postgres calls MemoryContextContains on the result of both the final
   function of an aggregate and the final function of a windowed aggregate.
5. Currently, Postgres always concludes that the datum from MADlib is
   allocated outside of the context and makes an extra copy. In
   Greenplum, MemoryContextContains needs to dereference the shared header.
   This is a problem since the pointer has been shifted and the function
   reads a bad header.

In this commit, we disable the pointer alignment for GPDB 6+ to avoid
failure in this check. Further, we also have to disable vectorization in
Eigen since it does not work when pointers are not 16-byte aligned.
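
Findings 1, 2 and 5 can be illustrated with a toy model in Python. This is a
sketch under stated assumptions, not how palloc or MemoryContextContains are
actually implemented: FakeHeap, HEADER_SIZE and align_up are hypothetical
names, addresses are plain integers, and the start address is chosen so that
the returned pointer is not already 16-byte aligned.

```python
HEADER_SIZE = 16  # assumed size of the header placed just before each pointer


class FakeHeap:
    """Toy allocator: every allocation is preceded by a header record."""

    def __init__(self, start=0x1008):
        self.next_free = start
        self.headers = {}  # header address -> metadata

    def palloc(self, size, context="agg_context"):
        header_addr = self.next_free
        ptr = header_addr + HEADER_SIZE
        self.headers[header_addr] = {"context": context, "size": size}
        self.next_free = ptr + size
        return ptr

    def context_contains(self, ptr, context):
        # The check looks for the header immediately before the pointer.
        header = self.headers.get(ptr - HEADER_SIZE)
        return header is not None and header["context"] == context


def align_up(addr, boundary=16):
    """Round addr up to the next multiple of boundary (a power of two)."""
    return (addr + boundary - 1) & ~(boundary - 1)


heap = FakeHeap()
raw = heap.palloc(64 + 16)   # over-allocate so the shifted pointer still fits
shifted = align_up(raw)      # what a 16-byte realignment hands to the caller

print(hex(raw), heap.context_contains(raw, "agg_context"))
print(hex(shifted), heap.context_contains(shifted, "agg_context"))
```

In this model the raw pointer passes the check, while the shifted pointer
does not, because the bytes just before it are no longer the header. The toy
check simply reports "not in context", which mirrors the extra copy described
for Postgres; a check that dereferences whatever it finds there has no such
safe fallback.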

Closes #319

Co-authored-by: Jesse Zhang 
Co-authored-by: Nandish Jayaram 


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/d62e5516
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/d62e5516
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/d62e5516

Branch: refs/heads/master
Commit: d62e5516bc6741beee18678da1b9b3e6cc95cdcf
Parents: 3540a56
Author: Rahul Iyer 
Authored: Wed Sep 12 16:59:59 2018 -0700
Committer: Rahul Iyer 
Committed: Tue Sep 18 11:46:45 2018 -0700

--
 src/ports/greenplum/dbconnector/dbconnector.hpp   | 17 +
 src/ports/postgres/dbconnector/Allocator_impl.hpp | 10 +-
 2 files changed, 22 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/d62e5516/src/ports/greenplum/dbconnector/dbconnector.hpp
--
diff --git a/src/ports/greenplum/dbconnector/dbconnector.hpp b/src/ports/greenplum/dbconnector/dbconnector.hpp
index 9c38ef6..d06b154 100644
--- a/src/ports/greenplum/dbconnector/dbconnector.hpp
+++ b/src/ports/greenplum/dbconnector/dbconnector.hpp
@@ -32,6 +32,23 @@ extern "C" {
 
 #include "Compatibility.hpp"
 
+#if GP_VERSION_NUM >= 6
+// MADlib aligns the pointers returned by palloc() to 16-byte boundaries
+// (see Allocator_impl.hpp). This is done to allow Eigen vectorization (see
+// http://eigen.tuxfamily.org/index.php?title=FAQ#Vectorization for more
+// info).  This vectorization has to be explicitly disabled if pointers are
+// not 16-byte aligned. Further, the pointer realignment invalidates a
+// header that palloc creates just prior to the pointer address.  Greenplum
+// after commit f62bd1c fails due to this invalid header.  Hence, the
+// pointer realignment and Eigen vectorization is disabled below for
+// Greenplum 6 and above.
+
+// See http://eigen.tuxfamily.org/dox/group__TopicUnalignedArrayAssert.html
+// for steps to disable vectorization
+#define EIGEN_DONT_VECTORIZE
+#define EIGEN_DISABLE_UNALIGNED_ARRAY_ASSERT
+#endif
+
 #include "../../postgres/dbconnector/dbconnector.hpp"
 
 #endif // defined(MADLIB_GREENPLUM_DBCONNECTOR_HPP)

http://git-wip-us.apache.org/repos/asf/madlib/blob/d62e5516/src/ports/postgres/dbconnector/Allocator_impl.hpp
--
diff --git a/src/ports/postgres/dbconnector/Allocator_impl.hpp b/src/ports/postgres/dbconnector/Allocator_impl.hpp
index 4c44207..996117b 100644
--- a/src/ports/postgres/dbconnector/Allocator_impl.hpp
+++ b/src/ports/postgres/dbconnector/Allocator_impl.hpp
@@ -211,7 +211,7 @@ template 
 inline
 void *
 Allocator::internalPalloc(size_t inSize) const {
-#if MAXIMUM_ALIGNOF >= 16
+#if MAXIMUM_ALIGNOF >= 16  || defined EIGEN_DONT_VECTORIZE
 return (ZM == dbal::DoZero) ? palloc0(inSize) : palloc(inSize);
 #else
 if (inSize > std::numeric_limits<size_t>::max() - 16)
@@ -221,7 +221,7 @@ Allocator::internalPalloc(size_t inSize) const {
 const size_t size = inSize + 16;
 void *raw = (ZM == dbal::DoZero) ? palloc0(size) : palloc(size);
 return makeAligned(raw);
-#endif
+#endif  // MAXIMUM_ALIGNOF >= 16
 }
 
 /**
@@ -243,7 +243,7 @@ template 
 inline
 void *
 Allocator::internalRePalloc(void *inPtr, size_t inSize) const {
-#if MAXIMUM_ALIGNOF >= 16
+#if MAXIMUM_ALIGNOF >= 16 || defined EIGEN_DONT_VECTORIZE
 return repalloc(inPtr, inSize);
 #else
 if (inSize > std::numeric_limits<size_t>::max() - 16) {
@@ -262,7 +262,7 @@ Allocator::internalRePalloc(void *inPtr, size_t 

madlib git commit: Control: Add minor comments to context managers

2018-09-13 Thread riyer
Repository: madlib
Updated Branches:
  refs/heads/master 85d09e675 -> 2cde01d1f


Control: Add minor comments to context managers


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/2cde01d1
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/2cde01d1
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/2cde01d1

Branch: refs/heads/master
Commit: 2cde01d1ff011c47a1e6f03007e0ada5395617f4
Parents: 85d09e6
Author: Rahul Iyer 
Authored: Thu Sep 13 14:43:09 2018 -0700
Committer: Rahul Iyer 
Committed: Thu Sep 13 14:43:13 2018 -0700

--
 src/ports/postgres/modules/utilities/control.py_in | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/2cde01d1/src/ports/postgres/modules/utilities/control.py_in
--
diff --git a/src/ports/postgres/modules/utilities/control.py_in b/src/ports/postgres/modules/utilities/control.py_in
index 7900086..d147103 100644
--- a/src/ports/postgres/modules/utilities/control.py_in
+++ b/src/ports/postgres/modules/utilities/control.py_in
@@ -100,6 +100,10 @@ class HashaggControl(ContextDecorator):
 """
 @brief: A wrapper that enables/disables the hashagg and then sets it back
 to the original value on exit
+
+This context manager should be used at the top-level and any exception
+raised from this should be re-raised (if caught) to ensure the transaction
+does not commit.
 """
 
 def __init__(self, enable=True):
@@ -134,6 +138,10 @@ class MinWarning(ContextDecorator):
 
 """
 @brief A wrapper for setting the level of logs going into client
+
+This context manager should be used at the top-level and any exception
+raised from this should be re-raised (if caught) to ensure the transaction
+does not commit.
 """
 
 def __init__(self, warningLevel='error'):
@@ -163,6 +171,10 @@ class AOControl(ContextDecorator):
 
 """
 @brief: A wrapper that enables/disables the AO storage option
+
+This context manager should be used at the top-level and any exception
+raised from this should be re-raised (if caught) to ensure the transaction
+does not commit.
 """
 
 def __init__(self, enable=False):
@@ -192,7 +204,7 @@ class AOControl(ContextDecorator):
 "show 
gp_default_storage_options")[0]["gp_default_storage_options"]
 self._parse_gp_default_storage_options(_storage_options_str)
 
-# Set APPENDONLY=False after backing up existing value
+# Set APPENDONLY= after backing up existing value
 self.was_ao_enabled = self.storage_options_dict['appendonly']
 self.storage_options_dict['appendonly'] = self.to_enable
 plpy.execute("set gp_default_storage_options={0}".



[1/2] madlib git commit: Build: Disable AppendOnly if available

2018-09-13 Thread riyer
Repository: madlib
Updated Branches:
  refs/heads/master b76a08344 -> 3db98babe


http://git-wip-us.apache.org/repos/asf/madlib/blob/3db98bab/src/ports/postgres/modules/stats/pred_metrics.sql_in
--
diff --git a/src/ports/postgres/modules/stats/pred_metrics.sql_in b/src/ports/postgres/modules/stats/pred_metrics.sql_in
index 3f62746..32de9a9 100644
--- a/src/ports/postgres/modules/stats/pred_metrics.sql_in
+++ b/src/ports/postgres/modules/stats/pred_metrics.sql_in
@@ -411,8 +411,9 @@ CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.mean_abs_error(
 ) RETURNS VOID
 AS $$
 PythonFunctionBodyOnly(`stats', `pred_metrics')
-return pred_metrics.mean_abs_error(
-table_in, table_out, prediction_col, observed_col, grouping_cols)
+with AOControl(False):
+return pred_metrics.mean_abs_error(
+table_in, table_out, prediction_col, observed_col, grouping_cols)
 $$ LANGUAGE plpythonu
 m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `');
 
@@ -430,8 +431,9 @@ m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `');
 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.mean_abs_error(message TEXT)
 RETURNS TEXT AS $$
 PythonFunctionBodyOnly(`stats', `pred_metrics')
-return pred_metrics.metric_agg_help_msg(schema_madlib, message,
-'mean_abs_error')
+with AOControl(False):
+return pred_metrics.metric_agg_help_msg(schema_madlib, message,
+'mean_abs_error')
 $$ language plpythonu
 m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `CONTAINS SQL', `');
 
@@ -463,8 +465,9 @@ CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.mean_abs_perc_error(
 ) RETURNS VOID
 AS $$
 PythonFunctionBodyOnly(`stats', `pred_metrics')
-return pred_metrics.mean_abs_perc_error(
-table_in, table_out, prediction_col, observed_col, grouping_cols)
+with AOControl(False):
+return pred_metrics.mean_abs_perc_error(
+table_in, table_out, prediction_col, observed_col, grouping_cols)
 $$ LANGUAGE plpythonu
 m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `');
 
@@ -482,8 +485,9 @@ m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `');
 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.mean_abs_perc_error(message TEXT)
 RETURNS TEXT AS $$
 PythonFunctionBodyOnly(`stats', `pred_metrics')
-return pred_metrics.metric_agg_help_msg(schema_madlib, message,
-'mean_abs_perc_error')
+with AOControl(False):
+return pred_metrics.metric_agg_help_msg(schema_madlib, message,
+'mean_abs_perc_error')
 $$ language plpythonu
 m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `CONTAINS SQL', `');
 
@@ -515,8 +519,9 @@ CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.mean_perc_error(
 ) RETURNS VOID
 AS $$
 PythonFunctionBodyOnly(`stats', `pred_metrics')
-return pred_metrics.mean_perc_error(
-   table_in, table_out, prediction_col, observed_col, grouping_cols)
+with AOControl(False):
+return pred_metrics.mean_perc_error(
+table_in, table_out, prediction_col, observed_col, grouping_cols)
 $$ LANGUAGE plpythonu
 m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `');
 
@@ -534,8 +539,9 @@ m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `');
 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.mean_perc_error(message TEXT)
 RETURNS TEXT AS $$
 PythonFunctionBodyOnly(`stats', `pred_metrics')
-return pred_metrics.metric_agg_help_msg(schema_madlib, message,
-'mean_perc_error')
+with AOControl(False):
+return pred_metrics.metric_agg_help_msg(schema_madlib, message,
+'mean_perc_error')
 $$ language plpythonu
 m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `CONTAINS SQL', `');
 
@@ -567,8 +573,9 @@ CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.mean_squared_error(
 ) RETURNS VOID
 AS $$
 PythonFunctionBodyOnly(`stats', `pred_metrics')
-return pred_metrics.mean_squared_error(
-   table_in, table_out, prediction_col, observed_col, grouping_cols)
+with AOControl(False):
+return pred_metrics.mean_squared_error(
+table_in, table_out, prediction_col, observed_col, grouping_cols)
 $$ LANGUAGE plpythonu
 m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `');
 
@@ -586,8 +593,9 @@ m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `MODIFIES SQL DATA', `');
 CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.mean_squared_error(message TEXT)
 RETURNS TEXT AS $$
 PythonFunctionBodyOnly(`stats', `pred_metrics')
-return pred_metrics.metric_agg_help_msg(schema_madlib, message,
-'mean_squared_error')
+with AOControl(False):
+return pred_metrics.metric_agg_help_msg(schema_madlib, message,
+ 

[2/2] madlib git commit: Build: Disable AppendOnly if available

2018-09-13 Thread riyer
Build: Disable AppendOnly if available

JIRA: MADLIB-1171

Greenplum provides append-optimized (AO) table storage that does not allow
UPDATE and DELETE. MADlib model tables are small enough that they won't
see a big benefit from using AO instead of heap tables.

This commit ensures that APPENDONLY=False for the duration of a MADlib
function call (the GUC is reset to its original value on exit). For cases
where we recreate the data table (standardization, redistribution, etc.), we
have to explicitly add 'APPENDONLY=true' to see the AO benefits.
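
The set-then-restore pattern described above can be sketched with a Python
context manager. This is a simplified stand-in, not the AOControl class from
control.py_in: guc_override is a hypothetical name, and a plain dict plays
the role of the session settings that the real code reads and writes through
plpy.execute.

```python
from contextlib import contextmanager

_MISSING = object()  # marks a setting that did not exist before the override


@contextmanager
def guc_override(settings, name, value):
    """Temporarily set settings[name] to value, restoring it on exit."""
    original = settings.get(name, _MISSING)
    settings[name] = value
    try:
        yield settings
    finally:
        # Runs on normal exit and on error; the exception itself is not
        # swallowed, so it still propagates to the caller.
        if original is _MISSING:
            del settings[name]
        else:
            settings[name] = original


session = {"appendonly": True}
with guc_override(session, "appendonly", False):
    print(session["appendonly"])  # False while the wrapped call runs
print(session["appendonly"])      # True again after exit
```

Restoring in a finally block means the original value comes back even when
the wrapped call raises, which is the behavior the parenthetical about
resetting the GUC on exit relies on.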

Closes #316


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/3db98bab
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/3db98bab
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/3db98bab

Branch: refs/heads/master
Commit: 3db98babe3326fb5e2cd16d0639a2bef264f4b04
Parents: b76a083
Author: Rahul Iyer 
Authored: Wed Aug 29 16:23:04 2018 -0700
Committer: Rahul Iyer 
Committed: Thu Sep 13 11:24:22 2018 -0700

--
 src/ports/postgres/madpack/SQLCommon.m4_in  |  15 +-
 .../modules/assoc_rules/assoc_rules.sql_in  |  80 
 src/ports/postgres/modules/convex/mlp.sql_in|  72 ---
 .../modules/convex/utils_regularization.py_in   | 129 ++--
 .../modules/elastic_net/elastic_net.sql_in  |  13 +-
 src/ports/postgres/modules/knn/knn.py_in|   2 +-
 src/ports/postgres/modules/knn/knn.sql_in   |  36 +---
 src/ports/postgres/modules/lda/lda.py_in|  10 +-
 src/ports/postgres/modules/lda/lda.sql_in   |  44 ++--
 .../postgres/modules/linalg/matrix_ops.sql_in   | 201 +++
 src/ports/postgres/modules/linalg/svd.sql_in|  47 +++--
 src/ports/postgres/modules/pca/pca.py_in|  10 +-
 src/ports/postgres/modules/pca/pca.sql_in   |   6 +-
 .../postgres/modules/pca/pca_project.py_in  |   4 +-
 .../recursive_partitioning/decision_tree.sql_in |  50 ++---
 .../recursive_partitioning/random_forest.sql_in |  41 ++--
 .../postgres/modules/stats/correlation.sql_in   |  27 ++-
 .../modules/stats/cox_prop_hazards.sql_in   |  49 ++---
 .../postgres/modules/stats/pred_metrics.sql_in  |  82 +---
 .../postgres/modules/summary/summary.sql_in |  15 +-
 src/ports/postgres/modules/tsa/arima.sql_in |  25 ++-
 .../postgres/modules/utilities/cols2vec.sql_in  |   8 +-
 .../postgres/modules/utilities/control.py_in|  55 +
 .../utilities/minibatch_preprocessing.py_in |  20 +-
 .../utilities/minibatch_preprocessing.sql_in|   7 +-
 .../utilities/test/unit_tests/plpy_mock.py_in   |   8 +
 .../test/unit_tests/test_control.py_in  |  81 
 .../modules/utilities/test/utilities.sql_in |   5 +-
 .../modules/utilities/text_utilities.sql_in |   5 +-
 29 files changed, 684 insertions(+), 463 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/3db98bab/src/ports/postgres/madpack/SQLCommon.m4_in
--
diff --git a/src/ports/postgres/madpack/SQLCommon.m4_in b/src/ports/postgres/madpack/SQLCommon.m4_in
index afc82d2..ffc0c37 100644
--- a/src/ports/postgres/madpack/SQLCommon.m4_in
+++ b/src/ports/postgres/madpack/SQLCommon.m4_in
@@ -28,14 +28,14 @@ m4_changequote()
  * RETURNS DOUBLE PRECISION[]
  * AS $$PythonFunction(regress, logistic, compute_logregr_coef)$$
  * LANGUAGE plpythonu VOLATILE;
- */ 
+ */
 m4_define(, , )
 
 /*
@@ -59,14 +61,14 @@ m4_define(, , , )
 
 /*

http://git-wip-us.apache.org/repos/asf/madlib/blob/3db98bab/src/ports/postgres/modules/assoc_rules/assoc_rules.sql_in
--
diff --git a/src/ports/postgres/modules/assoc_rules/assoc_rules.sql_in b/src/ports/postgres/modules/assoc_rules/assoc_rules.sql_in
index 8ee9fcb..ec3c330 100644
--- a/src/ports/postgres/modules/assoc_rules/assoc_rules.sql_in
+++ b/src/ports/postgres/modules/assoc_rules/assoc_rules.sql_in
@@ -493,23 +493,19 @@ CREATE OR REPLACE FUNCTION MADLIB_SCHEMA.assoc_rules
)
 RETURNS MADLIB_SCHEMA.assoc_rules_results
 AS $$
-
 PythonFunctionBodyOnly(`assoc_rules', `assoc_rules')
-
-plpy.execute("SET client_min_messages = error;")
-
-# schema_madlib comes from PythonFunctionBodyOnly
-return assoc_rules.assoc_rules(
-schema_madlib,
-support,
-confidence,
-tid_col,
-item_col,
-input_table,
-output_schema,
-verbose,
-max_itemset_size
-);
+with AOControl(False):
+plpy.execute("SET client_min_messages = error;")
+# schema_madlib comes from PythonFunctionBodyOnly
+return assoc_rules.assoc_rules(schema_madlib,
+   support,
+   confidence,
+   

[3/3] madlib git commit: Multiple: Remove trailing whitespace from all SQL

2018-09-07 Thread riyer
Multiple: Remove trailing whitespace from all SQL

Markdown specifies that two trailing spaces should be interpreted as a
line break (<br>), which has been implemented by Doxygen 1.8+. This commit
removes all such instances since the trailing whitespace is inadvertent in
most cases. If a line break is required, then it should be added
explicitly (using the HTML tag <br>).
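
A cleanup of this kind can be sketched in a few lines of Python. This is an
illustration only; the commit does not say which tool was used, and
strip_trailing_whitespace is a hypothetical helper name.

```python
def strip_trailing_whitespace(text):
    """Remove spaces and tabs at the end of every line, keeping line breaks."""
    # Splitting on "\n" (rather than splitlines) preserves a final newline.
    return "\n".join(line.rstrip() for line in text.split("\n"))


before = "select 1;  \nselect 2;\t\n"
after = strip_trailing_whitespace(before)
print(repr(after))  # 'select 1;\nselect 2;\n'
```

Note that rstrip() with no arguments also drops a trailing carriage return,
so files with CRLF line endings would be converted to LF by this sketch.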

Closes #317

Co-authored-by: Domino Valdano 


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/35818fa3
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/35818fa3
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/35818fa3

Branch: refs/heads/master
Commit: 35818fa395f965191b59ddbfcd1469470f44271b
Parents: 92bdf8c
Author: Rahul Iyer 
Authored: Fri Sep 7 15:12:49 2018 -0700
Committer: Rahul Iyer 
Committed: Fri Sep 7 15:12:49 2018 -0700

--
 methods/array_ops/src/pg_gp/array_ops.sql_in|   7 +-
 src/ports/postgres/modules/bayes/bayes.sql_in   |  46 ++---
 .../postgres/modules/bayes/test/bayes.sql_in| 114 ++--
 .../conjugate_gradient/test/conj_grad.sql_in|  24 +--
 src/ports/postgres/modules/convex/mlp.sql_in|   4 +-
 .../modules/crf/test/crf_test_large.sql_in  |   6 +-
 .../modules/crf/test/crf_train_small.sql_in |  10 +-
 .../modules/elastic_net/elastic_net.sql_in  |  12 +-
 src/ports/postgres/modules/glm/glm.sql_in   |   4 +-
 src/ports/postgres/modules/glm/ordinal.sql_in   |  38 ++--
 .../postgres/modules/glm/test/ordinal.sql_in|  12 +-
 src/ports/postgres/modules/graph/bfs.sql_in |  88 +-
 src/ports/postgres/modules/graph/hits.sql_in|  36 ++--
 .../postgres/modules/graph/pagerank.sql_in  |  12 +-
 src/ports/postgres/modules/graph/wcc.sql_in |  10 +-
 src/ports/postgres/modules/knn/knn.sql_in   |  14 +-
 src/ports/postgres/modules/lda/lda.sql_in   | 172 +--
 src/ports/postgres/modules/linalg/svd.sql_in|  62 +++
 src/ports/postgres/modules/pca/pca.sql_in   | 104 +--
 .../postgres/modules/pca/pca_project.sql_in |  54 +++---
 .../recursive_partitioning/decision_tree.sql_in |   4 +-
 .../recursive_partitioning/random_forest.sql_in |   6 +-
 .../postgres/modules/regress/linear.sql_in  |  24 +--
 .../postgres/modules/regress/logistic.sql_in|  26 +--
 .../modules/regress/test/clustered.sql_in   |   8 +-
 .../postgres/modules/stats/correlation.sql_in   |  54 +++---
 .../modules/stats/hypothesis_tests.sql_in   |   6 +-
 .../postgres/modules/stats/pred_metrics.sql_in  |   8 +-
 .../postgres/modules/stats/test/f_test.sql_in   |   2 +-
 .../postgres/modules/stats/test/ks_test.sql_in  |   2 +-
 .../postgres/modules/stats/test/mw_test.sql_in  |   2 +-
 .../postgres/modules/stats/test/t_test.sql_in   |   4 +-
 .../postgres/modules/stats/test/wsr_test.sql_in |   2 +-
 .../postgres/modules/summary/summary.sql_in |  44 ++---
 src/ports/postgres/modules/svm/svm.sql_in   | 130 +++---
 .../modules/tsa/test/arima_train.sql_in |  54 +++---
 .../postgres/modules/utilities/cols2vec.sql_in  |  12 +-
 .../postgres/modules/utilities/path.sql_in  |  24 +--
 .../postgres/modules/utilities/pivot.sql_in |   4 +-
 .../modules/utilities/sessionize.sql_in |  34 ++--
 .../modules/utilities/text_utilities.sql_in |  48 +++---
 .../postgres/modules/utilities/utilities.sql_in |   2 +-
 .../postgres/modules/utilities/vec2cols.sql_in  |  42 ++---
 43 files changed, 685 insertions(+), 686 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/35818fa3/methods/array_ops/src/pg_gp/array_ops.sql_in
--
diff --git a/methods/array_ops/src/pg_gp/array_ops.sql_in b/methods/array_ops/src/pg_gp/array_ops.sql_in
index 3c905ce..e1aa368 100644
--- a/methods/array_ops/src/pg_gp/array_ops.sql_in
+++ b/methods/array_ops/src/pg_gp/array_ops.sql_in
@@ -275,12 +275,11 @@ m4_ifdef(`__HAS_FUNCTION_PROPERTIES__', `NO SQL', `');
  * @brief Aggregate, element-wise sum of arrays. It requires that all the values are NON-NULL. Return type is the same as the input type.
  *
  * @param x Array x
- * @param y Array y
- * @returns Sum of x and y.
+ * @returns Sum of x
  *
  */
-DROP AGGREGATE IF EXISTS MADLIB_SCHEMA.sum(anyarray) CASCADE;
-CREATE AGGREGATE MADLIB_SCHEMA.sum(anyarray) (
+DROP AGGREGATE IF EXISTS MADLIB_SCHEMA.sum(/* x */ anyarray) CASCADE;
+CREATE AGGREGATE MADLIB_SCHEMA.sum(/* x */ anyarray) (
 SFUNC = MADLIB_SCHEMA.array_add,
 STYPE = anyarray
m4_ifdef( `__POSTGRESQL__', `', `, PREFUNC   = MADLIB_SCHEMA.array_add')

http://git-wip-us.apache.org/repos/asf/madlib/blob/35818fa3/src/ports/postgres/modules/bayes/bayes.sql_in
--
diff --git 

[1/3] madlib git commit: Multiple: Remove trailing whitespace from all SQL

2018-09-07 Thread riyer
Repository: madlib
Updated Branches:
  refs/heads/master 92bdf8cab -> 35818fa39


http://git-wip-us.apache.org/repos/asf/madlib/blob/35818fa3/src/ports/postgres/modules/tsa/test/arima_train.sql_in
--
diff --git a/src/ports/postgres/modules/tsa/test/arima_train.sql_in b/src/ports/postgres/modules/tsa/test/arima_train.sql_in
index e1b2919..e6f5bd2 100644
--- a/src/ports/postgres/modules/tsa/test/arima_train.sql_in
+++ b/src/ports/postgres/modules/tsa/test/arima_train.sql_in
@@ -66,55 +66,55 @@ drop table if exists tsa_out;
 drop table if exists tsa_out_summary;
 drop table if exists tsa_out_residual;
 select arima_train('mini_ts', 'tsa_out', 'id', 'val', NULL, TRUE, 
ARRAY[1,0,1]);
-select assert(relative_error(ar_params, ARRAY[0.685268276058]) < 1e-2, 'ARIMA: 
wrong ar_params') from tsa_out; 
-select assert(relative_error(ar_std_errors, ARRAY[0.103996616127]) < 1e-2, 
'ARIMA: wrong ar_std_errors') from tsa_out; 
-select assert(relative_error(ma_params, ARRAY[0.730629026211]) < 1e-2, 'ARIMA: 
wrong ma_params') from tsa_out; 
-select assert(relative_error(ma_std_errors, ARRAY[0.0979481470864]) < 1e-2, 
'ARIMA: wrong ma_std_errors') from tsa_out; 
-select assert(relative_error(mean, 38.6009250545) < 1e-2, 'ARIMA: wrong mean') 
from tsa_out; 
-select assert(relative_error(mean_std_error, 13.2499230619) < 1e-2, 'ARIMA: 
wrong mean_std_errors') from tsa_out; 
-select assert(relative_error(residual_variance, 281.669418496) < 1e-2, 'ARIMA: 
wrong residual_variance') from tsa_out_summary; 
-select assert(relative_error(log_likelihood, -207.725973784) < 1e-2, 'ARIMA: 
wrong log_likelihood') from tsa_out_summary; 
+select assert(relative_error(ar_params, ARRAY[0.685268276058]) < 1e-2, 'ARIMA: 
wrong ar_params') from tsa_out;
+select assert(relative_error(ar_std_errors, ARRAY[0.103996616127]) < 1e-2, 
'ARIMA: wrong ar_std_errors') from tsa_out;
+select assert(relative_error(ma_params, ARRAY[0.730629026211]) < 1e-2, 'ARIMA: 
wrong ma_params') from tsa_out;
+select assert(relative_error(ma_std_errors, ARRAY[0.0979481470864]) < 1e-2, 
'ARIMA: wrong ma_std_errors') from tsa_out;
+select assert(relative_error(mean, 38.6009250545) < 1e-2, 'ARIMA: wrong mean') 
from tsa_out;
+select assert(relative_error(mean_std_error, 13.2499230619) < 1e-2, 'ARIMA: 
wrong mean_std_errors') from tsa_out;
+select assert(relative_error(residual_variance, 281.669418496) < 1e-2, 'ARIMA: 
wrong residual_variance') from tsa_out_summary;
+select assert(relative_error(log_likelihood, -207.725973784) < 1e-2, 'ARIMA: 
wrong log_likelihood') from tsa_out_summary;
 
 -- FALSE, ARRAY[1,0,1]
 drop table if exists tsa_out;
 drop table if exists tsa_out_summary;
 drop table if exists tsa_out_residual;
 select arima_train('mini_ts', 'tsa_out', 'id', 'val', NULL, FALSE, ARRAY[1,0,1]);
-select assert(relative_error(ar_params, ARRAY[0.831752901064]) < 1e-2, 'ARIMA: wrong ar_params') from tsa_out; 
-select assert(relative_error(ar_std_errors, ARRAY[0.0695053543058]) < 1e-2, 'ARIMA: wrong ar_std_errors') from tsa_out; 
-select assert(relative_error(ma_params, ARRAY[0.701393608306]) < 1e-2, 'ARIMA: wrong ma_params') from tsa_out; 
-select assert(relative_error(ma_std_errors, ARRAY[0.0969171335486]) < 1e-2, 'ARIMA: wrong ma_std_errors') from tsa_out; 
-select assert(relative_error(residual_variance, 304.217719576) < 1e-2, 'ARIMA: wrong residual_variance') from tsa_out_summary; 
-select assert(relative_error(log_likelihood, -209.61270701) < 1e-2, 'ARIMA: wrong log_likelihood') from tsa_out_summary; 
+select assert(relative_error(ar_params, ARRAY[0.831752901064]) < 1e-2, 'ARIMA: wrong ar_params') from tsa_out;
+select assert(relative_error(ar_std_errors, ARRAY[0.0695053543058]) < 1e-2, 'ARIMA: wrong ar_std_errors') from tsa_out;
+select assert(relative_error(ma_params, ARRAY[0.701393608306]) < 1e-2, 'ARIMA: wrong ma_params') from tsa_out;
+select assert(relative_error(ma_std_errors, ARRAY[0.0969171335486]) < 1e-2, 'ARIMA: wrong ma_std_errors') from tsa_out;
+select assert(relative_error(residual_variance, 304.217719576) < 1e-2, 'ARIMA: wrong residual_variance') from tsa_out_summary;
+select assert(relative_error(log_likelihood, -209.61270701) < 1e-2, 'ARIMA: wrong log_likelihood') from tsa_out_summary;
 
 -- FALSE, ARRAY[1,1,1]
 drop table if exists tsa_out;
 drop table if exists tsa_out_summary;
 drop table if exists tsa_out_residual;
 select arima_train('mini_ts', 'tsa_out', 'id', 'val', NULL, FALSE, ARRAY[1,1,1]);
-select assert(relative_error(ar_params, ARRAY[0.16327119476]) < 1e-2, 'ARIMA: wrong ar_params') from tsa_out; 
-select assert(relative_error(ar_std_errors, ARRAY[0.211608737666]) < 1e-2, 'ARIMA: wrong ar_std_errors') from tsa_out; 
-select assert(relative_error(ma_params, ARRAY[0.630297255402]) < 1e-2, 'ARIMA: wrong ma_params') from tsa_out; 
-select assert(relative_error(ma_std_errors, ARRAY[0.163395070851]) < 1e-2, 'ARIMA: wrong ma_std_errors') from tsa_out; 

[2/3] madlib git commit: Multiple: Remove trailing whitespace from all SQL

2018-09-07 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib/blob/35818fa3/src/ports/postgres/modules/linalg/svd.sql_in
--
diff --git a/src/ports/postgres/modules/linalg/svd.sql_in b/src/ports/postgres/modules/linalg/svd.sql_in
index b6d763b..070f5e4 100644
--- a/src/ports/postgres/modules/linalg/svd.sql_in
+++ b/src/ports/postgres/modules/linalg/svd.sql_in
@@ -79,17 +79,17 @@ row22   {0, 1}
 
 
 output_table_prefix
-TEXT. Prefix for output tables. See 
-Output Tables below for a  description 
+TEXT. Prefix for output tables. See
+Output Tables below for a  description
 of the convention used.
 row_id
 TEXT. ID for each row.
 k
 INTEGER. Number of singular values to compute.
 n_iterations (optional). 
-INTEGER. Number of iterations to run.  
-@note The number of iterations must be 
-in the range [k, column dimension], where 
+INTEGER. Number of iterations to run.
+@note The number of iterations must be
+in the range [k, column dimension], where
 k is number of singular values.
 result_summary_table (optional)
 TEXT. The name of the table to store the result summary.
@@ -99,7 +99,7 @@ row22   {0, 1}
 SVD Function for Sparse Matrices
 
 Use this function for matrices that are represented in the sparse-matrix
-format (example below).  Note that the input matrix is converted to a 
+format (example below).  Note that the input matrix is converted to a
 dense matrix before the SVD operation, for efficient computation reasons. 
 
 
@@ -142,8 +142,8 @@ matrix, indicating that the 4th row and 7th column contain all zeros.
 
 
 output_table_prefix
-TEXT. Prefix for output tables. See 
-Output Tables below for a  description 
+TEXT. Prefix for output tables. See
+Output Tables below for a  description
 of the convention used. 
 row_id
 TEXT. Name of the column containing the row index for each entry in sparse matrix.
@@ -158,9 +158,9 @@ matrix, indicating that the 4th row and 7th column contain all zeros.
 k
 INTEGER. Number of singular values to compute.
 n_iterations (optional)
-INTEGER. Number of iterations to run.  
-@note The number of iterations must be 
-in the range [k, column dimension], where 
+INTEGER. Number of iterations to run.
+@note The number of iterations must be
+in the range [k, column dimension], where
 k is number of singular values.
 result_summary_table (optional)
 TEXT. The name of the table to store the result summary.
@@ -171,10 +171,10 @@ matrix, indicating that the 4th row and 7th column contain all zeros.
 Native Implementation for Sparse Matrices
 
 Use this function for matrices that are represented in the sparse-matrix
-format (see sparse matrix example above). This function uses the 
+format (see sparse matrix example above). This function uses the
 native sparse representation while computing the SVD.
-@note Note that this function should be favored if the matrix is 
-highly sparse, since it computes very sparse matrices 
+@note Note that this function should be favored if the matrix is
+highly sparse, since it computes very sparse matrices
 efficiently. 
 
 
@@ -195,8 +195,8 @@ svd_sparse_native( source_table,
 source_table
 TEXT. Source table name (sparse matrix - see example above).
 output_table_prefix
-TEXT. Prefix for output tables. See 
-Output Tables below for a  description 
+TEXT. Prefix for output tables. See
+Output Tables below for a  description
 of the convention used.
 row_id
 TEXT. ID for each row.
@@ -211,9 +211,9 @@ svd_sparse_native( source_table,
 k
 INTEGER. Number of singular values to compute.
 n_iterations (optional)
-INTEGER. Number of iterations to run.  
-@note The number of iterations must be 
-in the range [k, column dimension], where 
+INTEGER. Number of iterations to run.
+@note The number of iterations must be
+in the range [k, column dimension], where
 k is number of singular values.
 result_summary_table (optional)
 TEXT. Table name to store result summary.
@@ -307,7 +307,7 @@ CREATE TABLE mat (
 );
 INSERT INTO mat VALUES
 (1,'{396,840,353,446,318,886,15,584,159,383}'),
-(2,'{691,58,899,163,159,533,604,582,269,390}'), 
+(2,'{691,58,899,163,159,533,604,582,269,390}'),
 (3,'{293,742,298,75,404,857,941,662,846,2}'),
 (4,'{462,532,787,265,982,306,600,608,212,885}'),
 (5,'{304,151,337,387,643,753,603,531,459,652}'),
@@ -328,7 +328,7 @@ INSERT INTO mat VALUES
 
 SELECT madlib.svd( 'mat',   -- Input table
'svd',   -- Output table prefix
-   'row_id',-- Column name with row index 
+   'row_id',-- Column name with row index
10,  -- Number of singular values to compute
NULL,-- Use default number of iterations
'svd_summary_table'  

madlib git commit: MLP: Simplify momentum and Nesterov updates

2018-09-04 Thread riyer
Repository: madlib
Updated Branches:
  refs/heads/master 5ab573bec -> 92bdf8cab


MLP: Simplify momentum and Nesterov updates

JIRA: MADLIB-1272

Momentum updates are complicated due to Nesterov requiring an initial
update before gradient calculations. There is, however, a different form
of the Nesterov update that can be cleanly performed after the regular
update, simplifying the code. This allows performing the gradient
calculations before any update - with or without Nesterov.
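
As a rough illustration of the reformulated update described above, here is
a Python sketch (not the C++ code from this commit). The names `step`,
`grad`, and `compare`, and the toy quadratic loss, are assumptions introduced
only for this example. The sketch keeps the look-ahead parameters as the
stored model, applies the Nesterov correction after the velocity update, and
measures how closely it tracks a classical Nesterov iteration.

```python
def step(u, velocity, update, momentum, nesterov):
    """Apply one update; `update` is assumed to already be -stepsize * gradient."""
    if momentum > 0:
        velocity = momentum * velocity + update
        if nesterov:
            # Next step's discounted velocity is added in the current step.
            u = u + momentum * velocity + update
        else:
            u = u + velocity
    else:
        # No momentum: plain gradient step.
        u = u + update
    return u, velocity


def grad(x):
    # Gradient of the assumed toy loss f(x) = 0.5 * x**2.
    return x


def compare(steps=20, stepsize=0.1, momentum=0.9, start=5.0):
    """Return the largest gap between the two formulations over `steps` updates."""
    theta, v_classic = start, 0.0   # classical form: true parameters
    phi, v_new = start, 0.0         # reformulated form: look-ahead parameters
    worst = 0.0
    for _ in range(steps):
        # Classical Nesterov: move to the look-ahead point before the gradient.
        lookahead = theta + momentum * v_classic
        v_classic = momentum * v_classic - stepsize * grad(lookahead)
        theta = theta + v_classic
        # Reformulated: gradient first, all updates afterwards.
        phi, v_new = step(phi, v_new, -stepsize * grad(phi), momentum, True)
        worst = max(worst, abs(phi - (theta + momentum * v_classic)))
    return worst
```

Under these assumptions the stored value in the reformulated loop stays equal
(up to rounding) to the classical parameters plus the discounted velocity,
which is why the gradient can be computed before any update.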

Closes #313


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/92bdf8ca
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/92bdf8ca
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/92bdf8ca

Branch: refs/heads/master
Commit: 92bdf8cab087472da1b2962f4ce51dc20255f6ba
Parents: 5ab573b
Author: Rahul Iyer 
Authored: Fri Aug 17 01:42:53 2018 -0700
Committer: Rahul Iyer 
Committed: Wed Aug 29 10:31:08 2018 -0700

--
 src/modules/convex/task/mlp.hpp   | 53 +-
 src/modules/convex/type/model.hpp | 44 ++--
 2 files changed, 42 insertions(+), 55 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/92bdf8ca/src/modules/convex/task/mlp.hpp
--
diff --git a/src/modules/convex/task/mlp.hpp b/src/modules/convex/task/mlp.hpp
index 3915ab1..b772549 100644
--- a/src/modules/convex/task/mlp.hpp
+++ b/src/modules/convex/task/mlp.hpp
@@ -158,9 +158,6 @@ MLP::getLossAndUpdateModel(
 const double ) {
 
 double total_loss = 0.;
-// model is updated with the momentum step (i.e. velocity vector)
-// if Nesterov Accelerated Gradient is enabled
-model.nesterovUpdatePosition();
 
 // initialize gradient vector
 std::vector<Matrix> total_gradient_per_layer(model.num_layers);
@@ -188,22 +185,37 @@ MLP::getLossAndUpdateModel(
 total_loss += getLoss(y_true, o.back(), model.is_classification);
 }
 
-// convert gradient to a gradient update vector
-//  1. normalize to per row update
-//  2. discount by stepsize
-//  3. add regularization
-//  4. make negative
 for (Index k=0; k < model.num_layers; k++){
+// convert gradient to a gradient update vector
+//  1. normalize to per row update
+//  2. discount by stepsize
+//  3. add regularization
+//  4. make negative for descent
 Matrix regularization = MLP::lambda * model.u[k];
 regularization.row(0).setZero(); // Do not update bias
-total_gradient_per_layer[k] = -stepsize * (total_gradient_per_layer[k] / static_cast<double>(num_rows_in_batch) +
-  regularization);
-model.updateVelocity(total_gradient_per_layer[k], k);
-model.updatePosition(total_gradient_per_layer[k], k);
+total_gradient_per_layer[k] = -stepsize *
+(total_gradient_per_layer[k] / static_cast<double>(num_rows_in_batch) +
+ regularization);
+
+// total_gradient_per_layer is now the update vector
+if (model.momentum > 0){
+model.velocity[k] = model.momentum * model.velocity[k] + total_gradient_per_layer[k];
+if (model.is_nesterov){
+// Below equation ensures that Nesterov updates are half step
+// ahead of regular momentum updates i.e. next step's discounted
+// velocity update is already added in the current step.
+model.u[k] += model.momentum * model.velocity[k] + total_gradient_per_layer[k];
+}
+else{
+model.u[k] += model.velocity[k];
+}
+} else {
+// no momentum
+model.u[k] += total_gradient_per_layer[k];
+}
 }
 
 return total_loss;
-
 }
 
 
@@ -215,8 +227,6 @@ MLP::gradientInPlace(
 const dependent_variable_type   _true,
 const double)
 {
-model.nesterovUpdatePosition();
-
 std::vector net, o, delta;
 
 feedForward(model, x, net, o);
@@ -225,15 +235,18 @@ MLP::gradientInPlace(
 for (Index k=0; k < model.num_layers; k++){
 Matrix regularization = MLP::lambda*model.u[k];
 regularization.row(0).setZero(); // Do not update bias
+
 if (model.momentum > 0){
 Matrix gradient = -stepsize * (o[k] * delta[k].transpose() + regularization);
-model.updateVelocity(gradient, k);
-model.updatePosition(gradient, k);
+model.velocity[k] = model.momentum * model.velocity[k] + gradient;
+if (model.is_nesterov)
+model.u[k] += model.momentum * model.velocity[k] + gradient;
+else
+model.u[k] += model.velocity[k];

[3/3] madlib git commit: Multiple: Re-enable tests in PCA, Pagerank

2018-08-16 Thread riyer
Multiple: Re-enable tests in PCA, Pagerank

JIRA: MADLIB-1264

Some tests were commented out due to failures on GPDB 5.X.
These tests are now working and have been enabled again.

Closes #312

Co-authored-by: Arvind Sridhar 


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/a3b59356
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/a3b59356
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/a3b59356

Branch: refs/heads/master
Commit: a3b59356f328fb949d63758a518aeca6d72220cf
Parents: 5ccf12e
Author: Jingyi Mei 
Authored: Thu Aug 16 20:12:25 2018 -0700
Committer: Rahul Iyer 
Committed: Thu Aug 16 20:18:21 2018 -0700

--
 .../postgres/modules/graph/test/pagerank.sql_in | 23 +++-
 src/ports/postgres/modules/pca/test/pca.sql_in  | 16 ++
 2 files changed, 20 insertions(+), 19 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/a3b59356/src/ports/postgres/modules/graph/test/pagerank.sql_in
--
diff --git a/src/ports/postgres/modules/graph/test/pagerank.sql_in b/src/ports/postgres/modules/graph/test/pagerank.sql_in
index 14d3371..e797812 100644
--- a/src/ports/postgres/modules/graph/test/pagerank.sql_in
+++ b/src/ports/postgres/modules/graph/test/pagerank.sql_in
@@ -60,6 +60,7 @@ INSERT INTO "EDGE" VALUES
 (5, 6, 2),
 (6, 3, 2);
 
+-- Test pagerank without group
 DROP TABLE IF EXISTS pagerank_out, pagerank_out_summary;
 SELECT pagerank(
  'vertex',-- Vertex table
@@ -73,6 +74,8 @@ SELECT assert(relative_error(SUM(pagerank), 1) < 0.1,
 'PageRank: Scores do not sum up to 1.'
 ) FROM pagerank_out;
 
+
+-- Test pagerank with group
 DROP TABLE IF EXISTS pagerank_gr_out;
 DROP TABLE IF EXISTS pagerank_gr_out_summary;
 SELECT pagerank(
@@ -84,7 +87,7 @@ SELECT pagerank(
  NULL,  -- Default damping factor (0.85)
  NULL,  -- Default max iters (100)
  NULL,  -- Default Threshold
- 'user_id'); -- Personlized Nodes
+ 'user_id');-- Grouping Column
 
 -- View the PageRank of all vertices, sorted by their scores.
 SELECT assert(relative_error(SUM(pagerank), 1) < 0.1,
@@ -94,8 +97,16 @@ SELECT assert(relative_error(SUM(pagerank), 1) < 0.1,
 'PageRank: Scores do not sum up to 1 for group 2.'
 ) FROM pagerank_gr_out WHERE user_id=2;
 
--- Tests for Personalized Page Rank
+-- Check the iteration numbers for convergency
+SELECT assert(relative_error(__iterations__, 11) = 0,
+'PageRank: Incorrect iterations for group 1.'
+) FROM pagerank_gr_out_summary WHERE user_id=1;
+SELECT assert(relative_error(__iterations__, 14) = 0,
+'PageRank: Incorrect iterations for group 2.'
+) FROM pagerank_gr_out_summary WHERE user_id=2;
+
 
+-- Tests for Personalized Page Rank
 -- Test without grouping
 
 DROP TABLE IF EXISTS pagerank_ppr_out;
@@ -141,14 +152,6 @@ SELECT assert(relative_error(SUM(pagerank), 1) < 0.005,
 ) FROM pagerank_ppr_grp_out WHERE user_id=1;
 select assert(array_agg(user_id order by pagerank desc)= '{2, 2, 1, 1, 2, 2, 1, 1, 2, 1, 2, 1, 2, 1}','Unexpected Ranking with grouping ') from  pagerank_ppr_grp_out  ;
 
--- These tests have been temporarily removed for GPDB5 alpha support
-
--- SELECT assert(relative_error(__iterations__, 27) = 0,
--- 'PageRank: Incorrect iterations for group 1.'
--- ) FROM pagerank_gr_out_summary WHERE user_id=1;
--- SELECT assert(relative_error(__iterations__, 31) = 0,
--- 'PageRank: Incorrect iterations for group 2.'
--- ) FROM pagerank_gr_out_summary WHERE user_id=2;
 
 -- Test to capture corner case reported in https://issues.apache.org/jira/browse/MADLIB-1229
 

http://git-wip-us.apache.org/repos/asf/madlib/blob/a3b59356/src/ports/postgres/modules/pca/test/pca.sql_in
--
diff --git a/src/ports/postgres/modules/pca/test/pca.sql_in b/src/ports/postgres/modules/pca/test/pca.sql_in
index 5a97c94..1510254 100644
--- a/src/ports/postgres/modules/pca/test/pca.sql_in
+++ b/src/ports/postgres/modules/pca/test/pca.sql_in
@@ -145,16 +145,14 @@ COPY mat (id, row_vec, grp) FROM stdin delimiter '|';
 16|{739,651,678,577,273,935,661,47,373,618}|2
 \.
 
--- This test has been temporarily removed for GPDB5 alpha support
-
 -- Learn individaul PCA models based on grouping column (grp)
--- drop table if exists result_table_214712398172490837;
--- drop table if exists result_table_214712398172490837_mean;
--- drop table if exists result_table_214712398172490838;
--- select pca_train('mat', 'result_table_214712398172490837', 'id', 0.8,
--- 'grp', 5, FALSE, 'result_table_214712398172490838');
--- select * 

[1/3] madlib git commit: Elastic Net: Allow grouping by non-numeric column

2018-08-16 Thread riyer
Repository: madlib
Updated Branches:
  refs/heads/master 441f16bd5 -> a3b59356f


Elastic Net: Allow grouping by non-numeric column

JIRA: MADLIB-1262

- Grouping columns should be quoted if the type of the column is of type
TEXT.
- Grouping column names that require double quoting need special
handling.
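
The two points above can be sketched in Python as follows. This is a
simplified illustration, not the code from this commit: `group_filter` and
the type list it checks are hypothetical, and the two quoting helpers are
minimal stand-ins that only double embedded quote characters.

```python
import string

# Characters that never force an identifier to be double-quoted here.
_SAFE = set(string.ascii_lowercase + string.digits + "_")


def quote_ident(name):
    """Double-quote an identifier unless it is plain lowercase/digits/underscore."""
    name = name.strip()
    if name and all(ch in _SAFE for ch in name):
        return name
    # An embedded double quote is escaped by doubling it.
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Single-quote a text value, doubling embedded single quotes (simplified)."""
    return "'" + str(value).replace("'", "''") + "'"


def group_filter(col, col_type, value):
    """Build `col = value`, quoting the value only for text-like columns."""
    rhs = quote_literal(value) if col_type in ("text", "varchar") else str(value)
    return "{0} = {1}".format(quote_ident(col), rhs)
```

For example, `group_filter("grp", "text", "O'Brien")` yields
`grp = 'O''Brien'`, while a mixed-case column name such as `Grp` is emitted
as `"Grp"` so that it is not folded to lowercase.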

Closes #309

Co-authored-by: Domino Valdano 
Co-authored-by: Rahul Iyer 


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/ec328dba
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/ec328dba
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/ec328dba

Branch: refs/heads/master
Commit: ec328dba6853d31df5b1bd6bbdcd35933596fe78
Parents: 441f16b
Author: Arvind Sridhar 
Authored: Thu Aug 16 20:02:48 2018 -0700
Committer: Rahul Iyer 
Committed: Thu Aug 16 20:03:22 2018 -0700

--
 .../elastic_net_generate_result.py_in   |  63 ++
 .../modules/elastic_net/test/elastic_net.sql_in | 122 +++
 2 files changed, 161 insertions(+), 24 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/ec328dba/src/ports/postgres/modules/elastic_net/elastic_net_generate_result.py_in
--
diff --git a/src/ports/postgres/modules/elastic_net/elastic_net_generate_result.py_in b/src/ports/postgres/modules/elastic_net/elastic_net_generate_result.py_in
index 1dbd664..15881b4 100644
--- a/src/ports/postgres/modules/elastic_net/elastic_net_generate_result.py_in
+++ b/src/ports/postgres/modules/elastic_net/elastic_net_generate_result.py_in
@@ -2,7 +2,10 @@ import plpy
 from elastic_net_utils import _process_results
 from elastic_net_utils import _compute_log_likelihood
 from utilities.validate_args import get_cols_and_types
+from utilities.validate_args import quote_ident
 from utilities.utilities import split_quoted_delimited_str
+from internal.db_utils import quote_literal
+
 
 def _elastic_net_generate_result(optimizer, iteration_run, **args):
 """
@@ -33,26 +36,30 @@ def _elastic_net_generate_result(optimizer, iteration_run, **args):
 col_grp_key = args['col_grp_key']
 grouping_str = args['grouping_str']
 cols_types = dict(get_cols_and_types(args["tbl_source"]))
-grouping_str1 = grouping_column + ","
+grouping_cols_list = split_quoted_delimited_str(grouping_column)
+grouping_str1 = ','.join(['{0} AS {1}'.format(g, quote_ident(g))
+ for g in grouping_cols_list])
 
 select_mean_and_std = ''
 inner_join_x = ''
 inner_join_y = ''
-grouping_cols_list = split_quoted_delimited_str(grouping_column)
-select_grp = ','.join(['n_tuples_including_nulls_subq.'+str(grp)
-for grp in grouping_cols_list]) + ','
-select_grouping_info = ','.join([grp_col+"\t"+cols_types[grp_col]
+select_grp = ','.join(['n_tuples_including_nulls_subq.' + str(quote_ident(grp))
+  for grp in grouping_cols_list]) + ','
+select_grouping_info = ','.join([grp_col + "\t" + cols_types[grp_col]
 for grp_col in grouping_cols_list]) + ","
 if data_scaled:
 x_grp_cols = ' AND '.join([
-'n_tuples_including_nulls_subq.{0}={1}.{2}'.format(grp,
-args["x_mean_table"], grp) for grp in grouping_cols_list])
+'{0} = {1}.{2}'.format('n_tuples_including_nulls_subq.' + str(quote_ident(grp)),
+   args["x_mean_table"], grp)
+for grp in grouping_cols_list])
 y_grp_cols = ' AND '.join([
-'n_tuples_including_nulls_subq.{0}={1}.{2}'.format(grp,
-args["y_mean_table"], grp) for grp in grouping_cols_list])
-select_mean_and_std = ' {0}.mean AS x_mean, '.format(args["x_mean_table"]) +\
-' {0}.mean AS y_mean, '.format(args["y_mean_table"]) +\
-' {0}.std AS x_std, '.format(args["x_mean_table"])
+'{0}={1}.{2}'.format('n_tuples_including_nulls_subq.' + str(quote_ident(grp)),
+ args["y_mean_table"], grp)
+for grp in grouping_cols_list])
+select_mean_and_std = (
+' {0}.mean AS x_mean, '.format(args["x_mean_table"]) +
+' {0}.mean AS y_mean, '.format(args["y_mean_table"]) +
+' {0}.std AS x_std, '.format(args["x_mean_table"]))
 inner_join_x = ' INNER JOIN {0} ON {1} '.format(
 args["x_mean_table"], x_grp_cols)
 inner_join_y = ' INNER JOIN {0} ON {1} '.format(
@@ -66,7 +73,7 @@ def _elastic_net_generate_result(optimizer, iteration_run, **args):
 FROM
   

[2/3] madlib git commit: Vec2Cols: Allow arrays of different lengths

2018-08-16 Thread riyer
Vec2Cols: Allow arrays of different lengths

JIRA: MADLIB-1270

Added support to split arrays of different lengths in the vector_col.
If the user does not provide feature names, we pad each array to the
maximum length and split across the maximum possible number of features.
If the user does provide feature names, we truncate/pad the arrays
according to the number of features the user desires.
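
The padding and truncation rule described above might look like the
following Python sketch. It is not the implementation from this commit:
`split_rows` is a hypothetical helper, and padding with `None` (standing in
for SQL NULL) is an assumption. The default `f1, f2, ...` names follow the
convention checked in the unit test for this module.

```python
def split_rows(rows, feature_names=None):
    """Split variable-length arrays into a fixed number of columns.

    Without feature names, the width is the longest array and the columns
    are named f1..fN. With feature names, each array is truncated or padded
    to the number of names given.
    """
    if feature_names is None:
        width = max((len(r) for r in rows), default=0)
        feature_names = ["f{0}".format(i + 1) for i in range(width)]
    width = len(feature_names)
    # Slicing truncates long arrays; the list multiplication pads short ones.
    padded = [list(r[:width]) + [None] * (width - len(r)) for r in rows]
    return list(feature_names), padded
```

With the five test arrays used in this commit (`{a, b}`, `{c, d}`,
`{e, f, g, h}`, `{i}`, `{}`), the sketch produces four columns when no names
are given, one column for `['a']`, and seven columns for seven names.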

Closes #311

Co-authored-by: Arvind Sridhar 


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/5ccf12e1
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/5ccf12e1
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/5ccf12e1

Branch: refs/heads/master
Commit: 5ccf12e113f04b02b9ccf8c9aee107a4feb4bd88
Parents: ec328db
Author: Rahul Iyer 
Authored: Thu Aug 16 20:08:32 2018 -0700
Committer: Rahul Iyer 
Committed: Thu Aug 16 20:08:32 2018 -0700

--
 .../utilities/test/transform_vec_cols.sql_in| 47 
 .../unit_tests/test_transform_vec_cols.py_in| 14 +-
 .../modules/utilities/transform_vec_cols.py_in  | 34 +++---
 3 files changed, 64 insertions(+), 31 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/5ccf12e1/src/ports/postgres/modules/utilities/test/transform_vec_cols.sql_in
--
diff --git a/src/ports/postgres/modules/utilities/test/transform_vec_cols.sql_in b/src/ports/postgres/modules/utilities/test/transform_vec_cols.sql_in
index 47ab299..b43b39f 100644
--- a/src/ports/postgres/modules/utilities/test/transform_vec_cols.sql_in
+++ b/src/ports/postgres/modules/utilities/test/transform_vec_cols.sql_in
@@ -104,6 +104,53 @@ SELECT assert ((SELECT count(*) FROM information_schema.columns WHERE table_name
 SELECT assert ((SELECT clouds_airquality[1] FROM dt_golf WHERE id = 1) = (SELECT clouds FROM out_table WHERE id = 1), 'Split values do not match up');
 SELECT assert ((SELECT clouds_airquality[2] FROM dt_golf WHERE id = 1) = (SELECT air_quality FROM out_table WHERE id = 1), 'Split values do not match up');
 
+-- Testing splitting arrays of different lengths into features
+
+DROP TABLE IF EXISTS diff_lengths_test;
+CREATE TABLE diff_lengths_test(
+"id" INTEGER,
+"arr" TEXT[]);
+INSERT INTO diff_lengths_test VALUES (1, '{a, b}'), (2, '{c, d}'), (3, '{e, f, g, h}'), (4, '{i}'), (5, '{}');
+
+DROP TABLE IF EXISTS out_table;
+SELECT vec2cols(
+'diff_lengths_test',
+'out_table',
+'arr'
+);
+
+SELECT assert ((SELECT count(*) FROM information_schema.columns WHERE table_name='out_table') = (SELECT max(array_upper(arr, 1)) from diff_lengths_test), 'Number of split columns does not match');
+
+DROP TABLE IF EXISTS out_table;
+SELECT vec2cols(
+'diff_lengths_test',
+'out_table',
+'arr',
+ARRAY['a']
+);
+
+SELECT assert ((SELECT count(*) FROM information_schema.columns WHERE table_name='out_table') = 1, 'Number of split columns does not match');
+
+DROP TABLE IF EXISTS out_table;
+SELECT vec2cols(
+'diff_lengths_test',
+'out_table',
+'arr',
+ARRAY['a', 'b', 'c']
+);
+
+SELECT assert ((SELECT count(*) FROM information_schema.columns WHERE table_name='out_table') = 3, 'Number of split columns does not match');
+
+DROP TABLE IF EXISTS out_table;
+SELECT vec2cols(
+'diff_lengths_test',
+'out_table',
+'arr',
+ARRAY['a', 'b', 'c', 'd', 'e', 'f', 'g']
+);
+
+SELECT assert ((SELECT count(*) FROM information_schema.columns WHERE table_name='out_table') = 7, 'Number of split columns does not match');
+
 -- Special character tests
 
 DROP TABLE IF EXISTS special_char_check;

http://git-wip-us.apache.org/repos/asf/madlib/blob/5ccf12e1/src/ports/postgres/modules/utilities/test/unit_tests/test_transform_vec_cols.py_in
--
diff --git a/src/ports/postgres/modules/utilities/test/unit_tests/test_transform_vec_cols.py_in b/src/ports/postgres/modules/utilities/test/unit_tests/test_transform_vec_cols.py_in
index 6475f9b..3020309 100644
--- a/src/ports/postgres/modules/utilities/test/unit_tests/test_transform_vec_cols.py_in
+++ b/src/ports/postgres/modules/utilities/test/unit_tests/test_transform_vec_cols.py_in
@@ -125,23 +125,13 @@ class Vec2ColsTestSuite(unittest.TestCase):
 
 def test_get_names_for_split_output_cols_feature_names_none(self):
 self.plpy_mock_execute.return_value = [{"n_x": 3}]
-new_cols = self.subject.get_names_for_split_output_cols(self.default_source_table, 'foobar', None)
+new_cols = self.subject.get_names_for_split_output_cols(self.default_source_table, 'foobar')
 self.assertEqual(['f1', 'f2', 'f3'], new_cols)
 
-def test_get_names_for_split_output_cols_feature_names_not_none(self):
-

madlib git commit: Build: Download compatible Boost if version >= 1.65

2018-08-15 Thread riyer
Repository: madlib
Updated Branches:
  refs/heads/master 0490ea779 -> cf5ace944


Build: Download compatible Boost if version >= 1.65

JIRA: MADLIB-1235

BOOST 1.65.0 removed the TR1 library which is required by MADlib till
C++11 is completely supported. Hence, we force download of a compatible
version if existing Boost is 1.65 or greater. This should be removed
when TR1 dependency is removed.
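
The decision described above can be summarized with a small Python sketch.
It is only an illustration of the version logic, not part of the build:
`encode` and `must_download` are hypothetical names, and the two thresholds
mirror the numeric comparisons in the CMake change below. Boost encodes its
version as major * 100000 + minor * 100 + patch, so 1.65.0 is 106500.

```python
MIN_BOOST = 104600        # lower bound used by the CMake version comparison
FIRST_BAD_BOOST = 106500  # Boost 1.65.0, the release that removed TR1


def encode(major, minor, patch=0):
    """Encode a Boost version the way the BOOST_VERSION macro does."""
    return major * 100000 + minor * 100 + patch


def must_download(found, version=0):
    """Return True when the build should fetch its own Boost copy."""
    if not found:
        return True               # no system Boost at all
    if version < MIN_BOOST:
        return True               # too old
    if version >= FIRST_BAD_BOOST:
        return True               # too new: TR1 is gone
    return False
```

A system Boost 1.64 would be used as-is under this sketch, while 1.65.1 or
a missing installation would both trigger the download path.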

Closes #310


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/cf5ace94
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/cf5ace94
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/cf5ace94

Branch: refs/heads/master
Commit: cf5ace944bef74648fd456c1f00356df78e90f4f
Parents: 0490ea7
Author: Rahul Iyer 
Authored: Sat Aug 11 12:28:29 2018 -0700
Committer: Rahul Iyer 
Committed: Wed Aug 15 10:14:18 2018 -0700

--
 src/CMakeLists.txt | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/cf5ace94/src/CMakeLists.txt
--
diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index e2ce352..c9759ad 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -103,21 +103,30 @@ set(MAD_MODULE_DIR ${CMAKE_CURRENT_SOURCE_DIR}/modules)
 # -- Third-party dependencies: Find or download Boost --
 
 find_package(Boost 1.47)
-
-# We use BOOST_ASSERT_MSG, which only exists in Boost 1.47 and later.
-# Unfortunately, the FindBoost module seems to be broken with respect to version
-# checking, so we will set Boost_FOUND to FALSE if the version is too old.
 if(Boost_FOUND)
+# We use BOOST_ASSERT_MSG, which only exists in Boost 1.47 and later.
+# Unfortunately, the FindBoost module seems to be broken with respect to
+# version checking, so we will set Boost_FOUND to FALSE if the version is
+# too old.
 if(Boost_VERSION LESS 104600)
+message(STATUS "No sufficiently recent version (>= 1.47) of Boost was found. Will download.")
+set(Boost_FOUND FALSE)
+endif(Boost_VERSION LESS 104600)
+
+# BOOST 1.65.0 removed the TR1 library which is required by MADlib till
+# C++11 is completely supported. Hence, we force download of a compatible
+# version if existing Boost is 1.65 or greater. FIXME: This should be
+# removed when TR1 dependency is removed.
+if(NOT Boost_VERSION LESS 106500)
+message(STATUS
+"Incompatible Boost version (>= 1.65) found. Will download a compatible version.")
 set(Boost_FOUND FALSE)
-endif(Boost_VERSION LESS 104600 )
+endif(NOT Boost_VERSION LESS 106500)
 endif(Boost_FOUND)
 
 if(Boost_FOUND)
 include_directories(${Boost_INCLUDE_DIRS})
 else(Boost_FOUND)
-message(STATUS "No sufficiently recent version (>= 1.47) of Boost was found. Will download.")
-
 ExternalProject_Add(EP_boost
 PREFIX ${MAD_THIRD_PARTY}
 DOWNLOAD_DIR ${MAD_THIRD_PARTY}/downloads



madlib git commit: Utilities: Use plpy.quote_ident if available

2018-08-13 Thread riyer
Repository: madlib
Updated Branches:
  refs/heads/master aa18c0a3b -> 0490ea779


Utilities: Use plpy.quote_ident if available


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/0490ea77
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/0490ea77
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/0490ea77

Branch: refs/heads/master
Commit: 0490ea77911c33c941a59352ac5a3568f968b186
Parents: aa18c0a
Author: Rahul Iyer 
Authored: Mon Aug 13 15:45:31 2018 -0700
Committer: Rahul Iyer 
Committed: Mon Aug 13 15:45:31 2018 -0700

--
 .../modules/utilities/validate_args.py_in   | 28 +++-
 1 file changed, 15 insertions(+), 13 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/0490ea77/src/ports/postgres/modules/utilities/validate_args.py_in
--
diff --git a/src/ports/postgres/modules/utilities/validate_args.py_in b/src/ports/postgres/modules/utilities/validate_args.py_in
index f7f79e9..28e6aa4 100644
--- a/src/ports/postgres/modules/utilities/validate_args.py_in
+++ b/src/ports/postgres/modules/utilities/validate_args.py_in
@@ -72,19 +72,21 @@ def quote_ident(input_str):
 Returns:
 String
 """
-
-def quote_not_needed(ch):
-return (ch in string.ascii_lowercase or ch in string.digits or ch == '_')
-
-if input_str:
-input_str = input_str.strip()
-if all(quote_not_needed(c) for c in input_str):
-return input_str
-else:
-# if input_str has double quotes then each double quote
-# is prependend with a double quote
-# (the 1st double quote is used to escape the 2nd double quote)
-return '"' + re.sub(r'"', r'""', input_str) + '"'
+try:
+return plpy.quote_ident(input_str)
+except AttributeError:
+def quote_not_needed(ch):
+return (ch in string.ascii_lowercase or ch in string.digits or ch == '_')
+
+if input_str:
+input_str = input_str.strip()
+if all(quote_not_needed(c) for c in input_str):
+return input_str
+else:
+# if input_str has double quotes then each double quote
+# is prependend with a double quote
+# (the 1st double quote is used to escape the 2nd double quote)
+return '"' + re.sub(r'"', r'""', input_str) + '"'
 # -
 
 



madlib git commit: Build: Update versions after release

2018-08-13 Thread riyer
Repository: madlib
Updated Branches:
  refs/heads/master fa02339dc -> aa18c0a3b


Build: Update versions after release


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/aa18c0a3
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/aa18c0a3
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/aa18c0a3

Branch: refs/heads/master
Commit: aa18c0a3bffea472eab87f8163a5146f5effb671
Parents: fa02339
Author: Rahul Iyer 
Authored: Mon Aug 13 11:40:36 2018 -0700
Committer: Rahul Iyer 
Committed: Mon Aug 13 11:40:36 2018 -0700

--
 deploy/postflight.sh   | 2 +-
 pom.xml| 2 +-
 src/config/Version.yml | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/aa18c0a3/deploy/postflight.sh
--
diff --git a/deploy/postflight.sh b/deploy/postflight.sh
index df430bd..2cb5460 100755
--- a/deploy/postflight.sh
+++ b/deploy/postflight.sh
@@ -2,7 +2,7 @@
 
 # $0 - Script Path, $1 - Package Path, $2 - Target Location, and $3 - Target Volume
 
-MADLIB_VERSION=1.15
+MADLIB_VERSION=1.15.1-dev
 
 find $2/usr/local/madlib/bin -type d -exec cp -RPf {} $2/usr/local/madlib/old_bin \; 2>/dev/null
 find $2/usr/local/madlib/bin -depth -type d -exec rm -r {} \; 2>/dev/null

http://git-wip-us.apache.org/repos/asf/madlib/blob/aa18c0a3/pom.xml
--
diff --git a/pom.xml b/pom.xml
index 1417ff8..e441dbc 100644
--- a/pom.xml
+++ b/pom.xml
@@ -22,7 +22,7 @@
 
   org.apache.madlib
   madlib
-  1.15
+  1.15.1-dev
   pom
 
   

http://git-wip-us.apache.org/repos/asf/madlib/blob/aa18c0a3/src/config/Version.yml
--
diff --git a/src/config/Version.yml b/src/config/Version.yml
index 8870dbc..6c9f460 100644
--- a/src/config/Version.yml
+++ b/src/config/Version.yml
@@ -1 +1 @@
-version: 1.15
+version: 1.15.1-dev



[madlib] Git Push Summary

2018-08-13 Thread riyer
Repository: madlib
Updated Branches:
  refs/heads/latest_release [created] fa02339dc


[madlib] Git Push Summary

2018-08-13 Thread riyer
Repository: madlib
Updated Branches:
  refs/heads/latest_release [deleted] d0ad93d26


[22/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released

2018-08-10 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__glm.html
--
diff --git a/docs/rc/group__grp__glm.html b/docs/rc/group__grp__glm.html
deleted file mode 100644
index 78d953c..000
--- a/docs/rc/group__grp__glm.html
+++ /dev/null
@@ -1,585 +0,0 @@
-MADlib: Generalized Linear Models
-   1.15
-   
-   User Documentation for Apache MADlib
-  
-   
-
-  
-  
-  
-
-  
-
-
- 
- 
-
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-var searchBox = new SearchBox("searchBox", "search",false,'Search');
-/* @license-end */
-
-
-
-  
-
-  
-
-  
-  
-  
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-$(document).ready(function(){initNavTree('group__grp__glm.html','');});
-/* @license-end */
-
-
-
-
-
-
-
-
-
-
-
-
-
-  
-Generalized Linear ModelsSupervised Learning  Regression Models  

-
-
-Contents
-
-Training Function 
-
-Prediction Function 
-
-Examples 
-
-Related Topics 
-
-Generalized linear models extend ordinary linear regression by 
allowing the response variable to follow a more general set of distributions 
(rather than simply Gaussian distributions), and for a general family of 
functions of the response variable (the link function) to vary linearly with 
the predicted values (rather than assuming that the response itself must vary 
linearly).
-For example, data of counts would typically be modeled with a Poisson 
distribution and a log link, while binary outcomes would typically be modeled 
with a Bernoulli distribution (or binomial distribution, depending on exactly 
how the problem is phrased) and a log-odds (or logit) link function.
-Currently, the implemented distribution families are  
-
-Distribution Family Link Functions  
-
-Binomial logit, probit  
-
-Gamma inverse, identity, log  
-
-Gaussian identity, inverse, log  
-
-Inverse Gaussian inverse of square, inverse, identity, log  

-
-Poisson log, identity, square-root
-  
-
-Training FunctionGLM training function has the following 
format: 
-glm(source_table,
-model_table,
-dependent_varname,
-independent_varname,
-family_params,
-grouping_col,
-optim_params,
-verbose
-)
- Arguments 
-source_table 
-TEXT. The name of the table containing the training 
data.
-
-
-model_table 
-TEXT. Name of the generated table containing the 
model.
-The model table produced by glm contains the following columns:
-
-
-... Text. Grouping columns, if 
provided in input. This could be multiple columns depending on the 
grouping_col input. 
-
-
-
-coef FLOAT8. Vector of the coefficients in 
linear predictor. 
-
-
-
-log_likelihood FLOAT8. The log-likelihood \( 
l(\boldsymbol \beta) \). We use the maximum likelihood estimate of dispersion 
parameter to calculate the log-likelihood while R and Python use deviance 
estimate and Pearson estimate respectively. 
-
-
-
-std_err FLOAT8[]. Vector of the standard error 
of the coefficients. 
-
-
-
-z_stats or t_stats FLOAT8[]. Vector of the 
z-statistics (in Poisson distribution and Binomial distribution) or the 
t-statistics (in all other distributions) of the coefficients. 
-
-
-
-p_values FLOAT8[]. Vector of the p-values of 
the coefficients. 
-
-
-
-dispersion FLOAT8. The dispersion value 
(Pearson estimate). When family=poisson or family=binomial, the dispersion is 
always 1. 
-
-
-
-num_rows_processed BIGINT. Numbers of rows 
processed. 
-
-
-
-num_rows_skipped BIGINT. Numbers of rows 
skipped due to missing values or failures. 
-
-
-
-num_iterations INTEGER. The number of iterations actually 
completed. This would be different from 
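
The relationship between the link function and the predicted mean described in the removed GLM page can be sketched as follows. This is an illustrative sketch only, not MADlib code: the `LINKS` table and `predict_mean` helper are hypothetical names, and only a few of the link functions listed on the page are shown.

```python
import math

# Hypothetical table: link name -> (link g, inverse link g^-1).
# g maps the mean mu to the linear predictor eta; g^-1 maps eta back to mu.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "sqrt": (math.sqrt, lambda eta: eta * eta),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
}


def predict_mean(link_name, coef, x):
    """Return the predicted mean g^-1(x . coef) for a single row x."""
    _, inverse_link = LINKS[link_name]
    eta = sum(c * v for c, v in zip(coef, x))  # linear predictor
    return inverse_link(eta)
```

With a log link, for example, `predict_mean("log", coef, x)` returns the exponential of the linear predictor, which keeps predicted counts positive.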

[21/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released

2018-08-10 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__graph__measures.html
--
diff --git a/docs/rc/group__grp__graph__measures.html b/docs/rc/group__grp__graph__measures.html
deleted file mode 100644
index 9339d92..0000000
--- a/docs/rc/group__grp__graph__measures.html
+++ /dev/null
@@ -1,155 +0,0 @@
-
-http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
-http://www.w3.org/1999/xhtml;>
-
-
-
-
-
-MADlib: Measures
-
-
-
-
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(initResizable);
-/* @license-end */
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(function() { init_search(); });
-/* @license-end */
-
-
-  MathJax.Hub.Config({
-extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
-jax: ["input/TeX","output/HTML-CSS"],
-});
-https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js";>
-
-
-
-
-
-
-  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
-  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
-  
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
-  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
-  ga('create', 'UA-45382226-1', 'madlib.apache.org');
-  ga('send', 'pageview');
-
-
-
-
-
-
- 
- 
-  http://madlib.apache.org;>
-  
-   
-   1.15
-   
-   User Documentation for Apache MADlib
-  
-   
-
-  
-  
-  
-
-  
-
-
- 
- 
-
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-var searchBox = new SearchBox("searchBox", "search",false,'Search');
-/* @license-end */
-
-
-
-  
-
-  
-
-  
-  
-  
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-$(document).ready(function(){initNavTree('group__grp__graph__measures.html','');});
-/* @license-end */
-
-
-
-
-
-
-
-
-
-
-
-
-
-  
-Modules  
-  
-MeasuresGraph  
-
-
-Detailed 
Description
-A collection of metrics computed on a graph. 
-
-
-Modules
-Average Path 
Length
-Computes the average 
shortest-path length of a graph. 
-
-Closeness
-Computes the closeness 
centrality value of each node in the graph. 
-
-Graph 
Diameter
-Computes the diameter of a 
graph. 
-
-In-Out Degree
-Computes the degrees for 
each vertex. 
-
-
-
-
-
-
-  
-Generated on Mon Aug 6 2018 21:55:39 for MADlib by
-http://www.doxygen.org/index.html;>
- 1.8.14 
-  
-
-
-

http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__graph__measures.js
--
diff --git a/docs/rc/group__grp__graph__measures.js b/docs/rc/group__grp__graph__measures.js
deleted file mode 100644
index 6272fba..0000000
--- a/docs/rc/group__grp__graph__measures.js
+++ /dev/null
@@ -1,7 +0,0 @@
-var group__grp__graph__measures =
-[
-[ "Average Path Length", "group__grp__graph__avg__path__length.html", null ],
-[ "Closeness", "group__grp__graph__closeness.html", null ],
-[ "Graph Diameter", "group__grp__graph__diameter.html", null ],
-[ "In-Out Degree", "group__grp__graph__vertex__degrees.html", null ]
-];
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__graph__vertex__degrees.html
--
diff --git a/docs/rc/group__grp__graph__vertex__degrees.html b/docs/rc/group__grp__graph__vertex__degrees.html
deleted file mode 100644
index 9d8a2f5..0000000
--- a/docs/rc/group__grp__graph__vertex__degrees.html
+++ /dev/null
@@ -1,273 +0,0 @@
-
-http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
-http://www.w3.org/1999/xhtml;>
-
-
-
-
-
-MADlib: In-Out Degree
-
-
-
-
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(initResizable);
-/* @license-end */
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(function() { init_search(); });
-/* @license-end */
-
-
-  MathJax.Hub.Config({
-extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
-jax: ["input/TeX","output/HTML-CSS"],
-});
-https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js";>
-
-
-
-
-
-
-  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
-  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
-  
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
-  

[19/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released

2018-08-10 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__lda.html
--
diff --git a/docs/rc/group__grp__lda.html b/docs/rc/group__grp__lda.html
deleted file mode 100644
index 3a04a90..0000000
--- a/docs/rc/group__grp__lda.html
+++ /dev/null
@@ -1,765 +0,0 @@
-
-http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
-http://www.w3.org/1999/xhtml;>
-
-
-
-
-
-MADlib: Latent Dirichlet Allocation
-
-
-
-
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(initResizable);
-/* @license-end */
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(function() { init_search(); });
-/* @license-end */
-
-
-  MathJax.Hub.Config({
-extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
-jax: ["input/TeX","output/HTML-CSS"],
-});
-https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js";>
-
-
-
-
-
-
-  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
-  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
-  
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
-  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
-  ga('create', 'UA-45382226-1', 'madlib.apache.org');
-  ga('send', 'pageview');
-
-
-
-
-
-
- 
- 
-  http://madlib.apache.org;>
-  
-   
-   1.15
-   
-   User Documentation for Apache MADlib
-  
-   
-
-  
-  
-  
-
-  
-
-
- 
- 
-
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-var searchBox = new SearchBox("searchBox", "search",false,'Search');
-/* @license-end */
-
-
-
-  
-
-  
-
-  
-  
-  
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-$(document).ready(function(){initNavTree('group__grp__lda.html','');});
-/* @license-end */
-
-
-
-
-
-
-
-
-
-
-
-
-
-  
-Latent Dirichlet AllocationUnsupervised Learning 
 Topic 
Modelling  
-
-
-Contents 
-
-Background 
-
-Training Function 
-
-Prediction Function 
-
-Perplexity 
-
-Helper Functions 
-
-Examples 
-
-Literature 
-
-Related Topics
-
-
-
-Latent Dirichlet Allocation (LDA) is a generative probabilistic model 
for natural texts. It is used in problems such as automated topic discovery, 
collaborative filtering, and document classification.
-In addition to an implementation of LDA, this MADlib module also provides a 
number of additional helper functions to interpret results of the LDA 
output.
-NoteTopic modeling is often used as part 
of a larger text processing pipeline, which may include operations such as term 
frequency, stemming and stop word removal. You can use the function Term Frequency to generate the 
required vocabulary format from raw documents for the LDA training function. 
See the examples later on this page for more details.
-Background
-The LDA model posits that each document is associated with a mixture of 
various topics (e.g., a document is related to Topic 1 with probability 0.7, 
and Topic 2 with probability 0.3), and that each word in the document is 
attributable to one of the document's topics. There is a (symmetric) Dirichlet 
prior with parameter \( \alpha \) on each document's topic mixture. In 
addition, there is another (symmetric) Dirichlet prior with parameter \( \beta 
\) on the distribution of words for each topic.
-The following generative process then defines a distribution over a corpus 
of documents:
-
-Sample for each topic \( i \), a per-topic word distribution \( \phi_i \) 
from the Dirichlet( \(\beta\)) prior.
-For each document:
-Sample a document length N from a suitable distribution, say, Poisson.
-Sample a topic mixture \( \theta \) for the document from the Dirichlet( 
\(\alpha\)) distribution.
-For each of the N words:
-Sample a topic \( z_n \) from the multinomial topic distribution \( \theta 
\).
-Sample a word \( w_n \) from the multinomial word distribution \( 
\phi_{z_n} \) associated with topic \( z_n \).
-
-
-
-
-
-In practice, only the words in each document are observable. The topic 
mixture of each document and the topic for each word in each document are 
latent unobservable variables that need to be inferred from the observables, 
and this is referred to as the inference problem for LDA. Exact inference is 
intractable, but several approximate inference algorithms for LDA have been 
developed. The simple and effective Gibbs sampling algorithm described in 
Griffiths and Steyvers [2] appears to be the current algorithm of choice.
-This implementation provides a parallel and scalable in-database solution 
for LDA based on Gibbs sampling. It takes advantage of the shared-nothing MPP 
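
The generative process listed in the removed LDA page can be sketched as below. This is a minimal illustration under stated assumptions, not the MADlib implementation (which performs inference by Gibbs sampling rather than generation): all function names are hypothetical, words and topics are represented as integer ids, and the document length is drawn from a Poisson distribution as the page suggests.

```python
import math
import random


def sample_dirichlet(rng, concentration, size):
    """Draw from a symmetric Dirichlet by normalizing Gamma(concentration, 1) draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_poisson(rng, mean):
    """Knuth's multiplication method; adequate for small means."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_categorical(rng, probs):
    """Draw one index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(num_docs, num_topics, vocab_size, alpha, beta,
                    mean_length, seed=0):
    rng = random.Random(seed)
    # Per-topic word distributions phi_i ~ Dirichlet(beta).
    phi = [sample_dirichlet(rng, beta, vocab_size) for _ in range(num_topics)]
    corpus = []
    for _ in range(num_docs):
        length = sample_poisson(rng, mean_length)         # N ~ Poisson
        theta = sample_dirichlet(rng, alpha, num_topics)  # theta ~ Dirichlet(alpha)
        words = []
        for _ in range(length):
            z = sample_categorical(rng, theta)             # topic z_n
            words.append(sample_categorical(rng, phi[z]))  # word w_n
        corpus.append(words)
    return corpus
```

Only the word ids in `corpus` would be observable in practice; `theta` and each `z` are the latent variables that inference has to recover.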

[24/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released

2018-08-10 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__deprecated.html
--
diff --git a/docs/rc/group__grp__deprecated.html b/docs/rc/group__grp__deprecated.html
deleted file mode 100644
index aaa9813..0000000
--- a/docs/rc/group__grp__deprecated.html
+++ /dev/null
@@ -1,149 +0,0 @@
-
-http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
-http://www.w3.org/1999/xhtml;>
-
-
-
-
-
-MADlib: Deprecated Modules
-
-
-
-
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(initResizable);
-/* @license-end */
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(function() { init_search(); });
-/* @license-end */
-
-
-  MathJax.Hub.Config({
-extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
-jax: ["input/TeX","output/HTML-CSS"],
-});
-https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js";>
-
-
-
-
-
-
-  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
-  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
-  
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
-  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
-  ga('create', 'UA-45382226-1', 'madlib.apache.org');
-  ga('send', 'pageview');
-
-
-
-
-
-
- 
- 
-  http://madlib.apache.org;>
-  
-   
-   1.15
-   
-   User Documentation for Apache MADlib
-  
-   
-
-  
-  
-  
-
-  
-
-
- 
- 
-
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-var searchBox = new SearchBox("searchBox", "search",false,'Search');
-/* @license-end */
-
-
-
-  
-
-  
-
-  
-  
-  
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-$(document).ready(function(){initNavTree('group__grp__deprecated.html','');});
-/* @license-end */
-
-
-
-
-
-
-
-
-
-
-
-
-
-  
-Modules  
-  
-Deprecated Modules  
-
-
-Detailed 
Description
-Deprecated modules that will be removed in the next major version (2.0). 
There are newer MADlib modules that have replaced these functions. 
-
-
-Modules
-Create 
Indicator Variables
-Provides utility functions helpful for data preparation 
before modeling. 
-
-Multinomial Logistic Regression
-Also called softmax regression, models the relationship 
between one or more independent variables and a categorical dependent variable. 

-
-
-
-
-
-
-  
-Generated on Mon Aug 6 2018 21:55:39 for MADlib by
-http://www.doxygen.org/index.html;>
- 1.8.14 
-  
-
-
-

http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__deprecated.js
--
diff --git a/docs/rc/group__grp__deprecated.js b/docs/rc/group__grp__deprecated.js
deleted file mode 100644
index 05ef03b..0000000
--- a/docs/rc/group__grp__deprecated.js
+++ /dev/null
@@ -1,5 +0,0 @@
-var group__grp__deprecated =
-[
-[ "Create Indicator Variables", "group__grp__indicator.html", null ],
-[ "Multinomial Logistic Regression", "group__grp__mlogreg.html", null ]
-];
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__desc__stats.html
--
diff --git a/docs/rc/group__grp__desc__stats.html b/docs/rc/group__grp__desc__stats.html
deleted file mode 100644
index 21c7333..0000000
--- a/docs/rc/group__grp__desc__stats.html
+++ /dev/null
@@ -1,152 +0,0 @@
-
-http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
-http://www.w3.org/1999/xhtml;>
-
-
-
-
-
-MADlib: Descriptive Statistics
-
-
-
-
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(initResizable);
-/* @license-end */
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(function() { init_search(); });
-/* @license-end */
-
-
-  MathJax.Hub.Config({
-extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
-jax: ["input/TeX","output/HTML-CSS"],
-});
-https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js";>
-
-
-
-
-
-
-  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
-  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
-  
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
-  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
-  ga('create', 'UA-45382226-1', 'madlib.apache.org');
-  ga('send', 'pageview');
-
-
-
-
-

svn commit: r28667 - in /release/madlib: 1.14/ 1.15/

2018-08-10 Thread riyer
Author: riyer
Date: Fri Aug 10 23:23:27 2018
New Revision: 28667

Log:
Add 1.15 binaries and remove 1.14

Added:
release/madlib/1.15/
release/madlib/1.15/apache-madlib-1.15-bin-Darwin.dmg   (with props)
release/madlib/1.15/apache-madlib-1.15-bin-Darwin.dmg.asc
release/madlib/1.15/apache-madlib-1.15-bin-Darwin.dmg.sha512
release/madlib/1.15/apache-madlib-1.15-bin-Linux-GPDB43.rpm   (with props)
release/madlib/1.15/apache-madlib-1.15-bin-Linux-GPDB43.rpm.asc
release/madlib/1.15/apache-madlib-1.15-bin-Linux-GPDB43.rpm.sha512
release/madlib/1.15/apache-madlib-1.15-bin-Linux.rpm   (with props)
release/madlib/1.15/apache-madlib-1.15-bin-Linux.rpm.asc
release/madlib/1.15/apache-madlib-1.15-bin-Linux.rpm.sha512
release/madlib/1.15/apache-madlib-1.15-src.tar.gz   (with props)
release/madlib/1.15/apache-madlib-1.15-src.tar.gz.asc
release/madlib/1.15/apache-madlib-1.15-src.tar.gz.sha512
Removed:
release/madlib/1.14/

Added: release/madlib/1.15/apache-madlib-1.15-bin-Darwin.dmg
==
Binary file - no diff available.

Propchange: release/madlib/1.15/apache-madlib-1.15-bin-Darwin.dmg
--
svn:mime-type = application/octet-stream

Added: release/madlib/1.15/apache-madlib-1.15-bin-Darwin.dmg.asc
==
--- release/madlib/1.15/apache-madlib-1.15-bin-Darwin.dmg.asc (added)
+++ release/madlib/1.15/apache-madlib-1.15-bin-Darwin.dmg.asc Fri Aug 10 23:23:27 2018
@@ -0,0 +1,16 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQIzBAABCAAdFiEE+pCcIAZjKnTgT/pQYwVq5BLE49cFAltpL1QACgkQYwVq5BLE
+49cDJA/+IhMzkgxv6zX1Omuo8ofNMCetHJC4RmB8rwxem7DnLVUgwYNn+xK7lpAU
+Yn9nm/XtFGXVqJ4CWGzaDL/iW2fsUqI5LX22CgeRaRD/iXasYB5TWMKvspaYY5RW
+23Y7lYv3ea/+Gxnjj3uG7BwqxJ5YvtNiWoKWpq8PhSgo1souBivMGLGVS1DK55Wy
+gnZuGULY9qq3cr0n5N7HDRS0e3bzKWqpm5xcGAtz2O5hW7tVDqT2FBrJmOG8mkPQ
+GZ7cRPbeIeAi+CQzuvm522DtqPepJJW99UAl+0oksHgB6ag+iS80bufF27Fr9P0n
+18Lq59/mJwdeUIxK95ak2AWjjmuuFzLY5QB06kJ5Mze96m4SA/VFJ9qdGljcDesX
+BkwKNboi/zQSrUY5xVWNPWn3Qe5v0FUH8H0K1laqkczkeN+TGh8BlmOUF9DGbZ3l
+L8spewzlbjuUAVUX9Q5Sren4qiliTj7UR4+hhggDvHIAAQQjCsOj78dOzce3Px8c
+BrYRHCHbzBS6vg75DRj3P2KItpeRvwdZfNBaG/F0cPpBP/Yuwma62SdGATLdg6Fj
++mMcYysmJLTrPsN0fu+Q7YasWgkPJthnaIkdxpbpEFkh74ZZaYcpDZZw7HW3FBB7
+qm8DQiMrL5wED9khZtvWNuqrMjlCuIN+/j8d8N7508DMtPtSkVE=
+=wEHu
+-----END PGP SIGNATURE-----

Added: release/madlib/1.15/apache-madlib-1.15-bin-Darwin.dmg.sha512
==
--- release/madlib/1.15/apache-madlib-1.15-bin-Darwin.dmg.sha512 (added)
+++ release/madlib/1.15/apache-madlib-1.15-bin-Darwin.dmg.sha512 Fri Aug 10 23:23:27 2018
@@ -0,0 +1 @@
+494c374d272ac707dd503b1c1e33900ca0cca56f48e7ad84a7bed4f01090dbc09155fb09998bfb8db2b448ab84b527e619fbfafc90e3369b4b49cc5a27d4d5aa  apache-madlib-1.15-bin-Darwin.dmg

Added: release/madlib/1.15/apache-madlib-1.15-bin-Linux-GPDB43.rpm
==
Binary file - no diff available.

Propchange: release/madlib/1.15/apache-madlib-1.15-bin-Linux-GPDB43.rpm
--
svn:mime-type = application/octet-stream

Added: release/madlib/1.15/apache-madlib-1.15-bin-Linux-GPDB43.rpm.asc
==
--- release/madlib/1.15/apache-madlib-1.15-bin-Linux-GPDB43.rpm.asc (added)
+++ release/madlib/1.15/apache-madlib-1.15-bin-Linux-GPDB43.rpm.asc Fri Aug 10 23:23:27 2018
@@ -0,0 +1,16 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQIzBAABCAAdFiEE+pCcIAZjKnTgT/pQYwVq5BLE49cFAltqAGsACgkQYwVq5BLE
+49eFQg//T9SFevZM1eqUwWoM8DhuRdHDNJRaLRuCqTn4RY6cwd4quMLbJ6tMrMcY
+Da0JFfKHUk916eDvgDAyVbbYNfLI6+Td2xdXRZKJdkf8ju1XeLK3hx196C/g+DF+
+ldHILlIoizcLFsypSqOxSwqqIzZ4V+ZdHLsoGILsTQKdok5AuLRYmcJFu7bxbLWI
+gx4tKTFhTJzzDC00Sq9eBIabsWUQhiR7WpmwswRtuOAcvJQH4rwjPjozeBqLGLt5
+/+554enRlTbQw+2URj5DybIYjEVba58sMN8cj83FPu0745e+2kTDW6oZ5TXXGc15
+Rh4PDkSd0+AoUWX64ccT1n/AINMwm1f3g7CWU1lrzXnwY9H9+eABFwtYNBsoPJQU
+bp8QhvjrJMupRKaD89l3JpaRgwb1dxl57V0wKAqpfPBcXS2iElfpq2IZ9DyWOskz
+/pIpgXNFt/JNkww6wxFVyPxZJMBpjDzKMY9UBBqtXcrwx7C6J6OlYWeZLFNpSS/+
+4oVoRJEncN25p9pR4mXlzLKnGQW0pjVrKZocAy55g0WXIilwGiauCO6cQO9cufnF
+6698eIdj5K0ytmdxSsOiLv75j3tynne55aDF8xQTPsa4IDycpc8t/WlQnBmxT2Cs
+y85kVrNoY05+57hxSE1entDMigjbqN0nSrUk2Cp3Mjd47rnrRPA=
+=pmUc
+-----END PGP SIGNATURE-----

Added: release/madlib/1.15/apache-madlib-1.15-bin-Linux-GPDB43.rpm.sha512
==
--- release/madlib/1.15/apache-madlib-1.15-bin-Linux-GPDB43.rpm.sha512 (added)
+++ release/madlib/1.15/apache-madlib-1.15-bin-Linux-GPDB43.rpm.sha512 Fri Aug 10 23:23:27 2018
@@ -0,0 +1

[14/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released

2018-08-10 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__ordinal.html
--
diff --git a/docs/rc/group__grp__ordinal.html b/docs/rc/group__grp__ordinal.html
deleted file mode 100644
index 97590d8..0000000
--- a/docs/rc/group__grp__ordinal.html
+++ /dev/null
@@ -1,477 +0,0 @@
-
-http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
-http://www.w3.org/1999/xhtml;>
-
-
-
-
-
-MADlib: Ordinal Regression
-
-
-
-
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(initResizable);
-/* @license-end */
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(function() { init_search(); });
-/* @license-end */
-
-
-  MathJax.Hub.Config({
-extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
-jax: ["input/TeX","output/HTML-CSS"],
-});
-https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js";>
-
-
-
-
-
-
-  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
-  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
-  
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
-  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
-  ga('create', 'UA-45382226-1', 'madlib.apache.org');
-  ga('send', 'pageview');
-
-
-
-
-
-
- 
- 
-  http://madlib.apache.org;>
-  
-   
-   1.15
-   
-   User Documentation for Apache MADlib
-  
-   
-
-  
-  
-  
-
-  
-
-
- 
- 
-
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-var searchBox = new SearchBox("searchBox", "search",false,'Search');
-/* @license-end */
-
-
-
-  
-
-  
-
-  
-  
-  
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-$(document).ready(function(){initNavTree('group__grp__ordinal.html','');});
-/* @license-end */
-
-
-
-
-
-
-
-
-
-
-
-
-
-  
-Ordinal RegressionSupervised Learning  Regression Models  
-
-
-Contents 
-
-Training Function 
-
-Prediction Function 
-
-Examples 
-
-Model Details 
-
-Literature 
-
-Related Topics 
-
-In statistics, ordinal regression is a type of regression analysis 
used for predicting an ordinal variable, i.e. a variable whose value exists on 
an arbitrary scale where only the relative ordering between different values is 
significant. The two most common types of ordinal regression models are ordered 
logit, which applies to data that meet the proportional odds assumption, and 
ordered probit.
-Training 
FunctionThe ordinal regression training function has the following 
syntax: 
-ordinal(source_table,
- model_table,
- dependent_varname,
- independent_varname,
- cat_order,
- link_func,
- grouping_col,
- optim_params,
- verbose
-)
-
-Arguments 
-source_table 
-VARCHAR. Name of the table containing the training 
data.
-
-
-model_table 
-VARCHAR. Name of the generated table containing the 
model.
-The model table produced by ordinal() contains the following columns:
-
-
-... Grouping columns, if provided in 
input. This could be multiple columns depending on the 
grouping_col input. 
-
-
-
-coef_threshold FLOAT8[]. Vector of the 
threshold coefficients in linear predictor. The threshold coefficients are the 
intercepts specific to each categorical levels 
-
-
-
-std_err_threshold FLOAT8[]. Vector of the 
threshold standard errors of the threshold coefficients. 
-
-
-
-z_stats_threshold FLOAT8[]. Vector of the 
threshold z-statistics of the threshold coefficients. 
-
-
-
-p_values_threshold FLOAT8[]. Vector of the 
threshold p-values of the threshold coefficients. 
-
-
-
-log_likelihood FLOAT8. The log-likelihood \( 
l(\boldsymbol \beta) \). The value will be the same across categories within 
the same group. 
-
-
-
-coef_feature FLOAT8[]. Vector of the feature 
coefficients in linear predictor. The feature coefficients are the coefficients 
for the independent variables. They are the same across categories. 
-
-
-
-std_err_feature FLOAT8[]. Vector of the 
feature standard errors of the feature coefficients. 
-
-
-
-z_stats_feature FLOAT8[]. Vector of the 
feature z-statistics of the feature coefficients. 
-
-
-
-p_values_feature FLOAT8[]. Vector of the 
feature p-values of the feature coefficients. 
-
-
-
-num_rows_processed BIGINT. Number of rows 
processed. 
-
-
-
-num_rows_skipped BIGINT. Number of rows 
skipped due to missing values or failures. 
-
-
-
-num_iterations INTEGER. Number of iterations actually completed. 
This would be different from the nIterations argument if a 
tolerance parameter is provided and the algorithm converges before 
all iterations are completed.  
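
How the threshold and feature coefficients in the removed ordinal regression page combine into category probabilities can be sketched as below. This is an assumption-laden illustration, not MADlib's code: it uses the common ordered-logit convention P(Y <= j | x) = logistic(threshold_j - x . coef), which may differ in sign from MADlib's parameterization, and `ordered_logit_probs` is a hypothetical helper name.

```python
import math


def ordered_logit_probs(thresholds, coef, x):
    """Return P(Y = j | x) for each ordered category j.

    thresholds must be increasing; there is one fewer threshold than
    categories. The feature coefficients coef are shared by all categories.
    """
    eta = sum(c * v for c, v in zip(coef, x))  # shared linear predictor
    # Cumulative probabilities P(Y <= j | x); the last category closes at 1.
    cumulative = [1.0 / (1.0 + math.exp(-(t - eta))) for t in thresholds]
    cumulative.append(1.0)
    probs, previous = [], 0.0
    for c in cumulative:
        probs.append(c - previous)  # difference of adjacent cumulatives
        previous = c
    return probs
```

Because the feature coefficients are shared, changing `x` shifts every cumulative probability by the same amount on the logit scale, which is the proportional odds assumption the page mentions.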

[13/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released

2018-08-10 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__pca.html
--
diff --git a/docs/rc/group__grp__pca.html b/docs/rc/group__grp__pca.html
deleted file mode 100644
index 681c6d9..0000000
--- a/docs/rc/group__grp__pca.html
+++ /dev/null
@@ -1,149 +0,0 @@
-
-http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
-http://www.w3.org/1999/xhtml;>
-
-
-
-
-
-MADlib: Dimensionality Reduction
-
-
-
-
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(initResizable);
-/* @license-end */
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(function() { init_search(); });
-/* @license-end */
-
-
-  MathJax.Hub.Config({
-extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
-jax: ["input/TeX","output/HTML-CSS"],
-});
-https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js";>
-
-
-
-
-
-
-  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
-  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
-  
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
-  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
-  ga('create', 'UA-45382226-1', 'madlib.apache.org');
-  ga('send', 'pageview');
-
-
-
-
-
-
- 
- 
-  http://madlib.apache.org;>
-  
-   
-   1.15
-   
-   User Documentation for Apache MADlib
-  
-   
-
-  
-  
-  
-
-  
-
-
- 
- 
-
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-var searchBox = new SearchBox("searchBox", "search",false,'Search');
-/* @license-end */
-
-
-
-  
-
-  
-
-  
-  
-  
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-$(document).ready(function(){initNavTree('group__grp__pca.html','');});
-/* @license-end */
-
-
-
-
-
-
-
-
-
-
-
-
-
-  
-Modules  
-  
-Dimensionality ReductionUnsupervised Learning  

-
-
-Detailed 
Description
-Methods for reducing the number of variables in a dataset to obtain a set 
of principal variables. 
-
-
-Modules
-Principal 
Component Analysis
-Produces a model that 
transforms a number of (possibly) correlated variables into a (smaller) number 
of uncorrelated variables called principal components. 
-
-Principal 
Component Projection
-Projects a higher 
dimensional data point to a lower dimensional subspace spanned by principal 
components learned through the PCA training procedure. 
-
-
-
-
-
-
-  
-Generated on Mon Aug 6 2018 21:55:39 for MADlib by
-http://www.doxygen.org/index.html;>
- 1.8.14 
-  
-
-
-

http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__pca.js
--
diff --git a/docs/rc/group__grp__pca.js b/docs/rc/group__grp__pca.js
deleted file mode 100644
index 2863cf8..0000000
--- a/docs/rc/group__grp__pca.js
+++ /dev/null
@@ -1,5 +0,0 @@
-var group__grp__pca =
-[
-[ "Principal Component Analysis", "group__grp__pca__train.html", null ],
-[ "Principal Component Projection", "group__grp__pca__project.html", null ]
-];
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__pca__project.html
--
diff --git a/docs/rc/group__grp__pca__project.html b/docs/rc/group__grp__pca__project.html
deleted file mode 100644
index d5eda16..0000000
--- a/docs/rc/group__grp__pca__project.html
+++ /dev/null
@@ -1,513 +0,0 @@
-
-http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
-http://www.w3.org/1999/xhtml;>
-
-
-
-
-
-MADlib: Principal Component Projection
-
-
-
-
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(initResizable);
-/* @license-end */
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(function() { init_search(); });
-/* @license-end */
-
-
-  MathJax.Hub.Config({
-extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
-jax: ["input/TeX","output/HTML-CSS"],
-});
-https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js";>
-
-
-
-
-
-
-  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
-  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
-  
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
-  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
-  ga('create', 'UA-45382226-1', 'madlib.apache.org');
-  

[17/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released

2018-08-10 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__marginal.html
--
diff --git a/docs/rc/group__grp__marginal.html 
b/docs/rc/group__grp__marginal.html
deleted file mode 100644
index d88997f..000
--- a/docs/rc/group__grp__marginal.html
+++ /dev/null
@@ -1,440 +0,0 @@
-
-http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
-http://www.w3.org/1999/xhtml;>
-
-
-
-
-
-MADlib: Marginal Effects
-
-
-
-
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(initResizable);
-/* @license-end */
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(function() { init_search(); });
-/* @license-end */
-
-
-  MathJax.Hub.Config({
-extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
-jax: ["input/TeX","output/HTML-CSS"],
-});
-https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js";>
-
-
-
-
-
-
-  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
-  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
-  
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
-  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
-  ga('create', 'UA-45382226-1', 'madlib.apache.org');
-  ga('send', 'pageview');
-
-
-
-
-
-
- 
- 
-  http://madlib.apache.org;>
-  
-   
-   1.15
-   
-   User Documentation for Apache MADlib
-  
-   
-
-  
-  
-  
-
-  
-
-
- 
- 
-
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-var searchBox = new SearchBox("searchBox", "search",false,'Search');
-/* @license-end */
-
-
-
-  
-
-  
-
-  
-  
-  
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-$(document).ready(function(){initNavTree('group__grp__marginal.html','');});
-/* @license-end */
-
-
-
-
-
-
-
-
-
-
-
-
-
-  
-Marginal EffectsSupervised Learning  Regression Models  
-
-
-Contents 
-
-Marginal Effects with Interaction Terms 
-
-Examples 
-
-Notes 
-
-Technical Background 
-
-Literature 
-
-Related Topics 
-
-A marginal effect (ME) or partial effect measures the effect on the 
conditional mean of \( y \) for a change in one of the regressors, say \(X_k\). 
In the linear regression model, the ME equals the relevant slope coefficient, 
greatly simplifying analysis. For nonlinear models, specialized algorithms are 
required for calculating ME. The marginal effect computed is the average of the 
marginal effect at every data point present in the source table.
-MADlib provides marginal effects regression functions for linear, logistic 
and multinomial logistic regressions.
-Warning: The margins_logregr() and margins_mlogregr() functions have been 
deprecated in favor of the margins() function.
-Marginal Effects with Interaction Terms
-margins( model_table,
- output_table,
- x_design,
- source_table,
- marginal_vars
-   )
- Arguments 
-model_table 
-VARCHAR. The name of the model table, which is the output of logregr_train() or mlogregr_train(). 
-output_table 
-VARCHAR. The name of the result table. The output table has the following 
columns. 
-
-variables INTEGER[]. The indices of the basis variables.  

-
-margins DOUBLE PRECISION[]. The marginal effects.  
-
-std_err DOUBLE PRECISION[]. An array of the standard errors, 
computed using the delta method.  
-
-z_stats DOUBLE PRECISION[]. An array of the z-stats of the 
marginal effects.  
-
-p_values DOUBLE PRECISION[]. An array of the Wald p-values of the 
marginal effects.  
-
-
-x_design (optional) 
-VARCHAR, default: NULL. The design of independent 
variables, necessary only if interaction terms or indicator (categorical) terms 
are present. This parameter is necessary because the independent variables in the 
underlying regression are not parsed to extract the relationship between 
variables.
-Example: The independent_varname in the regression method can be 
specified in either of the following ways:
- ‘array[1, color_blue, color_green, gender_female, gpa, gpa^2, 
gender_female*gpa, gender_female*gpa^2, weight]’ 
- ‘x’ 
-
-In the second version, the column x is an array containing data 
identical to that expressed in the first version, computed in a prior data 
preparation step. Supply an x_design argument to the margins() function in the following 
way:
- ‘1, i.color_blue.color, i.color_green.color, i.gender_female, 
gpa, gpa^2, gender_female*gpa, gender_female*gpa^2, weight’
-
-The variable names ('gpa', 'weight', ...), referred to here as 
identifiers, should be unique for each basis variable and need not be 
the same as the original variable 

[30/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released

2018-08-10 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__arraysmatrix.html
--
diff --git a/docs/rc/group__grp__arraysmatrix.html 
b/docs/rc/group__grp__arraysmatrix.html
deleted file mode 100644
index 520ac21..000
--- a/docs/rc/group__grp__arraysmatrix.html
+++ /dev/null
@@ -1,182 +0,0 @@
-
-http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
-http://www.w3.org/1999/xhtml;>
-
-
-
-
-
-MADlib: Arrays and Matrices
-
-
-
-
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(initResizable);
-/* @license-end */
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(function() { init_search(); });
-/* @license-end */
-
-
-  MathJax.Hub.Config({
-extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
-jax: ["input/TeX","output/HTML-CSS"],
-});
-https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js";>
-
-
-
-
-
-
-  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
-  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
-  
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
-  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
-  ga('create', 'UA-45382226-1', 'madlib.apache.org');
-  ga('send', 'pageview');
-
-
-
-
-
-
- 
- 
-  http://madlib.apache.org;>
-  
-   
-   1.15
-   
-   User Documentation for Apache MADlib
-  
-   
-
-  
-  
-  
-
-  
-
-
- 
- 
-
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-var searchBox = new SearchBox("searchBox", "search",false,'Search');
-/* @license-end */
-
-
-
-  
-
-  
-
-  
-  
-  
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-$(document).ready(function(){initNavTree('group__grp__arraysmatrix.html','');});
-/* @license-end */
-
-
-
-
-
-
-
-
-
-
-
-
-
-  
-Modules  
-  
-Arrays and MatricesData Types and 
Transformations  
-
-
-Detailed 
Description
-These modules provide basic mathematical operations to be run on arrays and 
matrices.
-For a distributed system, a matrix cannot simply be represented as a 2D 
array of numbers in memory. We provide two forms of distributed 
representation of a matrix:
-
-Dense: The matrix is represented as a distributed collection of 1-D 
arrays. An example 3x10 matrix would be the below table: 
- row_id | row_vec
-+-
-   1| {9,6,5,8,5,6,6,3,10,8}
-   2| {8,2,2,6,6,10,2,1,9,9}
-   3| {3,9,9,9,8,6,3,9,5,6}
-
-Sparse: The matrix is represented using the row and column indices for 
each non-zero entry of the matrix. Example: 
- row_id | col_id | value
-++---
-  1 |  1 | 9
-  1 |  5 | 6
-  1 |  6 | 6
-  2 |  1 | 8
-  3 |  1 | 3
-  3 |  2 | 9
-  4 |  7 | 0
-(6 rows)
-  All matrix operations work with either form of 
representation.
-
-In many cases, a matrix function can be decomposed to vector operations 
applied independently on each row of a matrix (or corresponding rows of two 
matrices). We have also provided access to these internal vector operations 
(Array Operations) for greater 
flexibility. Matrix operations like matrix_add use the corresponding 
vector operation (array_add) and also include additional validation 
and formatting. Other functions like matrix_mult are complex and use a 
combination of such vector operations and other SQL operations.
-It's important to note that these array functions are only available 
for the dense format representation of the matrix. In general, the scope of a 
single array function invocation is limited to only an array (1-dimensional or 
2-dimensional) that fits in memory. When such a function is executed on a table 
of arrays, the function is called multiple times - once for each array (or pair 
of arrays). In contrast, the scope of a single matrix function invocation is the 
complete matrix stored as a distributed table. 
-
-
-Modules
-Array Operations
-Provides fast array operations supporting other MADlib 
modules. 
-
-Matrix Operations
-Provides fast matrix operations supporting other MADlib 
modules. 
-
-Matrix Factorization
-Linear algebra methods that 
factorize a matrix into a product of matrices. 
-
-Norms and Distance Functions
-Provides utility functions for basic linear algebra 
operations. 
-
-Sparse Vectors
-Implements a sparse vector data type that provides 
compressed storage of vectors that may have many duplicate elements. 
-
-
-
-
-
-
-  
-Generated on Mon Aug 6 2018 21:55:39 for MADlib by
-
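The dense and sparse table layouts described in the removed Arrays and Matrices page can be illustrated with a minimal sketch. It is plain Python rather than MADlib SQL, and `dense_to_sparse` is a hypothetical helper, not a MADlib function; it only reuses the column names row_id, row_vec, col_id and value from the example tables, and assumes 1-based column indices as in that example.

```python
def dense_to_sparse(rows):
    """Convert dense rows (row_id, row_vec) into sparse triples.

    Hypothetical helper for illustration only. Returns a list of
    (row_id, col_id, value) tuples, one per non-zero entry, with
    1-based column indices.
    """
    triples = []
    for row_id, row_vec in rows:
        for col_id, value in enumerate(row_vec, start=1):
            if value != 0:  # sparse form keeps only non-zero entries
                triples.append((row_id, col_id, value))
    return triples


# Illustrative data chosen to reproduce the non-zero entries of the sparse example.
dense = [
    (1, [9, 0, 0, 0, 6, 6]),
    (2, [8, 0, 0, 0, 0, 0]),
    (3, [3, 9, 0, 0, 0, 0]),
]
sparse = dense_to_sparse(dense)
for triple in sparse:
    print(triple)
```

The sketch emits only non-zero entries, so a zero-valued row such as the last one in the sparse example is not reproduced.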

[15/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released

2018-08-10 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__nn.html
--
diff --git a/docs/rc/group__grp__nn.html b/docs/rc/group__grp__nn.html
deleted file mode 100644
index d7569f6..000
--- a/docs/rc/group__grp__nn.html
+++ /dev/null
@@ -1,1143 +0,0 @@
-
-http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
-http://www.w3.org/1999/xhtml;>
-
-
-
-
-
-MADlib: Neural Network
-
-
-
-
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(initResizable);
-/* @license-end */
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(function() { init_search(); });
-/* @license-end */
-
-
-  MathJax.Hub.Config({
-extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
-jax: ["input/TeX","output/HTML-CSS"],
-});
-https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js";>
-
-
-
-
-
-
-  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
-  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
-  
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
-  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
-  ga('create', 'UA-45382226-1', 'madlib.apache.org');
-  ga('send', 'pageview');
-
-
-
-
-
-
- 
- 
-  http://madlib.apache.org;>
-  
-   
-   1.15
-   
-   User Documentation for Apache MADlib
-  
-   
-
-  
-  
-  
-
-  
-
-
- 
- 
-
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-var searchBox = new SearchBox("searchBox", "search",false,'Search');
-/* @license-end */
-
-
-
-  
-
-  
-
-  
-  
-  
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-$(document).ready(function(){initNavTree('group__grp__nn.html','');});
-/* @license-end */
-
-
-
-
-
-
-
-
-
-
-
-
-
-  
-Neural NetworkSupervised Learning  
-
-
-Contents
-
-Classification 
-
-Regression 
-
-Optimizer Parameters 
-
-Prediction Functions 
-
-Examples 
-
-Technical Background 
-
-Literature 
-
-Related Topics 
-
-Multilayer Perceptron (MLP) is a type of neural network that can be 
used for regression and classification.
-MLPs consist of several fully connected hidden layers with non-linear 
activation functions. In the case of classification, the final layer of the 
neural net has as many nodes as classes, and the output of the neural net can 
be interpreted as the probability that a given input feature belongs to a 
specific class.
-MLP can be used with or without mini-batching. The advantage of using 
mini-batching is that it can perform better than stochastic gradient descent 
(default MADlib optimizer) because it uses more than one training example at a 
time, typically resulting in faster and smoother convergence [3].
-Note: In order to use mini-batching, you 
must first run the Mini-Batch Preprocessor, 
which is a utility that prepares input data for use by models that support 
mini-batch as an optimization option, such as MLP. This is a one-time operation 
and you would only need to re-run the preprocessor if your input data has 
changed, or if you change the grouping parameter.
-Classification Training Function
-The MLP classification training function has the following format:
-
-mlp_classification(
-source_table,
-output_table,
-independent_varname,
-dependent_varname,
-hidden_layer_sizes,
-optimizer_params,
-activation,
-weights,
-warm_start,
-verbose,
-grouping_col
-)
-Arguments 
-source_table 
-TEXT. Name of the table containing the training data. 
If you are using mini-batching, this is the name of the output table from the 
mini-batch preprocessor.
-
-
-output_table 
-TEXT. Name of the output table containing the model. 
Details of the output table are shown below. 
-
-
-independent_varname 
-TEXT. Expression list to evaluate for the independent 
variables. It should be a numeric array expression. If you are using 
mini-batching, set this parameter to 'independent_varname' which is the 
hardcoded name of the column from the mini-batch preprocessor containing the 
packed independent variables.
-Note: If you are not using mini-batching, 
please note that an intercept variable should not be included as part of this 
expression - this is different from other MADlib modules. Also please note that 
independent variables should be encoded properly. All values are cast to DOUBLE 
PRECISION, so categorical variables should be one-hot or dummy encoded as 
appropriate. See Encoding 
Categorical Variables for more details. 
-
-dependent_varname 
-TEXT. Name of the dependent variable column. For 
classification, supported 

[18/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released

2018-08-10 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__linreg.html
--
diff --git a/docs/rc/group__grp__linreg.html b/docs/rc/group__grp__linreg.html
deleted file mode 100644
index 9e73ca3..000
--- a/docs/rc/group__grp__linreg.html
+++ /dev/null
@@ -1,479 +0,0 @@
-
-http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
-http://www.w3.org/1999/xhtml;>
-
-
-
-
-
-MADlib: Linear Regression
-
-
-
-
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(initResizable);
-/* @license-end */
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(function() { init_search(); });
-/* @license-end */
-
-
-  MathJax.Hub.Config({
-extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
-jax: ["input/TeX","output/HTML-CSS"],
-});
-https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js";>
-
-
-
-
-
-
-  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
-  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
-  
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
-  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
-  ga('create', 'UA-45382226-1', 'madlib.apache.org');
-  ga('send', 'pageview');
-
-
-
-
-
-
- 
- 
-  http://madlib.apache.org;>
-  
-   
-   1.15
-   
-   User Documentation for Apache MADlib
-  
-   
-
-  
-  
-  
-
-  
-
-
- 
- 
-
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-var searchBox = new SearchBox("searchBox", "search",false,'Search');
-/* @license-end */
-
-
-
-  
-
-  
-
-  
-  
-  
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-$(document).ready(function(){initNavTree('group__grp__linreg.html','');});
-/* @license-end */
-
-
-
-
-
-
-
-
-
-
-
-
-
-  
-Linear RegressionSupervised Learning  Regression Models  
-
-
-Contents 
-
-Training Function 
-
-Prediction Function 
-
-Examples 
-
-Technical Background 
-
-Literature 
-
-Related Topics 
-
-Linear regression models a linear relationship of a scalar dependent 
variable \( y \) to one or more explanatory independent variables \( x \) and 
builds a model of coefficients.
-Training 
Function
-The linear regression training function has the following syntax. 
-linregr_train( source_table,
-   out_table,
-   dependent_varname,
-   independent_varname,
-   grouping_cols,
-   heteroskedasticity_option
- )
-Arguments 
-source_table 
-TEXT. Name of the table containing the training 
data.
-
-
-out_table 
-TEXT. Name of the generated table containing the output 
model.
-The output table contains the following columns: 
-
-... Any grouping columns provided during training. 
Present only if the grouping option is used.  
-
-coef FLOAT8[]. Vector of the coefficients of the regression.  

-
-r2 FLOAT8. R-squared coefficient of determination of the model.  

-
-std_err FLOAT8[]. Vector of the standard error of the 
coefficients.  
-
-t_stats FLOAT8[]. Vector of the t-statistics of the coefficients. 
 
-
-p_values FLOAT8[]. Vector of the p-values of the coefficients.  

-
-condition_no FLOAT8 array. The condition number of the \(X^{*}X\) 
matrix. A high condition number is usually an indication that there may be some 
numeric instability in the result yielding a less reliable model. A high 
condition number often results when there is a significant amount of 
collinearity in the underlying design matrix, in which case other regression 
techniques, such as elastic net regression, may be more appropriate.  
-
-bp_stats FLOAT8. The Breusch-Pagan statistic of heteroskedasticity. 
Present only if the heteroskedasticity argument was set to True when the model was 
trained.  
-
-bp_p_value FLOAT8. The Breusch-Pagan calculated p-value. Present 
only if the heteroskedasticity parameter was set to True when the model was 
trained.  
-
-num_rows_processed INTEGER. The number of rows that are actually 
used in each group.  
-
-num_missing_rows_skipped INTEGER. The number of rows that have 
NULL values in the dependent and independent variables, and were skipped in the 
computation for each group. 
-
-variance_covariance FLOAT[]. Variance/covariance matrix. 

-
-A summary table named out_table_summary is created 
together with the output table. It has the following columns: 
-
-method 'linregr' for linear regression.  
-
-source_table The data source table name 
-
-out_table The output table name 
-
-dependent_varname The dependent variable 
-
-independent_varname The independent variables 
-

[37/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released

2018-08-10 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/dir_8f36046b7fd6891397115ddb47a5ee66.html
--
diff --git a/docs/rc/dir_8f36046b7fd6891397115ddb47a5ee66.html 
b/docs/rc/dir_8f36046b7fd6891397115ddb47a5ee66.html
deleted file mode 100644
index a5d0284..000
--- a/docs/rc/dir_8f36046b7fd6891397115ddb47a5ee66.html
+++ /dev/null
@@ -1,143 +0,0 @@
-
-http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
-http://www.w3.org/1999/xhtml;>
-
-
-
-
-
-MADlib: prob Directory Reference
-
-
-
-
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(initResizable);
-/* @license-end */
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(function() { init_search(); });
-/* @license-end */
-
-
-  MathJax.Hub.Config({
-extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
-jax: ["input/TeX","output/HTML-CSS"],
-});
-https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js";>
-
-
-
-
-
-
-  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
-  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
-  
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
-  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
-  ga('create', 'UA-45382226-1', 'madlib.apache.org');
-  ga('send', 'pageview');
-
-
-
-
-
-
- 
- 
-  http://madlib.apache.org;>
-  
-   
-   1.15
-   
-   User Documentation for Apache MADlib
-  
-   
-
-  
-  
-  
-
-  
-
-
- 
- 
-
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-var searchBox = new SearchBox("searchBox", "search",false,'Search');
-/* @license-end */
-
-
-
-  
-
-  
-
-  
-  
-  
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-$(document).ready(function(){initNavTree('dir_8f36046b7fd6891397115ddb47a5ee66.html','');});
-/* @license-end */
-
-
-
-
-
-
-
-
-
-
-
-
-
-  
-prob Directory Reference  
-
-
-
-
-Files
-file prob.sql_in
-SQL functions for evaluating probability functions. 
-
-
-
-
-
-
-  
-madlibsrcportspostgresmodulesprob
-Generated on Mon Aug 6 2018 21:55:39 for MADlib by
-http://www.doxygen.org/index.html;>
- 1.8.14 
-  
-
-
-

http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/dir_a3a6204225c05cbe8d92623799329235.html
--
diff --git a/docs/rc/dir_a3a6204225c05cbe8d92623799329235.html 
b/docs/rc/dir_a3a6204225c05cbe8d92623799329235.html
deleted file mode 100644
index bbea54a..000
--- a/docs/rc/dir_a3a6204225c05cbe8d92623799329235.html
+++ /dev/null
@@ -1,142 +0,0 @@
-
-http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
-http://www.w3.org/1999/xhtml;>
-
-
-
-
-
-MADlib: src Directory Reference
-
-
-
-
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(initResizable);
-/* @license-end */
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(function() { init_search(); });
-/* @license-end */
-
-
-  MathJax.Hub.Config({
-extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
-jax: ["input/TeX","output/HTML-CSS"],
-});
-https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js";>
-
-
-
-
-
-
-  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
-  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
-  
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
-  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
-  ga('create', 'UA-45382226-1', 'madlib.apache.org');
-  ga('send', 'pageview');
-
-
-
-
-
-
- 
- 
-  http://madlib.apache.org;>
-  
-   
-   1.15
-   
-   User Documentation for Apache MADlib
-  
-   
-
-  
-  
-  
-
-  
-
-
- 
- 
-
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-var searchBox = new SearchBox("searchBox", "search",false,'Search');
-/* @license-end */
-
-
-
-  
-
-  
-
-  
-  
-  
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-$(document).ready(function(){initNavTree('dir_a3a6204225c05cbe8d92623799329235.html','');});
-/* @license-end */
-
-
-
-
-
-
-
-
-
-
-
-
-
-  
-src Directory Reference  
-
-
-
-
-Directories
-directory pg_gp
-
-
-
-
-
-
-  
-

[05/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released

2018-08-10 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__validation.html
--
diff --git a/docs/rc/group__grp__validation.html 
b/docs/rc/group__grp__validation.html
deleted file mode 100644
index 632fc47..000
--- a/docs/rc/group__grp__validation.html
+++ /dev/null
@@ -1,273 +0,0 @@
-
-http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
-http://www.w3.org/1999/xhtml;>
-
-
-
-
-
-MADlib: Cross Validation
-
-
-
-
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(initResizable);
-/* @license-end */
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-  $(document).ready(function() { init_search(); });
-/* @license-end */
-
-
-  MathJax.Hub.Config({
-extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
-jax: ["input/TeX","output/HTML-CSS"],
-});
-https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js";>
-
-
-
-
-
-
-  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
-  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
-  
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
-  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
-  ga('create', 'UA-45382226-1', 'madlib.apache.org');
-  ga('send', 'pageview');
-
-
-
-
-
-
- 
- 
-  http://madlib.apache.org;>
-  
-   
-   1.15
-   
-   User Documentation for Apache MADlib
-  
-   
-
-  
-  
-  
-
-  
-
-
- 
- 
-
-
-
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-var searchBox = new SearchBox("searchBox", "search",false,'Search');
-/* @license-end */
-
-
-
-  
-
-  
-
-  
-  
-  
-
-
-/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
-$(document).ready(function(){initNavTree('group__grp__validation.html','');});
-/* @license-end */
-
-
-
-
-
-
-
-
-
-
-
-
-
-  
-Cross ValidationModel Selection  
-
-
-Contents 
-
-Cross-Validation Function 
-
-Examples 
-
-Notes 
-
-Technical Background 
-
-Related Topics 
-
-Estimates the fit of a predictive model given a data set and 
specifications for the training, prediction, and error estimation functions.
-Cross validation, sometimes called rotation estimation, is a technique for 
assessing how the results of a statistical analysis will generalize to an 
independent data set. It is mainly used in settings where the goal is 
prediction, and you want to estimate how accurately a predictive model will 
perform in practice.
-The cross-validation function provided by this module is very flexible and 
can work with algorithms you want to cross validate, including algorithms you 
write yourself. Among the inputs to the cross-validation function are 
specifications of the modelling, prediction, and error metric functions. These 
three-part specifications include the name of the function, an array of 
arguments to pass to the function, and an array of the data types of the 
arguments. This makes it possible to use functions from other MADlib modules or 
user-defined functions that you supply.
-
-The modelling (training) function takes in a given data set with 
independent and dependent variables and produces a model, which is stored in an 
output table.
-The prediction function takes in the model generated by the modelling 
function and a different data set with independent variables, and produces a 
prediction of the dependent variables based on the model, which is stored in an 
output table. The prediction function should take a unique ID column name in 
the data table as one of the inputs, so that the prediction result can be 
compared with the validation values. Note: Prediction functions in some MADlib 
modules do not save results into an output table. These prediction functions 
are not suitable for cross-validation.
-The error metric function compares the prediction results with the known 
values of the dependent variables in the data set that was fed into the 
prediction function. It computes the error metric using the specified error 
metric function, storing the results in a table.
-
-Other inputs include the output table name, k value for the k-fold cross 
validation, and how many folds to try. For example, you can choose to run a 
simple validation instead of a full cross validation.
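The three-part workflow above (modelling, prediction, error metric) and the k-fold inputs can be sketched in a few lines. This is an illustrative Python outline under stated assumptions, not the MADlib implementation: `k_fold_splits`, `cross_validate`, `folds_to_try` and the toy `fit`/`predict`/`error` functions are all hypothetical names, and rows are represented by plain integer IDs instead of database tables.

```python
def k_fold_splits(row_ids, k):
    """Yield (train_ids, validation_ids) pairs, one per fold."""
    folds = [row_ids[i::k] for i in range(k)]  # round-robin assignment to k folds
    for i, validation in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, validation


def cross_validate(row_ids, k, fit, predict, error, folds_to_try=None):
    """Run fit/predict/error on each fold and collect the error metrics.

    folds_to_try (hypothetical parameter) limits how many of the k folds
    are evaluated; folds_to_try=1 corresponds to a simple validation.
    """
    errors = []
    for n, (train, validation) in enumerate(k_fold_splits(row_ids, k)):
        if folds_to_try is not None and n >= folds_to_try:
            break
        model = fit(train)                        # modelling (training) step
        predictions = predict(model, validation)  # prediction step
        errors.append(error(predictions, validation))  # error metric step
    return errors


# Toy functions: the "model" is the mean of the training IDs.
def fit(train):
    return sum(train) / len(train)


def predict(model, rows):
    return [model for _ in rows]


def error(predictions, rows):
    return sum(abs(p - r) for p, r in zip(predictions, rows)) / len(rows)


ids = list(range(6))
errors = cross_validate(ids, 3, fit, predict, error)
print(errors)
```

Passing the three functions as arguments mirrors the idea that the cross-validation routine is agnostic to which training, prediction, and error functions it is given.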
-Cross-Validation Function
-
-cross_validation_general( modelling_func,
-  modelling_params,
-  modelling_params_type,
-  param_explored,
-  explore_values,
-  predict_func,
-  predict_params,
- 

[51/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released

2018-08-10 Thread riyer
Move v1.15 RC1 to latest released


Project: http://git-wip-us.apache.org/repos/asf/madlib-site/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib-site/commit/573d66d8
Tree: http://git-wip-us.apache.org/repos/asf/madlib-site/tree/573d66d8
Diff: http://git-wip-us.apache.org/repos/asf/madlib-site/diff/573d66d8

Branch: refs/heads/asf-site
Commit: 573d66d85212546a9c200d9ff396124378fc478f
Parents: 9a2b301
Author: Rahul Iyer 
Authored: Fri Aug 10 15:59:38 2018 -0700
Committer: Rahul Iyer 
Committed: Fri Aug 10 15:59:38 2018 -0700

--
 data/.RData |  Bin 0 -> 69 bytes
 data/.Rhistory  |4 +
 docs/index.html |2 +-
 docs/latest |2 +-
 docs/rc/apsp_8sql__in.html  |  335 -
 docs/rc/arima_8sql__in.html | 1070 ---
 docs/rc/array__ops_8sql__in.html| 1275 
 docs/rc/assoc__rules_8sql__in.html  |  415 --
 docs/rc/balance__sample_8sql__in.html   |  497 --
 docs/rc/bayes_8sql__in.html |  994 ---
 docs/rc/bc_s.png|  Bin 676 -> 0 bytes
 docs/rc/bdwn.png|  Bin 147 -> 0 bytes
 docs/rc/bfs_8sql__in.html   |  443 --
 docs/rc/closed.png  |  Bin 132 -> 0 bytes
 docs/rc/clustered__variance_8sql__in.html   | 1954 --
 .../rc/clustered__variance__coxph_8sql__in.html |  496 --
 docs/rc/cols2vec_8sql__in.html  |  316 -
 docs/rc/conjugate__gradient_8sql__in.html   |  263 -
 docs/rc/correlation_8sql__in.html   |  685 --
 docs/rc/cox__prop__hazards_8sql__in.html| 2150 --
 docs/rc/create__indicators_8sql__in.html|  340 -
 docs/rc/crf_8sql__in.html   |  559 --
 docs/rc/crf__data__loader_8sql__in.html |  342 -
 docs/rc/crf__feature__gen_8sql__in.html |  305 -
 docs/rc/cross__validation_8sql__in.html |  717 --
 docs/rc/decision__tree_8sql__in.html| 2764 
 docs/rc/dense__linear__systems_8sql__in.html|  647 --
 .../dir_012f026af89a95e7964e87a3db4f3f72.html   |  142 -
 .../dir_080635afba7a03bce9bcf848b744ecef.html   |  142 -
 .../dir_082e548d8897978bd67db5bb10c3f4ca.html   |  142 -
 .../dir_0e2c82fdc38d6347747c84b2495b87bb.html   |  143 -
 .../dir_0f0603029f2766ba6362c0486f42266f.html   |  142 -
 .../dir_1d47b74c56eeb36d1b42c4eefe6268df.html   |  143 -
 .../dir_1f3edc2a41a90b71d908e98c40e8e20f.html   |  143 -
 .../dir_20517c5c235c3aa13e267a0084f413b4.html   |  143 -
 .../dir_212c462ae803c05eae1fe2b1df645c56.html   |  142 -
 .../dir_26cdf48399aa0105c53c7623e443a32b.html   |  150 -
 .../dir_2fd46cdf9feef20c5a1de2fea1748af4.html   |  143 -
 .../dir_31661a94ac35e1e3b8f7fadfa53703b5.html   |  146 -
 .../dir_3c5e27e75c1f20438079b385c860229e.html   |  143 -
 .../dir_3e00766ce7bbd3258084476ece235bc0.html   |  152 -
 .../dir_3e5da3f4b4c531df2ac983d22a9bd897.html   |  143 -
 .../dir_4efa676c70d986e4be6149ce0c1d0b98.html   |  149 -
 .../dir_508e3ba2de19c9cb39df85c09ad79f77.html   |  143 -
 .../dir_51681ee935e6dd3c9bf433c39db08bf4.html   |  142 -
 .../dir_57f83f46582e45fe02cb0209b9cad992.html   |  152 -
 .../dir_6944d646d96379d734d568fa9f457ac2.html   |  142 -
 .../dir_6c8c1662e04f4d84cf895381f3c4ee75.html   |  169 -
 .../dir_7b19f40af17a56bc8266e4b0ec256b61.html   |  146 -
 .../dir_7b71f02250bd83717b51065786bd49f6.html   |  143 -
 .../dir_7bef9b9f49f23083f873cdb4f9aa5595.html   |  148 -
 .../dir_7f0185f98acca08613e6e8b8ed2c9454.html   |  142 -
 .../dir_84af2f6304104e948345b9ffbceda59c.html   |  143 -
 .../dir_8a1630b9e626a27a0fba85144676dd7e.html   |  143 -
 .../dir_8b6eadc8746db3a817149b816651d271.html   |  143 -
 .../dir_8f36046b7fd6891397115ddb47a5ee66.html   |  143 -
 .../dir_a3a6204225c05cbe8d92623799329235.html   |  142 -
 .../dir_a50d939c472d90effb762b784a85c42f.html   |  143 -
 .../dir_abda3f8ccfdd5b50c49da6f23b1283ee.html   |  144 -
 .../dir_ac8432244a3c88336507031a023f4059.html   |  151 -
 .../dir_b18f8a58178adf1282c19b355fc56476.html   |  157 -
 .../dir_bfb6fe26cbfcbd6092eb3bff4002d9b4.html   |  142 -
 .../dir_c8a121080de679af346a38eb58b36514.html   |  142 -
 .../dir_c97a42988cbd79aebe75c02ffb75992a.html   |  144 -
 .../dir_cc740a115287ad80150f497f51742950.html   |  143 -
 .../dir_ce4fa7aad06dd1bbca713eb50be7391b.html   |  190 -
 .../dir_d2a50e0bf8f1a4defa2e0ba0a193a480.html   |  142 -
 .../dir_d3dbcc2e792650c67298228e381c9a26.html   |  143 -
 .../dir_d9f864a50dc114ae327fea67d9326f10.html   |  161 -
 .../dir_efdb815e8132703ae96e54278f654003.html   |  142 -
 .../dir_f22d7b16c4d94fc51216129c2f2d4ca9.html   |  143 -
 .../dir_f6ab7d321b1475f96a73949691e0e1a0.html   |  157 -
 .../dir_fe3b9425dacf2fb6ecd5c85236398360.html   |  142 -
 docs/rc/distribution_8sql__in.html  |  330 -
 

[01/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released

2018-08-10 Thread riyer
Repository: madlib-site
Updated Branches:
  refs/heads/asf-site 9a2b301d3 -> 573d66d85


http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/lda_8sql__in.html
--
diff --git a/docs/rc/lda_8sql__in.html b/docs/rc/lda_8sql__in.html
deleted file mode 100644
index e2423a7..000
--- a/docs/rc/lda_8sql__in.html
+++ /dev/null
@@ -1,1422 +0,0 @@
-
-http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
-http://www.w3.org/1999/xhtml;>
-
-
-
-
-
-MADlib: lda.sql_in File Reference
-
-
-
-
-
-
-
-
-/* @license magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt GPL-v2 */
-  $(document).ready(initResizable);
-/* @license-end */
-
-
-
-
-/* @license magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt GPL-v2 */
-  $(document).ready(function() { init_search(); });
-/* @license-end */
-
-
-  MathJax.Hub.Config({
-extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
-jax: ["input/TeX","output/HTML-CSS"],
-});
-https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js";>
-
-
-
-
-
-
-  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
-  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
-  m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
-  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
-  ga('create', 'UA-45382226-1', 'madlib.apache.org');
-  ga('send', 'pageview');
-
-
-
-
-
-
- 
- 
-  http://madlib.apache.org;>
-  
-   
-   1.15
-   
-   User Documentation for Apache MADlib
-  
-   
-
-  
-  
-  
-
-  
-
-
- 
- 
-
-
-
-
-
-/* @license magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt GPL-v2 */
-var searchBox = new SearchBox("searchBox", "search",false,'Search');
-/* @license-end */
-
-
-
-  
-
-  
-
-  
-  
-  
-
-
-/* @license magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt GPL-v2 */
-$(document).ready(function(){initNavTree('lda_8sql__in.html','');});
-/* @license-end */
-
-
-
-
-
-
-
-
-
-
-
-
-
-  
-Functions  
-  
-lda.sql_in File Reference  
-
-
-
-SQL functions for Latent Dirichlet Allocation.  
-More...
-
-
-Functions
-set lda_result lda_train (text data_table, text model_table, text output_data_table, int4 voc_size, int4 topic_num, int4 iter_num, float8 alpha, float8 beta)
-This UDF provides an entry point for the LDA training process.  More...
-
-set lda_result lda_predict (text data_table, text model_table, text output_table)
-This UDF provides an entry point for the LDA prediction process.  More...
-
-set lda_result lda_predict (text data_table, text model_table, text output_table, int4 iter_num)
-An overloaded version that allows users to specify iter_num.  More...
-
-set lda_result lda_get_topic_word_count (text model_table, text output_table)
-This UDF computes the per-topic word counts.  More...
-
-set lda_result lda_get_word_topic_count (text model_table, text output_table)
-This UDF computes the per-word topic counts.  More...
-
-set lda_result lda_get_topic_desc (text model_table, text vocab_table, text desc_table, int4 top_k)
-This UDF gets the description for each topic (top-k words).  More...
-
-set lda_result lda_get_word_topic_mapping (text lda_output_table, text mapping_table)
-This UDF gets the wordid - topicid mapping from the LDA training output table.  More...
-
-int4 [] __lda_random_assign (int4 word_count, int4 topic_num)
-This UDF assigns topics to words in a document randomly.  More...
-
-int4 [] __lda_gibbs_sample (int4[] words, int4[] counts, int4[] doc_topic, int8[] model, float8 alpha, float8 beta, int4 voc_size, int4 topic_num, int4 iter_num)
-This UDF learns the topics of words in a document and is the main step of a Gibbs sampling iteration. The model parameter (including the per-word topic counts and corpus-level topic counts) is passed to this function in the first call and then transferred to the remaining calls through fcinfo->flinfo->fn_extra to allow the immediate update.  More...
-
-int8 [] __lda_count_topic_sfunc (int8[] state, int4[] words, int4[] counts, int4[] topic_assignment, int4 voc_size, int4 topic_num)
-This UDF is the sfunc for the aggregator computing the topic counts for each word and the topic count in the whole corpus. It scans the topic assignments in a document and updates the topic counts.  More...
-
-int8 [] __lda_count_topic_prefunc (int8[] state1, int8[] state2)
-This UDF is the prefunc for the aggregator computing the per-word topic counts.  More...
-
-aggregate int8 [] __lda_count_topic_agg (int4[], int4[], int4[], int4, int4)
-This UDA computes the word topic counts by scanning and summing up topic assignments in each document.  More...
-
-float8 lda_get_perplexity (text model_table, text 

[08/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released

2018-08-10 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__strs.html
--
diff --git a/docs/rc/group__grp__strs.html b/docs/rc/group__grp__strs.html
deleted file mode 100644
index a3a6e3b..000
--- a/docs/rc/group__grp__strs.html
+++ /dev/null
@@ -1,269 +0,0 @@
-MADlib: Stratified Sampling
-Stratified Sampling
-
-
-Contents 
-
-Stratified Sampling 
-
-Examples 
-
-Stratified sampling is a method for independently sampling subpopulations (strata). It is commonly used to reduce sampling error by ensuring that subgroups are adequately represented in the sample.
-Stratified Sampling
-
-stratified_sample(  source_table,
-output_table,
-proportion,
-grouping_cols,
-target_cols,
-with_replacement
-  )
-Arguments
-source_table
-TEXT. Name of the table containing the input data.
-
-
-output_table
-TEXT. Name of the output table that contains the sampled data. The output table contains all columns present in the source table unless otherwise specified in the 'target_cols' parameter below.
-
-
-proportion
-FLOAT8 in the range (0,1). Proportion of rows to sample. Each stratum is sampled independently.
-
-
-grouping_cols (optional)
-TEXT, default: NULL. A single column or a list of comma-separated columns that defines the strata. When this parameter is NULL, no grouping is used, so the sampling is non-stratified; that is, the whole table is treated as a single group.
-
-
-target_cols (optional)
-TEXT, default: NULL. A comma-separated list of columns to appear in the 'output_table'. If NULL or '*', all columns from the 'source_table' will appear in the 'output_table'.
-Note: Do not include 'grouping_cols' in the parameter 'target_cols', because they are always included in the 'output_table'.
-
-with_replacement (optional)
-BOOLEAN, default: FALSE. Determines whether to sample with replacement or without replacement (default). With replacement means that the same row may appear in the sample set more than once. Without replacement means a given row can be selected only once.
-
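The argument semantics above can be sketched in a few lines of Python. This is only an illustration of per-stratum sampling, not MADlib's implementation: the function name `stratified_sample_rows`, the dict-per-row input, the `seed` parameter, and the choice to round each stratum's sample size up are all assumptions made for the example.

```python
import math
import random
from collections import defaultdict


def stratified_sample_rows(rows, proportion, grouping_cols=None,
                           with_replacement=False, seed=None):
    """Sample `proportion` of the rows of each stratum independently.

    `rows` is a list of dicts (hypothetical stand-in for a table).
    """
    if not 0 < proportion < 1:
        raise ValueError("proportion must be in the range (0, 1)")
    rng = random.Random(seed)

    # Group rows into strata; with no grouping columns the whole
    # table is treated as a single group.
    strata = defaultdict(list)
    for row in rows:
        key = tuple(row[c] for c in grouping_cols) if grouping_cols else ()
        strata[key].append(row)

    sample = []
    for members in strata.values():
        # Assumption: round the per-stratum sample size up.
        n = math.ceil(proportion * len(members))
        if with_replacement:
            # The same row may be drawn more than once.
            sample.extend(rng.choice(members) for _ in range(n))
        else:
            # Each row can be selected at most once.
            sample.extend(rng.sample(members, n))
    return sample
```

Because every stratum is sampled on its own, a small group cannot be crowded out by a large one, which is the property the description above relies on.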
-Examples
-Please note that due to the random nature of sampling, your results may look different from those below.
-
-Create an input table: 
-DROP TABLE IF EXISTS test;
-CREATE TABLE test(
-id1 INTEGER,
-id2 INTEGER,
-gr1 INTEGER,
-gr2 INTEGER
-);
-INSERT INTO test VALUES
-(1,0,1,1),
-(2,0,1,1),
-(3,0,1,1),
-(4,0,1,1),
-(5,0,1,1),
-(6,0,1,1),
-(7,0,1,1),
-(8,0,1,1),
-(9,0,1,1),
-(9,0,1,1),
-(9,0,1,1),
-(9,0,1,1),
-(0,1,1,2),
-(0,2,1,2),
-(0,3,1,2),
-(0,4,1,2),
-(0,5,1,2),
-(0,6,1,2),
-(10,10,2,2),
-(20,20,2,2),
-(30,30,2,2),
-(40,40,2,2),
-(50,50,2,2),
-(60,60,2,2),
-(70,70,2,2);
-
-Sample without replacement: 
-DROP TABLE IF EXISTS out;
-SELECT madlib.stratified_sample(
-'test',-- Source table
-'out', -- Output table
-0.5,   -- Sample proportion
-'gr1,gr2', -- Strata definition
-

[26/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released

2018-08-10 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__crf.html
--
diff --git a/docs/rc/group__grp__crf.html b/docs/rc/group__grp__crf.html
deleted file mode 100644
index 20fd7da..000
--- a/docs/rc/group__grp__crf.html
+++ /dev/null
@@ -1,632 +0,0 @@
-MADlib: Conditional Random Field
-Conditional Random Field
-
-
-Contents 
-
-Training Feature Generation 
-
-CRF Training Function 
-
-Testing Feature Generation 
-
-Inference using Viterbi 
-
-Using CRF 
-
-Examples 
-
-Technical Background 
-
-Literature 
-
-Related Topics 
-
-A conditional random field (CRF) is a type of discriminative, undirected probabilistic graphical model. A linear-chain CRF is a special type of CRF that assumes the current state depends only on the previous state.
-Feature extraction modules are provided for text-analysis tasks such as part-of-speech (POS) tagging and named-entity recognition (NER). Currently, six feature types are implemented:
-
-Edge Feature: a transition feature that encodes the weight of the transition from the current label to the next label.
-Start Feature: fired when the current token is the first token in a sequence.
-End Feature: fired when the current token is the last token in a sequence.
-Word Feature: fired when the current token is observed in the trained dictionary.
-Unknown Feature: fired when the current token has not been observed in the trained dictionary at least a certain number of times (default 1).
-Regex Feature: fired when the current token can be matched by a regular expression.
-
-A Viterbi implementation is also provided to get the best label sequence and the conditional probability \( \Pr( \text{best label sequence} \mid \text{sequence}) \).
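As a rough illustration of what Viterbi inference on a linear-chain CRF computes, the Python sketch below finds the best label sequence and its conditional probability. It is not MADlib's implementation: the function names, the `emit` and `trans` score tables (unnormalized log-potentials per position and per label transition), and the use of a forward pass for the normalizer are assumptions made for the example.

```python
import math


def _logsumexp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def viterbi(emit, trans):
    """Best label sequence and Pr(best label sequence | sequence).

    emit[t][y]  -- hypothetical score of label y at position t
    trans[a][b] -- hypothetical score of moving from label a to label b
    """
    n_labels = len(emit[0])
    labels = range(n_labels)
    best = list(emit[0])    # best score of any path ending in each label
    alpha = list(emit[0])   # log-sum of scores of all paths ending in each label
    back = []               # backpointers, one list per position after the first

    for t in range(1, len(emit)):
        new_best, new_alpha, ptr = [], [], []
        for y in labels:
            cands = [best[p] + trans[p][y] for p in labels]
            p_star = max(labels, key=cands.__getitem__)
            ptr.append(p_star)
            new_best.append(cands[p_star] + emit[t][y])
            new_alpha.append(
                _logsumexp([alpha[p] + trans[p][y] for p in labels]) + emit[t][y])
        best, alpha = new_best, new_alpha
        back.append(ptr)

    # Follow the backpointers from the best final label.
    last = max(labels, key=best.__getitem__)
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()

    # Probability = exp(score of best path - log of the sum over all paths).
    return path, math.exp(best[last] - _logsumexp(alpha))
```

The max-based recursion gives the best path, while the parallel log-sum recursion gives the normalizer needed to turn the best path's score into a probability.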
-The following steps are required for CRF learning and inference:
-Training Feature Generation
-CRF Training
-Testing Feature Generation
-Inference using Viterbi
-
-Training Feature Generation
-The function takes train_segment_tbl and regex_tbl as input and generates three tables, dictionary_tbl, train_feature_tbl and train_featureset_tbl, that are required as input for CRF training.
-crf_train_fgen(train_segment_tbl,
-   regex_tbl,
-   label_tbl,
-   dictionary_tbl,
-   train_feature_tbl,
-   train_featureset_tbl)
- Arguments
-train_segment_tbl
-TEXT. Name of the training segment table. The table is expected to have the following columns:
-
-doc_id INTEGER. Document id column
-
-start_pos INTEGER. Index of a particular term in the respective document
-
-seg_text TEXT. Term at the respective start_pos in the document
-
-label INTEGER. Label id for the term corresponding to the actual label from label_tbl
-
-
-regex_tbl
-TEXT. Name of the regular expression table. The table is expected to have the following columns:
-
-pattern TEXT. Regular Expression
-
-name TEXT. Regular Expression name
-
-
-label_tbl
-TEXT. Name of the table containing unique labels and their IDs. The table is expected to have the following columns:
-
-id INTEGER. Unique 

[29/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released

2018-08-10 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/group__grp__balance__sampling.html
--
diff --git a/docs/rc/group__grp__balance__sampling.html b/docs/rc/group__grp__balance__sampling.html
deleted file mode 100644
index 20a971d..000
--- a/docs/rc/group__grp__balance__sampling.html
+++ /dev/null
@@ -1,607 +0,0 @@
-MADlib: Balanced Sampling
-Balanced Sampling
-
-
-Contents 
-
-Balanced Sampling 
-
-Examples 
-
-Literature 
-
-Related Topics 
-
-Some classification algorithms only perform optimally when the number of samples in each class is roughly the same. Highly skewed datasets are common in many domains (e.g., fraud detection), so resampling to offset this imbalance can produce a better decision boundary.
-This module offers a number of resampling techniques including undersampling majority classes, oversampling minority classes, and combinations of the two.
-Balanced Sampling
-
-balance_sample( source_table,
-output_table,
-class_col,
-class_sizes,
-output_table_size,
-grouping_cols,
-with_replacement,
-keep_null
-  )
-Arguments
-source_table
-TEXT. Name of the table containing the input data.
-
-
-output_table
-TEXT. Name of the output table that contains the sampled data. The output table contains all columns present in the source table, plus a newly generated id called "__madlib_id__" added as the first column.
-
-
-class_col
-TEXT. Name of the column containing the class to be balanced.
-
-
-class_sizes (optional)
-VARCHAR, default 'uniform'. Parameter to define the size of the different class values. (Class values are sometimes also called levels.) Can be set to the following:
-
-
-'uniform': All class values will be resampled to have the same number of rows.
-
-'undersample': Undersample such that all class values end up with the same number of observations as the minority class. Done without replacement by default unless the parameter 'with_replacement' is set to TRUE.
-
-'oversample': Oversample with replacement such that all class values end up with the same number of observations as the majority class. Not affected by the parameter 'with_replacement' since oversampling is always done with replacement.
-
-Short forms of the above will work too, e.g., 'uni' works the same as 'uniform'.
-
-Alternatively, you can explicitly set class sizes in a string containing a comma-delimited list. Order does not matter and not all class values need to be specified. Use the format "class_value_1=x, class_value_2=y, …" where each 'class_value' in the list must exist in the column 'class_col'. Set each value to an integer representing the desired number of observations. E.g., 'red=3000, blue=4000' means you want to resample the dataset to result in exactly 3000 red and 4000 blue rows in the 'output_table'.
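To make the 'class_sizes' options concrete, here is a small Python sketch that turns a setting into a target row count per class value. It is illustrative only and not MADlib code: the function name `target_class_sizes` is hypothetical, 'uniform' and the short forms are deliberately left out because their exact target sizes are not spelled out above, and class values missing from an explicit list are simply omitted from the result.

```python
from collections import Counter


def target_class_sizes(labels, class_sizes):
    """Map a class_sizes setting to a {class value: target row count} dict.

    `labels` is the list of values found in the class column.
    """
    counts = Counter(labels)

    if class_sizes == "undersample":
        # Every class shrinks to the size of the minority class.
        n = min(counts.values())
        return {value: n for value in counts}

    if class_sizes == "oversample":
        # Every class grows to the size of the majority class.
        n = max(counts.values())
        return {value: n for value in counts}

    # Otherwise expect an explicit list such as 'red=3000, blue=4000'.
    targets = {}
    for item in class_sizes.split(","):
        name, _, size = item.partition("=")
        name = name.strip()
        if name not in counts:
            raise ValueError(f"class value {name!r} not found in class column")
        targets[name] = int(size)
    return targets
```

For a column holding 5 'red', 2 'blue' and 3 'green' rows, 'undersample' yields a target of 2 rows per class and 'oversample' a target of 5 rows per class.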
-Note: The allowed names for class values follow the object naming rules in PostgreSQL [1]. Quoted identifiers are allowed and should be enclosed 

[48/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released

2018-08-10 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/balance__sample_8sql__in.html
--
diff --git a/docs/rc/balance__sample_8sql__in.html b/docs/rc/balance__sample_8sql__in.html
deleted file mode 100644
index 1e22170..000
--- a/docs/rc/balance__sample_8sql__in.html
+++ /dev/null
@@ -1,497 +0,0 @@
-MADlib: balance_sample.sql_in File Reference
-Functions  
-  
-balance_sample.sql_in File Reference  
-
-
-
-SQL functions for balanced data sets sampling.  
-More...
-
-
-Functions
-void balance_sample (text source_table, text output_table, text class_col, varchar class_sizes, integer output_table_size, text grouping_cols, boolean with_replacement, boolean keep_null)
-
-void balance_sample (text source_table, text output_table, text class_col, varchar class_sizes, integer output_table_size, text grouping_cols, boolean with_replacement)
-
-void balance_sample (text source_table, text output_table, text class_col, varchar class_sizes, integer output_table_size, text grouping_cols)
-
-void balance_sample (text source_table, text output_table, text class_col, varchar class_sizes, integer output_table_size)
-
-void balance_sample (text source_table, text output_table, text class_col, varchar class_sizes)
-
-void balance_sample (text source_table, text output_table, text class_col)
-
-varchar balance_sample (varchar message)
-
-varchar balance_sample ()
-
-
-Detailed Description
-Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to you under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
-http://www.apache.org/licenses/LICENSE-2.0
-Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
-Date: 12/14/2017
-See also: Given a table, balanced sampling returns a sampled data set with specified proportions for each class (defaults to uniform sampling).
-Function Documentation
-
-balance_sample() [1/8]
-
-void balance_sample (
-    text source_table,
-    text output_table,
-    text class_col,
-    varchar class_sizes,
-    integer output_table_size,
-    text grouping_cols,
-    boolean with_replacement,

[49/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released

2018-08-10 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/array__ops_8sql__in.html
--
diff --git a/docs/rc/array__ops_8sql__in.html b/docs/rc/array__ops_8sql__in.html
deleted file mode 100644
index c6140a5..000
--- a/docs/rc/array__ops_8sql__in.html
+++ /dev/null
@@ -1,1275 +0,0 @@
-MADlib: array_ops.sql_in File Reference
-Functions  
-  
-array_ops.sql_in File Reference  
-
-
-
-Implementation of array operations in SQL.
-More...
-
-
-Functions
-anyarray array_add (anyarray x, anyarray y)
-Adds two arrays. It requires that all the values are NON-NULL. Return type is the same as the input type.  More...
-
-aggregate anyarray sum (anyarray)
-Aggregate, element-wise sum of arrays. It requires that all the values are NON-NULL. Return type is the same as the input type.  More...
-
-anyarray array_sub (anyarray x, anyarray y)
-Subtracts two arrays. It requires that all the values are NON-NULL. Return type is the same as the input type.  More...
-
-anyarray array_mult (anyarray x, anyarray y)
-Element-wise product of two arrays. It requires that all the values are NON-NULL. Return type is the same as the input type.  More...
-
-anyarray array_div (anyarray x, anyarray y)
-Element-wise division of two arrays. It requires that all the values are NON-NULL. Return type is the same as the input type.  More...
-
-float8 array_dot (anyarray x, anyarray y)
-Dot-product of two arrays. It requires that all the values are NON-NULL. Return type is the same as the input type.  More...
-
-bool array_contains (anyarray x, anyarray y)
-Checks whether one array contains the other. This function returns TRUE if each non-zero element in the right array equals the element with the same index in the left array.  More...
-
-anyelement array_max (anyarray x)
-This function finds the maximum value in the array. NULLs are ignored. Return type is the same as the input type.  More...
-
-float8 [] array_max_index (anyarray x)
-This function finds the maximum value and corresponding index in the array. NULLs are ignored. Return type is the same as the input type.  More...
-
-anyelement array_min (anyarray x)
-This function finds the minimum value in the array. NULLs are ignored. Return type is the same as the input type.  More...
-
-float8 [] array_min_index (anyarray x)
-This function finds the minimum value and corresponding index in the array. NULLs are ignored. Return type is the same as the input type.  More...
-
-anyelement array_sum (anyarray x)
-This function finds the sum of the values in the array. NULLs are ignored. Return type is the same as the input type.  More...
-
-float8 array_sum_big (anyarray x)
-This function finds the sum of the values in the array. NULLs are ignored. Return type is always FLOAT8 regardless of input. This function is meant to replace array_sum() in cases where the sum may overflow the element type.  More...
-
-anyelement array_abs_sum (anyarray x)
-This function finds the sum of the absolute values of the elements in the array. NULLs are ignored. Return type is the same as the input type.  More...
-
-anyarray array_abs 

[43/51] [partial] madlib-site git commit: Move v1.15 RC1 to latest released

2018-08-10 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/573d66d8/docs/rc/create__indicators_8sql__in.html
--
diff --git a/docs/rc/create__indicators_8sql__in.html b/docs/rc/create__indicators_8sql__in.html
deleted file mode 100644
index 51316e0..000
--- a/docs/rc/create__indicators_8sql__in.html
+++ /dev/null
@@ -1,340 +0,0 @@
-MADlib: create_indicators.sql_in File Reference
-Functions  
-  
-create_indicators.sql_in File Reference  
-
-
-
-SQL functions for dummy coding categorical variables.  
-More...
-
-
-Functions
-voidcreate_indicator_variables
 (text source_table, text out_table, text categorical_cols, boolean keep_null, 
text distributed_by)
-Create new table containing 
dummy coded variables for categorical variables.  More...
-
-voidcreate_indicator_variables
 (text source_table, text out_table, text categorical_cols, boolean 
keep_null)
-Create new table containing 
dummy coded variables for categorical variables.  More...
-
-voidcreate_indicator_variables
 (text source_table, text out_table, text categorical_cols)
-
-varcharcreate_indicator_variables
 (varchar message)
-
-varcharcreate_indicator_variables
 ()
-
-
-Detailed 
Description
-Licensed to the Apache Software Foundation (ASF) 
under one or more contributor license agreements. See the NOTICE file 
distributed with this work for additional information regarding copyright 
ownership. The ASF licenses this file to you under the Apache License, Version 
2.0 (the "License"); you may not use this file except in compliance with the 
License. You may obtain a copy of the License at
-http://www.apache.org/licenses/LICENSE-2.0;>http://www.apache.org/licenses/LICENSE-2.0
-Unless required by applicable law or agreed to in writing, software 
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT 
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the 
License for the specific language governing permissions and limitations under 
the License.
-DateJune 2014
-See alsoCalculates dummy-coded indicator 
variables for categorical variables 
-Function Documentation
-
-create_indicator_variables()
 [1/5]
-
-
-
-  
-
-  void create_indicator_variables 
-  (
-  text
-  source_table, 
-
-
-  
-  
-  text
-  out_table, 
-
-
-  
-  
-  text
-  categorical_cols, 
-
-
-  
-  
-  boolean
-  keep_null, 
-
-
-  
-  
-  text
-  distributed_by
-
-
-  
-  )
-  
-
-  
-
-Parameters
-  
-source_table: Name of table containing categorical variable 
-out_table: Name of table to output dummy variables 
-categorical_cols: Comma-separated list of column names to dummy code 
-keep_null: Boolean to determine the behavior for rows with NULL value 
-distributed_by: Comma-separated list of column names to use for distribution of output
-  
-  
-
-ReturnsVoid 
-
-
-
-

[40/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting

2018-08-07 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/dense__linear__systems_8sql__in.html
--
diff --git a/docs/rc/dense__linear__systems_8sql__in.html 
b/docs/rc/dense__linear__systems_8sql__in.html
new file mode 100644
index 000..4b9a16f
--- /dev/null
+++ b/docs/rc/dense__linear__systems_8sql__in.html
@@ -0,0 +1,647 @@
+
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
+http://www.w3.org/1999/xhtml;>
+
+
+
+
+
+MADlib: dense_linear_systems.sql_in File Reference
+
+
+
+
+
+
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+  $(document).ready(initResizable);
+/* @license-end */
+
+
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+  $(document).ready(function() { init_search(); });
+/* @license-end */
+
+
+  MathJax.Hub.Config({
+extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
+jax: ["input/TeX","output/HTML-CSS"],
+});
+https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js";>
+
+
+
+
+
+
+  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
+  
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+  ga('create', 'UA-45382226-1', 'madlib.apache.org');
+  ga('send', 'pageview');
+
+
+
+
+
+
+ 
+ 
+  http://madlib.apache.org;>
+  
+   
+   1.15
+   
+   User Documentation for Apache MADlib
+  
+   
+
+  
+  
+  
+
+  
+
+
+ 
+ 
+
+
+
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+var searchBox = new SearchBox("searchBox", "search",false,'Search');
+/* @license-end */
+
+
+
+  
+
+  
+
+  
+  
+  
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+$(document).ready(function(){initNavTree('dense__linear__systems_8sql__in.html','');});
+/* @license-end */
+
+
+
+
+
+
+
+
+
+
+
+
+
+  
+Functions  
+  
+dense_linear_systems.sql_in File Reference  
+
+
+
+SQL functions for linear systems.  
+More...
+
+
+Functions
+bytea8dense_residual_norm_transition
 (bytea8 state, float8[] a, float8 b, float8[] x)
+
+bytea8dense_residual_norm_merge_states
 (bytea8 state1, bytea8 state2)
+
+residual_norm_resultdense_residual_norm_final
 (bytea8 state)
+
+aggregate residual_norm_resultdense_residual_norm
 (float8[] left_hand_side, float8 right_hand_side, float8[] solution)
+Compute the residual after 
solving the dense linear systems.  More...
+
+float8 []dense_direct_linear_system_transition
 (float8[] state, integer row_id, float8[] a, float8 b, integer num_rows, 
integer algorithm)
+
+float8 []dense_direct_linear_system_merge_states
 (float8[] state1, float8[] state2)
+
+dense_linear_solver_resultdense_direct_linear_system_final
 (float8[] state)
+
+aggregate dense_linear_solver_resultdense_direct_linear_system
 (integer row_id, float8[] left_hand_side, float8 right_hand_side, integer 
numEquations, integer algorithm)
+Solve a system of linear 
equations using the direct method.  More...
+
+varcharlinear_solver_dense
 (varchar input_string)
+Help function, to print out 
the supported families.  More...
+
+varcharlinear_solver_dense
 ()
+
+voidlinear_solver_dense
 (varchar source_table, varchar out_table, varchar row_id, varchar 
left_hand_side, varchar right_hand_side, varchar grouping_cols, varchar 
optimizer, varchar optimizer_options)
+A wrapper function for the 
various marginal linear_systemsion analyzes.  More...
+
+voidlinear_solver_dense
 (varchar source_table, varchar out_table, varchar row_id, varchar 
left_hand_side, varchar right_hand_side)
+Marginal effects with 
default variables.  More...
+
+
+Detailed 
Description
+Date: July 2013
+See also: Computes the solution of a consistent linear system; for more 
details, see the module description at Dense Linear Systems 
+Function Documentation
+
+dense_direct_linear_system()
+
+
+
+  
+
+  aggregate dense_linear_solver_result 
dense_direct_linear_system 
+  (
+  integer
+  row_id, 
+
+
+  
+  
+  float8 []
+  left_hand_side, 
+
+
+  
+  
+  float8
+  right_hand_side, 
+
+
+  
+  
+  integer
+  numEquations, 
+
+
+  
+  
+  integer
+  algorithm
+
+
+  
+  )
+  
+
+  
+
+Parameters
+  
+row_idColumn containing the row_id 

+left_hand_sideColumn containing the 
left hand side of the system 
+

[45/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting

2018-08-07 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/clustered__variance__coxph_8sql__in.html
--
diff --git a/docs/rc/clustered__variance__coxph_8sql__in.html 
b/docs/rc/clustered__variance__coxph_8sql__in.html
new file mode 100644
index 000..46fe8d4
--- /dev/null
+++ b/docs/rc/clustered__variance__coxph_8sql__in.html
@@ -0,0 +1,496 @@
+
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
+http://www.w3.org/1999/xhtml;>
+
+
+
+
+
+MADlib: clustered_variance_coxph.sql_in File Reference
+
+
+
+
+
+
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+  $(document).ready(initResizable);
+/* @license-end */
+
+
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+  $(document).ready(function() { init_search(); });
+/* @license-end */
+
+
+  MathJax.Hub.Config({
+extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
+jax: ["input/TeX","output/HTML-CSS"],
+});
+https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js";>
+
+
+
+
+
+
+  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
+  
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+  ga('create', 'UA-45382226-1', 'madlib.apache.org');
+  ga('send', 'pageview');
+
+
+
+
+
+
+ 
+ 
+  http://madlib.apache.org;>
+  
+   
+   1.15
+   
+   User Documentation for Apache MADlib
+  
+   
+
+  
+  
+  
+
+  
+
+
+ 
+ 
+
+
+
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+var searchBox = new SearchBox("searchBox", "search",false,'Search');
+/* @license-end */
+
+
+
+  
+
+  
+
+  
+  
+  
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+$(document).ready(function(){initNavTree('clustered__variance__coxph_8sql__in.html','');});
+/* @license-end */
+
+
+
+
+
+
+
+
+
+
+
+
+
+  
+Functions  
+  
+clustered_variance_coxph.sql_in File Reference  
+
+
+
+SQL functions for clustered robust cox proportional hazards regression.  
+More...
+
+
+Functions
+varcharclustered_variance_coxph
 ()
+
+varcharclustered_variance_coxph
 (varchar message)
+
+voidclustered_variance_coxph
 (text model_table, text output_table, text clustervar)
+
+float8 []coxph_a_b_transition
 (float8[], integer, boolean, float8[], float8)
+
+float8 []coxph_a_b_merge
 (float8[], float8[])
+
+__coxph_a_b_resultcoxph_a_b_final
 (float8[])
+
+aggregate __coxph_a_b_resultcoxph_a_b
 (integer, boolean, float8[], float8)
+
+float8 []coxph_compute_w
 (float8[] x, boolean status, float8[] coef, float8[] h, float8 s, float8 a, 
float8[] b)
+
+__coxph_cl_var_resultcoxph_compute_clustered_stats
 (float8[] coef, float8[] hessian, float8[] a)
+
+voidrobust_variance_coxph
 (varchar model_table, varchar output_table, varchar clustervar)
+
+
+Detailed 
Description
+Date: Oct 2013
+See also: For a brief introduction to clustered robust Cox regression, see 
the module description Clustered Variance 
+Function Documentation
+
+clustered_variance_coxph()
 [1/3]
+
+
+
+  
+
+  varchar clustered_variance_coxph 
+  (
+  )
+  
+
+  
+
+
+
+
+
+clustered_variance_coxph()
 [2/3]
+
+
+
+  
+
+  varchar clustered_variance_coxph 
+  (
+  varchar
+  message)
+  
+
+  
+
+
+
+
+
+clustered_variance_coxph()
 [3/3]
+
+
+
+  
+
+  void clustered_variance_coxph 
+  (
+  text
+  model_table, 
+
+
+  
+  
+  text
+  output_table, 
+
+
+  
+  
+  text
+  clustervar
+
+
+  
+  )
+  
+
+  
+
+
+
+
+
+coxph_a_b()
+
+
+
+  
+
+  aggregate __coxph_a_b_result coxph_a_b 
+  (
+  integer
+  , 
+
+
+  
+  
+  boolean
+  , 
+
+
+  
+  
+  float8
+  [], 
+
+
+  
+  
+  float8
+  
+
+
+  
+  )
+  
+
+  
+
+
+
+
+
+coxph_a_b_final()
+
+
+
+  
+
+  __coxph_a_b_result coxph_a_b_final 
+  (
+  float8
+  [])
+  
+
+  
+
+
+
+
+
+coxph_a_b_merge()
+
+
+
+  
+
+  float8 [] coxph_a_b_merge 
+  (
+  float8
+  [], 
+
+
+  
+  
+   

[48/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting

2018-08-07 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/balance__sample_8sql__in.html
--
diff --git a/docs/rc/balance__sample_8sql__in.html 
b/docs/rc/balance__sample_8sql__in.html
new file mode 100644
index 000..1e22170
--- /dev/null
+++ b/docs/rc/balance__sample_8sql__in.html
@@ -0,0 +1,497 @@
+
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
+http://www.w3.org/1999/xhtml;>
+
+
+
+
+
+MADlib: balance_sample.sql_in File Reference
+
+
+
+
+
+
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+  $(document).ready(initResizable);
+/* @license-end */
+
+
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+  $(document).ready(function() { init_search(); });
+/* @license-end */
+
+
+  MathJax.Hub.Config({
+extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
+jax: ["input/TeX","output/HTML-CSS"],
+});
+https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js";>
+
+
+
+
+
+
+  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
+  
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+  ga('create', 'UA-45382226-1', 'madlib.apache.org');
+  ga('send', 'pageview');
+
+
+
+
+
+
+ 
+ 
+  http://madlib.apache.org;>
+  
+   
+   1.15
+   
+   User Documentation for Apache MADlib
+  
+   
+
+  
+  
+  
+
+  
+
+
+ 
+ 
+
+
+
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+var searchBox = new SearchBox("searchBox", "search",false,'Search');
+/* @license-end */
+
+
+
+  
+
+  
+
+  
+  
+  
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+$(document).ready(function(){initNavTree('balance__sample_8sql__in.html','');});
+/* @license-end */
+
+
+
+
+
+
+
+
+
+
+
+
+
+  
+Functions  
+  
+balance_sample.sql_in File Reference  
+
+
+
+SQL functions for balanced data sets sampling.  
+More...
+
+
+Functions
+voidbalance_sample
 (text source_table, text output_table, text class_col, varchar class_sizes, 
integer output_table_size, text grouping_cols, boolean with_replacement, 
boolean keep_null)
+
+voidbalance_sample
 (text source_table, text output_table, text class_col, varchar class_sizes, 
integer output_table_size, text grouping_cols, boolean 
with_replacement)
+
+voidbalance_sample
 (text source_table, text output_table, text class_col, varchar class_sizes, 
integer output_table_size, text grouping_cols)
+
+voidbalance_sample
 (text source_table, text output_table, text class_col, varchar class_sizes, 
integer output_table_size)
+
+voidbalance_sample
 (text source_table, text output_table, text class_col, varchar 
class_sizes)
+
+voidbalance_sample
 (text source_table, text output_table, text class_col)
+
+varcharbalance_sample
 (varchar message)
+
+varcharbalance_sample
 ()
+
+
+Detailed 
Description
+Licensed to the Apache Software Foundation (ASF) 
under one or more contributor license agreements. See the NOTICE file 
distributed with this work for additional information regarding copyright 
ownership. The ASF licenses this file to you under the Apache License, Version 
2.0 (the "License"); you may not use this file except in compliance with the 
License. You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0;>http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software 
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT 
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the 
License for the specific language governing permissions and limitations under 
the License.
+Date12/14/2017
+See alsoGiven a table, balanced sampling 
returns a sampled data set with specified proportions for each class (defaults 
to uniform sampling). 
+Function Documentation
+
+balance_sample()
 [1/8]
+
+
+
+  
+
+  void balance_sample 
+  (
+  text
+  source_table, 
+
+
+  
+  
+  text
+  output_table, 
+
+
+  
+  
+  text
+  class_col, 
+
+
+  
+  
+  varchar
+  class_sizes, 
+
+
+  
+  
+  integer
+  output_table_size, 
+
+
+  
+  
+  text
+  grouping_cols, 
+
+
+  
+  
+  boolean
+  with_replacement, 
+
+
+  
+   

[27/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting

2018-08-07 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__correlation.html
--
diff --git a/docs/rc/group__grp__correlation.html 
b/docs/rc/group__grp__correlation.html
new file mode 100644
index 000..742985a
--- /dev/null
+++ b/docs/rc/group__grp__correlation.html
@@ -0,0 +1,397 @@
+
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
+http://www.w3.org/1999/xhtml;>
+
+
+
+
+
+MADlib: Covariance and Correlation
+
+
+
+
+
+
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+  $(document).ready(initResizable);
+/* @license-end */
+
+
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+  $(document).ready(function() { init_search(); });
+/* @license-end */
+
+
+  MathJax.Hub.Config({
+extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
+jax: ["input/TeX","output/HTML-CSS"],
+});
+https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js";>
+
+
+
+
+
+
+  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
+  
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+  ga('create', 'UA-45382226-1', 'madlib.apache.org');
+  ga('send', 'pageview');
+
+
+
+
+
+
+ 
+ 
+  http://madlib.apache.org;>
+  
+   
+   1.15
+   
+   User Documentation for Apache MADlib
+  
+   
+
+  
+  
+  
+
+  
+
+
+ 
+ 
+
+
+
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+var searchBox = new SearchBox("searchBox", "search",false,'Search');
+/* @license-end */
+
+
+
+  
+
+  
+
+  
+  
+  
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+$(document).ready(function(){initNavTree('group__grp__correlation.html','');});
+/* @license-end */
+
+
+
+
+
+
+
+
+
+
+
+
+
+  
+Covariance and Correlation
+
+
+Contents 
+
+Covariance and Correlation Functions 
+
+Examples 
+
+Literature 
+
+Related Topics 
+
+A correlation function is the degree and direction of association of 
two variables: how well one random variable can be predicted from the 
other. It is a normalized version of covariance. The Pearson correlation 
coefficient is used here, which has a value between -1 and 1, where 1 implies 
total positive linear correlation, 0 means no linear correlation, and -1 means 
total negative linear correlation.
+This function generates an \(N \times N\) cross correlation matrix for pairs 
of numeric columns in a source_table. The matrix is square and symmetric, with 
the \( (i,j) \)th element equal to the correlation coefficient between the 
\(i\)th and the \(j\)th variable. The diagonal elements (correlations of 
variables with themselves) are always equal to 1.0.
+We also provide a covariance function which is similar in nature to 
correlation, and is a measure of the joint variability of two random 
variables.
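
A minimal sketch, not part of the committed documentation, of how a Pearson cross correlation matrix like the one described above can be computed. The helper names `pearson` and `correlation_matrix` and the dictionary-of-columns input are assumptions made for illustration; MADlib itself computes this inside the database.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Unnormalised covariance and variances; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return the full N x N matrix for a dict mapping column name -> values."""
    names = list(columns)
    return [[pearson(columns[a], columns[b]) for b in names] for a in names]


if __name__ == "__main__":
    data = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [4, 3, 2, 1]}
    for row in correlation_matrix(data):
        print(row)
```

The sketch returns the full symmetric matrix; the lower-triangular layout with NULLs described for the output table is a presentation choice on top of the same values.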
+Covariance and Correlation Functions
+The correlation function has the following syntax: 
+correlation( source_table,
+ output_table,
+ target_cols,
+ verbose,
+ grouping_cols
+   )
+The covariance function has a similar syntax: 
+covariance( source_table,
+output_table,
+target_cols,
+verbose,
+grouping_cols
+  )
+
+source_table 
+TEXT. Name of the table containing the input data.
+
+
+output_table 
+TEXT. Name of the table containing the cross 
correlation matrix. The output table has N rows, where N is the number of 
'target_cols' in the 'source_table' for which correlation or 
covariance is being computed. It has the following columns: 
+
+column_position An automatically generated sequential counter 
indicating the order of the variable in the 'output_table'.  
+
+variable Contains the row header for the variables of interest.  

+
+grouping_cols Contains the grouping columns, if any.  
+
+... The remainder of the table is the NxN correlation 
matrix for the pairs of variables of interest.  
+
+The output table is arranged as a lower-triangular matrix with the upper 
triangle set to NULL and the diagonal elements set to 1.0. To obtain the result 
from the 'output_table' order by 'column_position': 
+SELECT * FROM output_table ORDER BY column_position;
+In addition to the output table, a summary table named 
output_table_summary is also created, which has the following columns: 
+
+method: 'Correlation' or 'Covariance' 
+
+source_table: VARCHAR. Data source table name. 
+
+output_table: VARCHAR. Output table name. 
+
+column_names: VARCHAR. Column names 

[11/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting

2018-08-07 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__random__forest.html
--
diff --git a/docs/rc/group__grp__random__forest.html 
b/docs/rc/group__grp__random__forest.html
new file mode 100644
index 000..42255a6
--- /dev/null
+++ b/docs/rc/group__grp__random__forest.html
@@ -0,0 +1,1157 @@
+
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
+http://www.w3.org/1999/xhtml;>
+
+
+
+
+
+MADlib: Random Forest
+
+
+
+
+
+
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+  $(document).ready(initResizable);
+/* @license-end */
+
+
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+  $(document).ready(function() { init_search(); });
+/* @license-end */
+
+
+  MathJax.Hub.Config({
+extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
+jax: ["input/TeX","output/HTML-CSS"],
+});
+https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js";>
+
+
+
+
+
+
+  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
+  
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+  ga('create', 'UA-45382226-1', 'madlib.apache.org');
+  ga('send', 'pageview');
+
+
+
+
+
+
+ 
+ 
+  http://madlib.apache.org;>
+  
+   
+   1.15
+   
+   User Documentation for Apache MADlib
+  
+   
+
+  
+  
+  
+
+  
+
+
+ 
+ 
+
+
+
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+var searchBox = new SearchBox("searchBox", "search",false,'Search');
+/* @license-end */
+
+
+
+  
+
+  
+
+  
+  
+  
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+$(document).ready(function(){initNavTree('group__grp__random__forest.html','');});
+/* @license-end */
+
+
+
+
+
+
+
+
+
+
+
+
+
+  
+Random Forest
+
+
+Contents
+
+Training Function 
+
+Prediction Function 
+
+Tree Display 
+
+Importance Display 
+
+Examples 
+
+Literature 
+
+Related Topics 
+
+Random forest builds an ensemble of classifiers, each of which is a 
tree model constructed using bootstrapped samples from the input data. The 
results of these models are then combined to yield a single prediction, which, 
at the expense of some loss in interpretation, can be highly accurate. Refer to 
Breiman et al. [1][2][3] for details on the implementation used here.
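
A minimal sketch, not part of the committed documentation, of the two ensemble ingredients named above: drawing a bootstrapped sample for each tree and combining the per-tree outputs into a single prediction. The function names are hypothetical, and majority voting is shown as one common way to combine classification outputs; the text does not say how MADlib combines them internally.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows uniformly at random, with replacement."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(predictions):
    """Combine per-tree class predictions into one label."""
    return Counter(predictions).most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    training_rows = [(1, "a"), (2, "b"), (3, "a"), (4, "b")]
    print(bootstrap_sample(training_rows, rng))
    print(majority_vote(["a", "b", "a"]))
```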
+Also refer to the decision tree 
user documentation since many parameters and examples are similar to random 
forest.
+Training Function
+The random forest training function has the following format: 
+forest_train(training_table_name,
+ output_table_name,
+ id_col_name,
+ dependent_variable,
+ list_of_features,
+ list_of_features_to_exclude,
+ grouping_cols,
+ num_trees,
+ num_random_features,
+ importance,
+ num_permutations,
+ max_tree_depth,
+ min_split,
+ min_bucket,
+ num_splits,
+ null_handling_params,
+ verbose,
+ sample_ratio
+ )
+
+Arguments 
+training_table_name 
+TEXT. Name of the table containing the training 
data.
+
+
+output_table_name 
+TEXT. Name of the generated table containing the model. 
If a table with the same name already exists, an error will be returned. A 
summary table named output_table_name_summary and a grouping 
table named output_table_name_group are also created. These 
are described later on this page. 
+
+
+id_col_name 
+TEXT. Name of the column containing id information in 
the training data. This is a mandatory argument and is used for prediction and 
other purposes. The values are expected to be unique for each row.
+
+
+dependent_variable 
+TEXT. Name of the column that contains the output 
(response) for training. Boolean, integer and text types are considered to be 
classification outputs, while double precision values are considered to be 
regression outputs. The response variable for a classification tree can be 
multinomial, but the time and space complexity of the training function 
increases linearly as the number of response classes increases.
+
+
+list_of_features 
+TEXT. Comma-separated string of column names or 
expressions to use as predictors. Can also be a '*' implying all columns are to 
be used as predictors (except for the ones included in the next argument that 
lists exclusions). The types of the features can be mixed: boolean, integer, 
and text columns are considered 

[17/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting

2018-08-07 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__marginal.html
--
diff --git a/docs/rc/group__grp__marginal.html 
b/docs/rc/group__grp__marginal.html
new file mode 100644
index 000..d88997f
--- /dev/null
+++ b/docs/rc/group__grp__marginal.html
@@ -0,0 +1,440 @@
+
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
+http://www.w3.org/1999/xhtml;>
+
+
+
+
+
+MADlib: Marginal Effects
+
+
+
+
+
+
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+  $(document).ready(initResizable);
+/* @license-end */
+
+
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+  $(document).ready(function() { init_search(); });
+/* @license-end */
+
+
+  MathJax.Hub.Config({
+extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
+jax: ["input/TeX","output/HTML-CSS"],
+});
+https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js";>
+
+
+
+
+
+
+  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
+  
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+  ga('create', 'UA-45382226-1', 'madlib.apache.org');
+  ga('send', 'pageview');
+
+
+
+
+
+
+ 
+ 
+  http://madlib.apache.org;>
+  
+   
+   1.15
+   
+   User Documentation for Apache MADlib
+  
+   
+
+  
+  
+  
+
+  
+
+
+ 
+ 
+
+
+
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+var searchBox = new SearchBox("searchBox", "search",false,'Search');
+/* @license-end */
+
+
+
+  
+
+  
+
+  
+  
+  
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+$(document).ready(function(){initNavTree('group__grp__marginal.html','');});
+/* @license-end */
+
+
+
+
+
+
+
+
+
+
+
+
+
+  
+Marginal Effects
+
+
+Contents 
+
+Marginal Effects with Interaction Terms 
+
+Examples 
+
+Notes 
+
+Technical Background 
+
+Literature 
+
+Related Topics 
+
+A marginal effect (ME) or partial effect measures the effect on the 
conditional mean of \( y \) for a change in one of the regressors, say \(X_k\). 
In the linear regression model, the ME equals the relevant slope coefficient, 
greatly simplifying analysis. For nonlinear models, specialized algorithms are 
required for calculating ME. The marginal effect computed is the average of the 
marginal effect at every data point present in the source table.
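
A minimal sketch, not part of the committed documentation, of the averaging described above for a binary logistic model. It assumes every regressor is continuous and there are no interaction terms, in which case the marginal effect of \(X_k\) at one data point is \(\beta_k \, p (1 - p)\), where \(p\) is the predicted probability. The function names are hypothetical.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def average_marginal_effects(coef, rows):
    """Average of beta_k * p * (1 - p) over all rows, for each coefficient k.

    coef: list of fitted logistic regression coefficients.
    rows: list of feature vectors, each the same length as coef.
    """
    totals = [0.0] * len(coef)
    for x in rows:
        # Predicted probability at this data point.
        p = sigmoid(sum(b * v for b, v in zip(coef, x)))
        for k, b in enumerate(coef):
            totals[k] += b * p * (1.0 - p)
    return [t / len(rows) for t in totals]


if __name__ == "__main__":
    print(average_marginal_effects([0.5, -1.0], [[1.0, 0.2], [1.0, 1.5]]))
```

In a linear regression the same derivative is simply \(\beta_k\) at every data point, which is why no averaging step is needed there.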
+MADlib provides marginal effects regression functions for linear, logistic 
and multinomial logistic regressions.
+Warning: The margins_logregr() and margins_mlogregr() functions have been 
deprecated in favor of the margins() function.
+Marginal Effects with Interaction Terms
+margins( model_table,
+ output_table,
+ x_design,
+ source_table,
+ marginal_vars
+   )
+ Arguments 
+model_table 
+VARCHAR. The name of the model table, which is the output of logregr_train() or mlogregr_train(). 
+output_table 
+VARCHAR. The name of the result table. The output table has the following 
columns. 
+
+variables INTEGER[]. The indices of the basis variables.  

+
+margins DOUBLE PRECISION[]. The marginal effects.  
+
+std_err DOUBLE PRECISION[]. An array of the standard errors, 
computed using the delta method.  
+
+z_stats DOUBLE PRECISION[]. An array of the z-stats of the 
marginal effects.  
+
+p_values DOUBLE PRECISION[]. An array of the Wald p-values of the 
marginal effects.  
+
+
+x_design (optional) 
+VARCHAR, default: NULL. The design of independent 
variables, necessary only if interaction terms or indicator (categorical) terms 
are present. This parameter is necessary since the independent variables in the 
underlying regression are not parsed to extract the relationship between 
variables.
+Example: The independent_varname in the regression method can be 
specified in either of the following ways:
+ ‘array[1, color_blue, color_green, gender_female, gpa, gpa^2, 
gender_female*gpa, gender_female*gpa^2, weight]’ 
+ ‘x’ 
+
+In the second version, the column x is an array containing data 
identical to that expressed in the first version, computed in a prior data 
preparation step. Supply an x_design argument to the margins() function in the following 
way:
+ ‘1, i.color_blue.color, i.color_green.color, i.gender_female, 
gpa, gpa^2, gender_female*gpa, gender_female*gpa^2, weight’
+
+The variable names ('gpa', 'weight', ...), referred to here as 
identifiers, should be unique for each basis variable and need not be 
the same as the original variable name 

[06/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting

2018-08-07 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__text__utilities.html
--
diff --git a/docs/rc/group__grp__text__utilities.html 
b/docs/rc/group__grp__text__utilities.html
new file mode 100644
index 000..c4326a0
--- /dev/null
+++ b/docs/rc/group__grp__text__utilities.html
@@ -0,0 +1,368 @@
+
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
+http://www.w3.org/1999/xhtml;>
+
+
+
+
+
+MADlib: Term Frequency
+
+
+
+
+
+
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+  $(document).ready(initResizable);
+/* @license-end */
+
+
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+  $(document).ready(function() { init_search(); });
+/* @license-end */
+
+
+  MathJax.Hub.Config({
+extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
+jax: ["input/TeX","output/HTML-CSS"],
+});
+https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js";>
+
+
+
+
+
+
+  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
+  
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+  ga('create', 'UA-45382226-1', 'madlib.apache.org');
+  ga('send', 'pageview');
+
+
+
+
+
+
+ 
+ 
+  http://madlib.apache.org;>
+  
+   
+   1.15
+   
+   User Documentation for Apache MADlib
+  
+   
+
+  
+  
+  
+
+  
+
+
+ 
+ 
+
+
+
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+var searchBox = new SearchBox("searchBox", "search",false,'Search');
+/* @license-end */
+
+
+
+  
+
+  
+
+  
+  
+  
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+$(document).ready(function(){initNavTree('group__grp__text__utilities.html','');});
+/* @license-end */
+
+
+
+
+
+
+
+
+
+
+
+
+
+  
+Term Frequency
+
+
+Contents 
+
+Function Syntax 
+
+Examples 
+
+Related Topics 
+
+Term frequency computes the number of times that a word or term 
occurs in a document. Term frequency is often used as part of a larger text 
processing pipeline, which may include operations such as stemming, stop word 
removal and topic modelling.
+Function Syntax
+
+term_frequency(input_table,
+   doc_id_col,
+   word_col,
+   output_table,
+   compute_vocab)
+Arguments: 
+input_table 
+TEXT. The name of the table containing the documents, 
with one document per row. Each row is in the form doc_id, word_vector 
where doc_id is an id unique to each document, and 
word_vector is a text array containing the words in the document. 
The word_vector should contain multiple entries of a word if the 
document contains multiple occurrences of that word. 
+
+
+doc_id_col 
+TEXT. The name of the column containing the document 
id. 
+
+
+word_col 
+TEXT. The name of the column containing the vector of 
words/terms in the document. This column should be of a type that can be cast to 
TEXT[].
+
+
+output_table 
+TEXT. The name of the table to store the term frequency 
output. The output table contains the following columns:
+doc_id_col: This is the document id column (its name is the same as 
the one provided as input).
+word: Word/term present in a document. Depending on the value 
of compute_vocab below, this is either the original word as it 
appears in word_col, or an id representing the word. Note that 
word ids start from 0, not 1.
+count: The number of times this word is found in the 
document. 
+
+
+
+compute_vocab 
+BOOLEAN. (Optional, Default=FALSE) Flag to indicate if a vocabulary table 
is to be created. If TRUE, an additional output table is created containing the 
vocabulary of all words, with an id assigned to each word in alphabetical 
order. The table is called output_table_vocabulary (i.e., suffix added 
to the output_table name) and contains the following columns:
+wordid: An id for each word in alphabetical order.
+word: The word/term corresponding to the id.  
+
+
+
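The counting described above can be sketched outside the database. This is a minimal Python illustration, not MADlib's implementation: the function name term_frequency and its return shape are assumptions made for the example, and only the behaviour stated above (one count per document and word, optional vocabulary ids assigned in alphabetical order starting from 0) is taken from the documentation.

```python
from collections import Counter


def term_frequency(documents, compute_vocab=False):
    """Count words per document.

    documents: iterable of (doc_id, list_of_words) pairs, doc_id unique.
    Returns (rows, vocab) where rows is a sorted list of
    (doc_id, word_or_wordid, count) and vocab maps word -> wordid
    (None when compute_vocab is False).
    """
    counts = {doc_id: Counter(words) for doc_id, words in documents}

    vocab = None
    if compute_vocab:
        # Ids follow alphabetical order and start from 0, as documented.
        all_words = sorted({w for c in counts.values() for w in c})
        vocab = {word: wordid for wordid, word in enumerate(all_words)}

    rows = []
    for doc_id, counter in counts.items():
        for word, count in counter.items():
            key = vocab[word] if vocab is not None else word
            rows.append((doc_id, key, count))
    rows.sort()
    return rows, vocab


if __name__ == "__main__":
    docs = [(0, ["kittens", "cute", "kittens"]), (1, ["cute", "hamster"])]
    for row in term_frequency(docs, compute_vocab=True)[0]:
        print(row)
```

With compute_vocab set to True the second element of each row is the word id rather than the word itself, mirroring the two modes of the output table.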
+Examples
+
+First we create a document table with one document per row: 
+DROP TABLE IF EXISTS documents;
+CREATE TABLE documents(docid INT4, contents TEXT);
+INSERT INTO documents VALUES
+(0, 'I like to eat broccoli and bananas. I ate a banana and spinach smoothie 
for breakfast.'),
+(1, 'Chinchillas and kittens are cute.'),
+(2, 'My sister adopted two kittens yesterday.'),
+(3, 'Look at this cute hamster munching on a piece of broccoli.');
+ You can apply stemming, stop word removal and tokenization at this 
point in order to prepare the documents for text processing. Depending 

[03/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting

2018-08-07 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/jquery.js
--
diff --git a/docs/rc/jquery.js b/docs/rc/jquery.js
new file mode 100644
index 000..2771c74
--- /dev/null
+++ b/docs/rc/jquery.js
@@ -0,0 +1,115 @@
+/*
+ @licstart  The following is the entire license notice for the
+ JavaScript code in this file.
+
+ Copyright (C) 1997-2017 by Dimitri van Heesch
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be included
+ in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+ IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+ CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
+ @licend  The above is the entire license notice
+ for the JavaScript code in this file
+ */
+/*!
+ * jQuery JavaScript Library v1.7.1
+ * http://jquery.com/
+ *
+ * Copyright 2011, John Resig
+ * Dual licensed under the MIT or GPL Version 2 licenses.
+ * http://jquery.org/license
+ *
+ * Includes Sizzle.js
+ * http://sizzlejs.com/
+ * Copyright 2011, The Dojo Foundation
+ * Released under the MIT, BSD, and GPL Licenses.
+ *
+ * Date: Mon Nov 21 21:11:03 2011 -0500
+ */
+(function(bb,L){var av=bb.document,bu=bb.navigator,bl=bb.location;var 
b=(function(){var bF=function(b0,b1){return new 
bF.fn.init(b0,b1,bD)},bU=bb.jQuery,bH=bb.$,bD,bY=/^(?:[^#<]*(<[\w\W]+>)[^>]*$|#([\w\-]*)$)/,bM=/\S/,bI=/^\s+/,bE=/\s+$/,bA=/^<(\w+)\s*\/?>(?:<\/\1>)?$/,bN=/^[\],:{}\s]*$/,bW=/\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})/g,bP=/"[^"\\\n\r]*"|true|false|null|-?\d+(?:\.\d*)?(?:[eE][+\-]?\d+)?/g,bJ=/(?:^|:|,)(?:\s*\[)+/g,by=/(webkit)[
 \/]([\w.]+)/,bR=/(opera)(?:.*version)?[ \/]([\w.]+)/,bQ=/(msie) 
([\w.]+)/,bS=/(mozilla)(?:.*? 
rv:([\w.]+))?/,bB=/-([a-z]|[0-9])/ig,bZ=/^-ms-/,bT=function(b0,b1){return(b1+"").toUpperCase()},bX=bu.userAgent,bV,bC,e,bL=Object.prototype.toString,bG=Object.prototype.hasOwnProperty,bz=Array.prototype.push,bK=Array.prototype.slice,bO=String.prototype.trim,bv=Array.prototype.indexOf,bx={};bF.fn=bF.prototype={constructor:bF,init:function(b0,b4,b3){var
 b2,b5,b1,b6;if(!b0){return 
this}if(b0.nodeType){this.context=this[0]=b0;this.length=1;return 
this}if(b0==="bo
 
dy"&&!b4&){this.context=av;this[0]=av.body;this.selector=b0;this.length=1;return
 this}if(typeof 
b0==="string"){if(b0.charAt(0)==="<"&(b0.length-1)===">"&>=3){b2=[null,b0,null]}else{b2=bY.exec(b0)}if(b2&&(b2[1]||!b4)){if(b2[1]){b4=b4
 instanceof 
bF?b4[0]:b4;b6=(b4?b4.ownerDocument||b4:av);b1=bA.exec(b0);if(b1){if(bF.isPlainObject(b4)){b0=[av.createElement(b1[1])];bF.fn.attr.call(b0,b4,true)}else{b0=[b6.createElement(b1[1])]}}else{b1=bF.buildFragment([b2[1]],[b6]);b0=(b1.cacheable?bF.clone(b1.fragment):b1.fragment).childNodes}return
 
bF.merge(this,b0)}else{b5=av.getElementById(b2[2]);if(b5&){if(b5.id!==b2[2]){return
 b3.find(b0)}this.length=1;this[0]=b5}this.context=av;this.selector=b0;return 
this}}else{if(!b4||b4.jquery){return(b4||b3).find(b0)}else{return 
this.constructor(b4).find(b0)}}}else{if(bF.isFunction(b0)){return 
b3.ready(b0)}}if(b0.selector!==L){this.selector=b0.selector;this.context=b0.context}return
 bF.makeArray(b0,this)},selector:"",
 jquery:"1.7.1",length:0,size:function(){return 
this.length},toArray:function(){return bK.call(this,0)},get:function(b0){return 
b0==null?this.toArray():(b0<0?this[this.length+b0]:this[b0])},pushStack:function(b1,b3,b0){var
 
b2=this.constructor();if(bF.isArray(b1)){bz.apply(b2,b1)}else{bF.merge(b2,b1)}b2.prevObject=this;b2.context=this.context;if(b3==="find"){b2.selector=this.selector+(this.selector?"
 ":"")+b0}else{if(b3){b2.selector=this.selector+"."+b3+"("+b0+")"}}return 
b2},each:function(b1,b0){return 
bF.each(this,b1,b0)},ready:function(b0){bF.bindReady();bC.add(b0);return 
this},eq:function(b0){b0=+b0;return 
b0===-1?this.slice(b0):this.slice(b0,b0+1)},first:function(){return 
this.eq(0)},last:function(){return this.eq(-1)},slice:function(){return 
this.pushStack(bK.apply(this,arguments),"slice",bK.call(arguments).join(","))},map:function(b0){return
 this.pushStack(bF.map(this,function(b2,b1){return 
b0.call(b2,b1,b2)}))},end:function(){return 
this.prevObject||this.constructor(null)},pus
 

[21/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting

2018-08-07 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__graph__measures.html
--
diff --git a/docs/rc/group__grp__graph__measures.html 
b/docs/rc/group__grp__graph__measures.html
new file mode 100644
index 000..9339d92
--- /dev/null
+++ b/docs/rc/group__grp__graph__measures.html
@@ -0,0 +1,155 @@
+
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
+http://www.w3.org/1999/xhtml;>
+
+
+
+
+
+MADlib: Measures
+Modules  
+  
+Measures
+
+
+Detailed 
Description
+A collection of metrics computed on a graph. 
+
+
+Modules
+Average Path 
Length
+Computes the average 
shortest-path length of a graph. 
+
+Closeness
+Computes the closeness 
centrality value of each node in the graph. 
+
+Graph 
Diameter
+Computes the diameter of a 
graph. 
+
+In-Out Degree
+Computes the degrees for 
each vertex. 
+
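The measures listed above can be illustrated on a small graph. The following Python sketch assumes a directed, unweighted graph given as (src, dst) pairs and uses breadth-first search. The function names are hypothetical, and closeness is computed here as the inverse of the sum of distances to reachable vertices, which is only one of several common definitions and is not necessarily the one MADlib reports.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop counts from source to every reachable vertex (including itself)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def graph_measures(edges):
    """Average path length, diameter, closeness and in/out degrees."""
    adj, in_deg, out_deg, vertices = {}, {}, {}, set()
    for src, dst in edges:
        adj.setdefault(src, []).append(dst)
        out_deg[src] = out_deg.get(src, 0) + 1
        in_deg[dst] = in_deg.get(dst, 0) + 1
        vertices.update((src, dst))

    path_lengths = []
    closeness = {}
    for v in vertices:
        # Distances to every other reachable vertex.
        dists = [d for t, d in bfs_distances(adj, v).items() if t != v]
        path_lengths.extend(dists)
        total = sum(dists)
        closeness[v] = 1.0 / total if total else 0.0

    return {
        "avg_path_length": (sum(path_lengths) / len(path_lengths)
                            if path_lengths else None),
        "diameter": max(path_lengths) if path_lengths else None,
        "closeness": closeness,
        "degrees": {v: (in_deg.get(v, 0), out_deg.get(v, 0))
                    for v in vertices},
    }


if __name__ == "__main__":
    print(graph_measures([(0, 1), (1, 2), (2, 0)]))
```

Unreachable pairs are simply left out of the averages in this sketch; a weighted graph would need a shortest-path algorithm other than breadth-first search.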
+
+
+
+
+
+  
+Generated on Mon Aug 6 2018 21:55:39 for MADlib by
+http://www.doxygen.org/index.html;>
+ 1.8.14 
+  
+
+
+

http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__graph__measures.js
--
diff --git a/docs/rc/group__grp__graph__measures.js 
b/docs/rc/group__grp__graph__measures.js
new file mode 100644
index 000..6272fba
--- /dev/null
+++ b/docs/rc/group__grp__graph__measures.js
@@ -0,0 +1,7 @@
+var group__grp__graph__measures =
+[
+[ "Average Path Length", "group__grp__graph__avg__path__length.html", null 
],
+[ "Closeness", "group__grp__graph__closeness.html", null ],
+[ "Graph Diameter", "group__grp__graph__diameter.html", null ],
+[ "In-Out Degree", "group__grp__graph__vertex__degrees.html", null ]
+];
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__graph__vertex__degrees.html
--
diff --git a/docs/rc/group__grp__graph__vertex__degrees.html 
b/docs/rc/group__grp__graph__vertex__degrees.html
new file mode 100644
index 000..9d8a2f5
--- /dev/null
+++ b/docs/rc/group__grp__graph__vertex__degrees.html
@@ -0,0 +1,273 @@
+
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
+http://www.w3.org/1999/xhtml;>
+
+
+
+
+
+MADlib: In-Out Degree

[18/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting

2018-08-07 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__linreg.html
--
diff --git a/docs/rc/group__grp__linreg.html b/docs/rc/group__grp__linreg.html
new file mode 100644
index 000..9e73ca3
--- /dev/null
+++ b/docs/rc/group__grp__linreg.html
@@ -0,0 +1,479 @@
+
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
+http://www.w3.org/1999/xhtml;>
+
+
+
+
+
+MADlib: Linear Regression
+Linear Regression
+
+
+Contents 
+
+Training Function 
+
+Prediction Function 
+
+Examples 
+
+Technical Background 
+
+Literature 
+
+Related Topics 
+
+Linear regression models a linear relationship of a scalar dependent 
variable \( y \) to one or more explanatory independent variables \( x \) and 
builds a model of coefficients.
+Training 
Function
+The linear regression training function has the following syntax. 
+linregr_train( source_table,
+   out_table,
+   dependent_varname,
+   independent_varname,
+   grouping_cols,
+   heteroskedasticity_option
+ )
+Arguments 
+source_table 
+TEXT. Name of the table containing the training 
data.
+
+
+out_table 
+TEXT. Name of the generated table containing the output 
model.
+The output table contains the following columns: 
+
+... Any grouping columns provided during training. 
Present only if the grouping option is used.  
+
+coef FLOAT8[]. Vector of the coefficients of the regression.  

+
+r2 FLOAT8. R-squared coefficient of determination of the model.  

+
+std_err FLOAT8[]. Vector of the standard error of the 
coefficients.  
+
+t_stats FLOAT8[]. Vector of the t-statistics of the coefficients. 
 
+
+p_values FLOAT8[]. Vector of the p-values of the coefficients.  

+
+condition_no FLOAT8 array. The condition number of the \(X^{*}X\) 
matrix. A high condition number is usually an indication that there may be some 
numerical instability in the result, yielding a less reliable model. A high 
condition number often results when there is a significant amount of 
collinearity in the underlying design matrix, in which case other regression 
techniques, such as elastic net regression, may be more appropriate.  
+
+bp_stats FLOAT8. The Breusch-Pagan statistic of heteroskedasticity. 
Present only if the heteroskedasticity_option argument was set to True when the model was 
trained.  
+
+bp_p_value FLOAT8. The Breusch-Pagan calculated p-value. Present 
only if the heteroskedasticity_option argument was set to True when the model was 
trained.  
+
+num_rows_processed INTEGER. The number of rows that are actually 
used in each group.  
+
+num_missing_rows_skipped INTEGER. The number of rows that have 
NULL values in the dependent or independent variables, and were skipped in the 
computation for each group. 
+
+variance_covariance FLOAT[]. Variance/covariance matrix. 

+
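To make the coef and r2 columns concrete, here is a minimal Python sketch of ordinary least squares for a single independent variable plus an intercept. It is an illustration under simplifying assumptions (one predictor, no grouping, no NULL handling), and the function name simple_linregr is hypothetical rather than part of MADlib.

```python
def simple_linregr(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns a dict with 'coef' ([intercept, slope]) and 'r2',
    loosely mirroring two of the output columns described above.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Centered sums of squares and cross-products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R-squared: one minus residual variation over total variation.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot

    return {"coef": [intercept, slope], "r2": r2}


if __name__ == "__main__":
    print(simple_linregr([1, 2, 3, 4], [3, 5, 7, 9]))
```

The sketch divides by sxx and ss_tot, so it assumes the inputs are not constant. The standard errors, t-statistics and p-values in the output table require the full variance/covariance matrix and are not covered here.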
+A summary table named out_table_summary is created 
together with the output table. It has the following columns: 
+
+method 'linregr' for linear regression.  
+
+source_table The data source table name 
+
+out_table The output table name 
+
+dependent_varname The dependent variable 
+
+independent_varname The independent variables 
+
+num_rows_processed 

[44/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting

2018-08-07 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/cox__prop__hazards_8sql__in.html
--
diff --git a/docs/rc/cox__prop__hazards_8sql__in.html 
b/docs/rc/cox__prop__hazards_8sql__in.html
new file mode 100644
index 000..aff8dbb
--- /dev/null
+++ b/docs/rc/cox__prop__hazards_8sql__in.html
@@ -0,0 +1,2150 @@
+
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
+http://www.w3.org/1999/xhtml;>
+
+
+
+
+
+MADlib: cox_prop_hazards.sql_in File Reference
+Functions  
+  
+cox_prop_hazards.sql_in File Reference  
+
+
+
+SQL functions for Cox proportional hazards.  
+More...
+
+
+Functions
+float8 [] array_avg_transition (float8[] state, float8[] x, boolean use_abs)
+
+float8 [] array_avg_merge (float8[] left, float8[] right)
+
+float8 [] array_avg_final (float8[] state)
+
+aggregate float8 [] array_avg (float8[], boolean)
+
+void coxph_train (varchar source_table, varchar output_table, varchar dependent_varname, varchar independent_varname, varchar right_censoring_status, varchar strata, varchar optimizer_params)
+Compute Cox regression coefficients and diagnostic statistics.  More...
+
+varchar coxph_train ()
+
+varchar coxph_train (varchar message)
+
+void coxph_train (varchar source_table, varchar output_table, varchar dependent_variable, varchar independent_variable, varchar right_censoring_status, varchar strata)
+Cox regression training function.  More...
+
+void coxph_train (varchar source_table, varchar output_table, varchar dependent_variable, varchar independent_variable, varchar right_censoring_status)
+Cox regression training function.  More...
+
+void coxph_train (varchar source_table, varchar output_table, varchar dependent_variable, varchar independent_variable)
+Cox regression training function.  More...
+
+void coxph_predict (text model_table, text source_table, text id_col_name, text output_table, text pred_type, text reference)
+Predict the linear predictor or the risk for the given data.  More...
+
+void coxph_predict (text model_table, text source_table, text id_col_name, text output_table, text pred_type)
+
+void coxph_predict (text model_table, text source_table, text id_col_name, text output_table)
+
+float8 _coxph_predict_resp (float8[] coef, float8[] col_ind_var, float8[] mean_ind_var, text pred_type)
+
+float8 [] _coxph_predict_terms (float8[] coef, float8[] col_ind_var, float8[] mean_ind_var)
+
+varchar coxph_predict (varchar message)
+
+varchar coxph_predict ()
+
+float8 [] _split_transition (float8[], float8, integer, integer)
+
+float8 [] _split_merge (float8[], float8[])
+
+float8 [] _split_final (float8[])
+
+aggregate float8 [] _compute_splits (float8, integer, integer)
+
+integer _compute_grpid (float8[] splits, float8 split_col, boolean reverse)
+
+integer _compute_grpid (float8[] splits, float8 split_col)
+
+coxph_result compute_coxph_result (float8[] coef, float8 l, float8[] d2l, integer niter, float8[] stds)
+
+coxph_step_result coxph_improved_step_final (float8[] state)
+
+float8 [] coxph_improved_step_transition (float8[] state, float8[] x, float8[] y, integer[] status, float8[] coef, float8[] max_coef)
+
+float8 [] coxph_step_inner_final (float8[] state)
+
+float8 

[26/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting

2018-08-07 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__crf.html
--
diff --git a/docs/rc/group__grp__crf.html b/docs/rc/group__grp__crf.html
new file mode 100644
index 000..20fd7da
--- /dev/null
+++ b/docs/rc/group__grp__crf.html
@@ -0,0 +1,632 @@
+
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
+http://www.w3.org/1999/xhtml;>
+
+
+
+
+
+MADlib: Conditional Random Field
+Conditional Random Field
+
+
+Contents 
+
+Training Feature Generation 
+
+CRF Training Function 
+
+Testing Feature Generation 
+
+Inference using Viterbi 
+
+Using CRF 
+
+Examples 
+
+Technical Background 
+
+Literature 
+
+Related Topics 
+
+A conditional random field (CRF) is a type of discriminative, 
undirected probabilistic graphical model. A linear-chain CRF is a special type 
of CRF that assumes the current state depends only on the previous state.
+Feature extraction modules are provided for text-analysis tasks such as 
part-of-speech (POS) tagging and named-entity recognition (NER). Currently, six 
feature types are implemented:
+
+Edge Feature: transition feature that encodes the weight of the 
transition from the current label to the next label.
+Start Feature: fired when the current token is the first token in a 
sequence.
+End Feature: fired when the current token is the last token in a 
sequence.
+Word Feature: fired when the current token is observed in the trained 
dictionary.
+Unknown Feature: fired when the current token has been observed in the 
trained dictionary fewer than a certain number of times (default 1).
+Regex Feature: fired when the current token can be matched by a regular 
expression.
+
+A Viterbi implementation is also provided to get the best label sequence 
and the conditional probability \( \Pr( \text{best label sequence} \mid 
\text{sequence}) \).
+The following steps are required for CRF learning and inference:
+Training Feature Generation
+CRF Training
+Testing Feature Generation
+Inference using Viterbi
+
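The inference step can be illustrated with a compact Viterbi decoder for a linear-chain model. This Python sketch is not MADlib's implementation: the function name viterbi and the three score tables (start, trans, emit, all in log space) are assumptions chosen for the example. It returns the best label sequence and its unnormalized score; turning that score into the conditional probability mentioned above would additionally require the partition function, which is omitted here.

```python
def viterbi(start, trans, emit):
    """Best label sequence for a linear-chain model (log-space scores).

    start[j]    : score of starting in label j
    trans[i][j] : score of moving from label i to label j
    emit[t][j]  : score of label j at position t
    Returns (path, best_score).
    """
    n_labels = len(start)
    score = [start[j] + emit[0][j] for j in range(n_labels)]
    back = []  # back[t-1][j] = best previous label for label j at position t

    for t in range(1, len(emit)):
        new_score, pointers = [], []
        for j in range(n_labels):
            best_i = max(range(n_labels),
                         key=lambda i: score[i] + trans[i][j])
            new_score.append(score[best_i] + trans[best_i][j] + emit[t][j])
            pointers.append(best_i)
        back.append(pointers)
        score = new_score

    # Trace the best final label back to the first position.
    last = max(range(n_labels), key=lambda j: score[j])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]


if __name__ == "__main__":
    start = [0.0, 0.0]
    trans = [[0.0, -1.0], [-1.0, 0.0]]
    emit = [[0.0, -2.0], [-2.0, 0.0], [-2.0, 0.0]]
    print(viterbi(start, trans, emit))
```

Because each position depends only on the previous one, the decoder runs in time proportional to the sequence length times the square of the number of labels.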
+Training Feature Generation
+The function takes train_segment_tbl and regex_tbl as input and 
generates three tables, dictionary_tbl, train_feature_tbl and 
train_featureset_tbl, that are required as input for CRF training. 
+crf_train_fgen(train_segment_tbl,
+   regex_tbl,
+   label_tbl,
+   dictionary_tbl,
+   train_feature_tbl,
+   train_featureset_tbl)
+ Arguments 
+train_segment_tbl 
+TEXT. Name of the training segment table. The table is expected to have 
the following columns: 
+
+doc_id INTEGER. Document id column  
+
+start_pos INTEGER. Index of a particular term in the respective 
document  
+
+seg_text TEXT. Term at the respective start_pos in 
the document  
+
+label INTEGER. Label id for the term corresponding to the actual 
label from label_tbl   
+
+
+regex_tbl 
+TEXT. Name of the regular expression table. The table is expected to have 
the following columns: 
+
+pattern TEXT. Regular Expression  
+
+name TEXT. Regular Expression name  
+
+
+label_tbl 
+TEXT. Name of the table containing unique labels and their ids. The table 
is expected to have the following columns: 
+
+id INTEGER. Unique label 

[01/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting

2018-08-07 Thread riyer
Repository: madlib-site
Updated Branches:
  refs/heads/asf-site acd339f65 -> 9a2b301d3


http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/lda_8sql__in.html
--
diff --git a/docs/rc/lda_8sql__in.html b/docs/rc/lda_8sql__in.html
new file mode 100644
index 000..e2423a7
--- /dev/null
+++ b/docs/rc/lda_8sql__in.html
@@ -0,0 +1,1422 @@
+
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
+http://www.w3.org/1999/xhtml;>
+
+
+
+
+
+MADlib: lda.sql_in File Reference
+Functions  
+  
+lda.sql_in File Reference  
+
+
+
+SQL functions for Latent Dirichlet Allocation.  
+More...
+
+
+Functions
+set lda_result lda_train (text 
data_table, text model_table, text output_data_table, int4 voc_size, int4 
topic_num, int4 iter_num, float8 alpha, float8 beta)
+This UDF provides an entry 
for the lda training process.  More...
+
+set lda_result lda_predict 
(text data_table, text model_table, text output_table)
+This UDF provides an entry 
for the lda predicton process.  More...
+
+set lda_result lda_predict 
(text data_table, text model_table, text output_table, int4 iter_num)
+A overloaded version which 
allows users to specify iter_num.  More...
+
+set lda_result lda_get_topic_word_count
 (text model_table, text output_table)
+This UDF computes the 
per-topic word counts.  More...
+
+set lda_result lda_get_word_topic_count
 (text model_table, text output_table)
+This UDF computes the 
per-word topic counts.  More...
+
+set lda_result lda_get_topic_desc
 (text model_table, text vocab_table, text desc_table, int4 top_k)
+This UDF gets the 
description for each topic (top-k words)  More...
+
+set lda_result lda_get_word_topic_mapping
 (text lda_output_table, text mapping_table)
+This UDF gets the wordid - 
topicid mapping from the lda training output table.  More...
+
+int4 []__lda_random_assign
 (int4 word_count, int4 topic_num)
+This UDF assigns topics to 
words in a document randomly.  More...
+
+int4 []__lda_gibbs_sample
 (int4[] words, int4[] counts, int4[] doc_topic, int8[] model, float8 alpha, 
float8 beta, int4 voc_size, int4 topic_num, int4 iter_num)
+This UDF learns the topics 
of words in a document and is the main step of a Gibbs sampling iteration. The 
model parameter (including the per-word topic counts and corpus-level topic 
counts) is passed to this function in the first call and then transfered to the 
rest calls through fcinfo-flinfo-fn_extra to allow the immediate 
update.  More...
+
+int8 []__lda_count_topic_sfunc
 (int8[] state, int4[] words, int4[] counts, int4[] topic_assignment, int4 
voc_size, int4 topic_num)
+This UDF is the sfunc for 
the aggregator computing the topic counts for each word and the topic count in 
the whole corpus. It scans the topic assignments in a document and updates the 
topic counts.  More...
+
+int8 [] __lda_count_topic_prefunc
 (int8[] state1, int8[] state2)
+This UDF is the prefunc for 
the aggregator computing the per-word topic counts.  More...
+
+aggregate int8 [] __lda_count_topic_agg
 (int4[], int4[], int4[], int4, int4)
+This UDA computes the word 
topic counts by scanning and summing up topic assignments in each document.  More...
+
+float8 lda_get_perplexity
 (text model_table, text 

[08/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting

2018-08-07 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__strs.html
--
diff --git a/docs/rc/group__grp__strs.html b/docs/rc/group__grp__strs.html
new file mode 100644
index 000..a3a6e3b
--- /dev/null
+++ b/docs/rc/group__grp__strs.html
@@ -0,0 +1,269 @@
+
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
+http://www.w3.org/1999/xhtml;>
+
+
+
+
+
+MADlib: Stratified Sampling
+
+
+
+
+
+
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+  $(document).ready(initResizable);
+/* @license-end */
+
+
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+  $(document).ready(function() { init_search(); });
+/* @license-end */
+
+
+  MathJax.Hub.Config({
+extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
+jax: ["input/TeX","output/HTML-CSS"],
+});
+https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.2/MathJax.js";>
+
+
+
+
+
+
+  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
+  
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+  ga('create', 'UA-45382226-1', 'madlib.apache.org');
+  ga('send', 'pageview');
+
+
+
+
+
+
+ 
+ 
+  http://madlib.apache.org;>
+  
+   
+   1.15
+   
+   User Documentation for Apache MADlib
+  
+   
+
+  
+  
+  
+
+  
+
+
+ 
+ 
+
+
+
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+var searchBox = new SearchBox("searchBox", "search",false,'Search');
+/* @license-end */
+
+
+
+  
+
+  
+
+  
+  
+  
+
+
+/* @license 
magnet:?xt=urn:btih:cf05388f2679ee054f2beb29a391d25f4e673ac3&dn=gpl-2.0.txt 
GPL-v2 */
+$(document).ready(function(){initNavTree('group__grp__strs.html','');});
+/* @license-end */
+
+
+
+
+
+
+
+
+
+
+
+
+
+  
+Stratified Sampling
+
+
+Contents 
+
+Stratified Sampling 
+
+Examples 
+
+Stratified sampling is a method for independently sampling 
subpopulations (strata). It is commonly used to reduce sampling error by 
ensuring that subgroups are adequately represented in the sample.
+Stratified Sampling
+
+stratified_sample(  source_table,
+output_table,
+proportion,
+grouping_cols,
+target_cols,
+with_replacement
+  )
+Arguments 
+source_table 
+TEXT. Name of the table containing the input data.
+
+
+output_table 
+TEXT. Name of output table that contains the sampled 
data. The output table contains all columns present in the source table unless 
otherwise specified in the 'target_cols' parameter below.
+
+
+proportion 
+FLOAT8 in the range (0,1). The fraction of rows to sample from each stratum; 
each stratum is sampled independently.
+
+
+grouping_cols (optional) 
+TEXT, default: NULL. A single column or a list of 
comma-separated columns that defines the strata. When this parameter is NULL, 
no grouping is used so the sampling is non-stratified, that is, the whole table 
is treated as a single group.
+
+
+target_cols (optional) 
+TEXT, default NULL. A comma-separated list of columns 
to appear in the 'output_table'. If NULL or '*', all columns from the 
'source_table' will appear in the 'output_table'.
+Note: Do not include 'grouping_cols' in the parameter 
'target_cols', because they are always included in the 'output_table'.
+
+with_replacement (optional) 
+BOOLEAN, default FALSE. Determines whether to sample with replacement or 
without replacement (default). With replacement means that the same row may 
appear in the sample set more than once. Without replacement means that a 
given row can be selected only once. 
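The per-stratum behaviour described by the arguments above can be sketched in plain Python. This is only an illustration, not MADlib's implementation: the function name, the `key` callback and the `seed` parameter are hypothetical, and rounding the per-stratum sample size up is an assumption.

```python
import math
import random
from collections import defaultdict


def stratified_sample(rows, key, proportion, with_replacement=False, seed=None):
    """Sample each stratum independently at the given proportion (illustrative only)."""
    rng = random.Random(seed)

    # Group rows into strata using the caller-supplied grouping key.
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)

    sample = []
    for group in strata.values():
        # Assumption: the per-stratum sample size is rounded up.
        n = math.ceil(proportion * len(group))
        if with_replacement:
            # The same row may be drawn more than once.
            sample.extend(rng.choices(group, k=n))
        else:
            # Each row can be drawn at most once.
            sample.extend(rng.sample(group, n))
    return sample
```

Passing `key=lambda r: (r[2], r[3])` would correspond to grouping on two columns, in the spirit of `'gr1,gr2'` in the example that follows.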
+
+Examples
+Please note that due to the random nature of sampling, your results may 
look different from those below.
+
+Create an input table: 
+DROP TABLE IF EXISTS test;
+CREATE TABLE test(
+id1 INTEGER,
+id2 INTEGER,
+gr1 INTEGER,
+gr2 INTEGER
+);
+INSERT INTO test VALUES
+(1,0,1,1),
+(2,0,1,1),
+(3,0,1,1),
+(4,0,1,1),
+(5,0,1,1),
+(6,0,1,1),
+(7,0,1,1),
+(8,0,1,1),
+(9,0,1,1),
+(9,0,1,1),
+(9,0,1,1),
+(9,0,1,1),
+(0,1,1,2),
+(0,2,1,2),
+(0,3,1,2),
+(0,4,1,2),
+(0,5,1,2),
+(0,6,1,2),
+(10,10,2,2),
+(20,20,2,2),
+(30,30,2,2),
+(40,40,2,2),
+(50,50,2,2),
+(60,60,2,2),
+(70,70,2,2);
+
+Sample without replacement: 
+DROP TABLE IF EXISTS out;
+SELECT madlib.stratified_sample(
+'test',-- Source table
+'out', -- Output table
+0.5,   -- Sample proportion
+'gr1,gr2', -- Strata definition
+

[15/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting

2018-08-07 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__nn.html
--
diff --git a/docs/rc/group__grp__nn.html b/docs/rc/group__grp__nn.html
new file mode 100644
index 000..d7569f6
--- /dev/null
+++ b/docs/rc/group__grp__nn.html
@@ -0,0 +1,1143 @@
+MADlib: Neural Network
+Neural Network
+
+
+Contents
+
+Classification 
+
+Regression 
+
+Optimizer Parameters 
+
+Prediction Functions 
+
+Examples 
+
+Technical Background 
+
+Literature 
+
+Related Topics 
+
+Multilayer Perceptron (MLP) is a type of neural network that can be 
used for regression and classification.
+MLPs consist of several fully connected hidden layers with non-linear 
activation functions. In the case of classification, the final layer of the 
neural net has as many nodes as classes, and the output of the neural net can 
be interpreted as the probability that a given input feature belongs to a 
specific class.
+MLP can be used with or without mini-batching. The advantage of using 
mini-batching is that it can perform better than stochastic gradient descent 
(default MADlib optimizer) because it uses more than one training example at a 
time, typically resulting in faster and smoother convergence [3].
+Note: In order to use mini-batching, you 
must first run the Mini-Batch Preprocessor, 
which is a utility that prepares input data for use by models that support 
mini-batch as an optimization option, such as MLP. This is a one-time operation 
and you would only need to re-run the preprocessor if your input data has 
changed, or if you change the grouping parameter.
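The structure described above (fully connected hidden layers with a non-linear activation, and a final layer with one node per class whose outputs read as probabilities) can be sketched as a forward pass in plain Python. This is a hedged illustration, not MADlib code: the function names are hypothetical, and the choice of tanh for hidden layers and softmax for the output layer is an assumption made for the example.

```python
import math


def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def dense(x, weights, biases):
    """Fully connected layer: one weight row and one bias per output node."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def mlp_forward(x, layers):
    """layers is a list of (weights, biases); the last entry is the output layer."""
    for weights, biases in layers[:-1]:
        # Assumed hidden activation: tanh.
        x = [math.tanh(v) for v in dense(x, weights, biases)]
    weights, biases = layers[-1]
    # Assumed output activation: softmax, one probability per class.
    return softmax(dense(x, weights, biases))
```

With two classes, `mlp_forward` returns a list of two numbers in (0, 1) that sum to 1, which is the sense in which the output can be read as class probabilities.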
+Classification Training Function
+The MLP classification training function has the following format:
+
+mlp_classification(
+source_table,
+output_table,
+independent_varname,
+dependent_varname,
+hidden_layer_sizes,
+optimizer_params,
+activation,
+weights,
+warm_start,
+verbose,
+grouping_col
+)
+Arguments 
+source_table 
+TEXT. Name of the table containing the training data. 
If you are using mini-batching, this is the name of the output table from the 
mini-batch preprocessor.
+
+
+output_table 
+TEXT. Name of the output table containing the model. 
Details of the output table are shown below. 
+
+
+independent_varname 
+TEXT. Expression list to evaluate for the independent 
variables. It should be a numeric array expression. If you are using 
mini-batching, set this parameter to 'independent_varname' which is the 
hardcoded name of the column from the mini-batch preprocessor containing the 
packed independent variables.
+Note: If you are not using mini-batching, 
please note that an intercept variable should not be included as part of this 
expression - this is different from other MADlib modules. Also please note that 
independent variables should be encoded properly. All values are cast to DOUBLE 
PRECISION, so categorical variables should be one-hot or dummy encoded as 
appropriate. See Encoding 
Categorical Variables for more details. 
+
+dependent_varname 
+TEXT. Name of the dependent variable column. For 
classification, supported types 

[42/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting

2018-08-07 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/cross__validation_8sql__in.html
--
diff --git a/docs/rc/cross__validation_8sql__in.html 
b/docs/rc/cross__validation_8sql__in.html
new file mode 100644
index 000..c5acec3
--- /dev/null
+++ b/docs/rc/cross__validation_8sql__in.html
@@ -0,0 +1,717 @@
+MADlib: cross_validation.sql_in File Reference
+Functions  
+  
+cross_validation.sql_in File Reference  
+
+
+
+SQL functions for cross validation.  
+More...
+
+
+Functions
+void cross_validation_general
 (varchar modelling_func, varchar[] modelling_params, varchar[] 
modelling_params_type, varchar param_explored, varchar[] explore_values, 
varchar predict_func, varchar[] predict_params, varchar[] predict_params_type, 
varchar metric_func, varchar[] metric_params, varchar[] metric_params_type, 
varchar data_tbl, varchar data_id, boolean id_is_random, varchar 
validation_result, varchar[] data_cols, integer n_folds)
+
+void cross_validation_general
 (varchar modelling_func, varchar[] modelling_params, varchar[] 
modelling_params_type, varchar param_explored, varchar[] explore_values, 
varchar predict_func, varchar[] predict_params, varchar[] predict_params_type, 
varchar metric_func, varchar[] metric_params, varchar[] metric_params_type, 
varchar data_tbl, varchar data_id, boolean id_is_random, varchar 
validation_result, varchar[] data_cols)
+
+void cv_linregr_train
 (varchar tbl_source, varchar col_ind_var, varchar col_dep_var, varchar 
tbl_result)
+A wrapper for linear 
regression.  More...
+
+void cv_linregr_predict
 (varchar tbl_model, varchar tbl_newdata, varchar col_ind_var, varchar col_id, 
varchar tbl_predict)
+A wrapper for linear 
regression prediction.  More...
+
+void mse_error
 (varchar tbl_prediction, varchar tbl_actual, varchar id_actual, varchar 
values_actual, varchar tbl_error)
+
+void misclassification_avg
 (varchar tbl_prediction, varchar tbl_actual, varchar id_actual, varchar 
values_actual, varchar tbl_error)
+
+void cv_logregr_predict
 (varchar tbl_model, varchar tbl_newdata, varchar col_ind_var, varchar col_id, 
varchar tbl_predict)
+A prediction function for 
logistic regression. The result is stored in the table tbl_predict.  More...
+
+integer logregr_accuracy
 (float8[] coef, float8[] col_ind, boolean col_dep)
+Metric function for 
logistic regression.  More...
+
+void cv_logregr_accuracy
 (varchar tbl_predict, varchar tbl_source, varchar col_id, varchar col_dep_var, 
varchar tbl_accuracy)
+Metric function for 
logistic regression.  More...
+
+
+Detailed Description
+Date: January 2011
+See also: For a brief introduction to the 
usage of cross validation, see the module description Cross Validation. 
+Function Documentation
+
+cross_validation_general() [1/2]
+
+
+
+  
+
+  void cross_validation_general 
+  (
+  varchar
+  modelling_func, 
+
+
+  
+  
+  varchar []
+  modelling_params, 
+
+
+  
+  
+  varchar []
+  modelling_params_type, 
+
+
+  
+   

[23/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting

2018-08-07 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__encode__categorical.html
--
diff --git a/docs/rc/group__grp__encode__categorical.html 
b/docs/rc/group__grp__encode__categorical.html
new file mode 100644
index 000..61f6f45
--- /dev/null
+++ b/docs/rc/group__grp__encode__categorical.html
@@ -0,0 +1,700 @@
+MADlib: Encoding Categorical Variables
+Encoding Categorical Variables
+
+
+Contents 
+
+Coding Systems for Categorical Variables 
+
+Examples 
+
+Literature 
+
+Coding Systems for Categorical Variables
+Categorical 
variables [1] require special attention in regression analysis because, unlike 
dichotomous or continuous variables, they cannot be entered into the regression 
equation just as they are. For example, if you have a variable called race that 
is coded with 1=Hispanic, 2=Asian, 3=Black, 4=White, then entering race in your 
regression will look at the linear effect of the race variable, which is 
probably not what you intended. Instead, categorical variables like this need 
to be coded into a series of indicator variables which can then be entered into 
the regression model. There are a variety of coding systems that can be used 
for coding categorical variables, including one-hot, dummy, effects, 
orthogonal, and Helmert.
+We currently support one-hot and dummy coding techniques.
+Dummy coding is used when a researcher wants to compare other groups of the 
predictor variable with one specific group of the predictor variable. Often, 
the specific group to compare with is called the reference group.
+One-hot encoding is similar to dummy coding except it builds indicator 
(0/1) columns (cast as numeric) for each value of each categorical variable. 
Only one of these columns can take on the value 1 for each row (data point). 
There is no reference category for this function.
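The difference between the two supported coding systems can be shown with a small Python sketch. It is an illustration only and does not use MADlib: the helper names `one_hot` and `dummy` are hypothetical, and the level names are borrowed from the race example above.

```python
def one_hot(values):
    """One 0/1 indicator column per distinct level; no reference category."""
    levels = sorted(set(values))
    return [{lvl: int(v == lvl) for lvl in levels} for v in values]


def dummy(values, reference):
    """Like one-hot, but the reference level gets no column of its own."""
    levels = [lvl for lvl in sorted(set(values)) if lvl != reference]
    return [{lvl: int(v == lvl) for lvl in levels} for v in values]
```

In the one-hot output every row has exactly one indicator set to 1. In the dummy output a row belonging to the reference group has all indicators set to 0, so the other groups are compared against it.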
+
+encode_categorical_variables (
+source_table,
+output_table,
+categorical_cols,
+categorical_cols_to_exclude,-- Optional
+row_id, -- Optional
+top,-- Optional
+value_to_drop,  -- Optional
+encode_null,-- Optional
+output_type,-- Optional
+output_dictionary,  -- Optional
+distributed_by  -- Optional
+)
+ Arguments 
+source_table 
+VARCHAR. Name of the table containing the source 
categorical data to encode.
+
+
+output_table 
+VARCHAR. Name of the result table.
+NoteIf there are index columns in the 
'source_table' specified by the parameter 'row_id' (see below), then the output 
table will contain only the index columns 'row_id' and the encoded columns. If 
the parameter 'row_id' is not specified, then all columns from the 
'source_table', with the exception of the original columns that have been 
encoded, will be included in the 'output_table'. 
+
+categorical_cols 
+VARCHAR. Comma-separated string of column names of 
categorical variables to encode. Can also be '*' meaning 

[20/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting

2018-08-07 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__kmeans.html
--
diff --git a/docs/rc/group__grp__kmeans.html b/docs/rc/group__grp__kmeans.html
new file mode 100644
index 000..0f40769
--- /dev/null
+++ b/docs/rc/group__grp__kmeans.html
@@ -0,0 +1,492 @@
+MADlib: k-Means Clustering
+k-Means Clustering
+
+
+Contents 
+
+Training Function 
+
+Output Format 
+
+Cluster Assignment 
+
+Examples 
+
+Notes 
+
+Technical Background 
+
+Literature 
+
+Related Topics 
+
+Clustering refers to the problem of partitioning a set of objects 
according to some problem-dependent measure of similarity. In the 
k-means variant, given \( n \) points \( x_1, \dots, x_n \in \mathbb R^d \), 
the goal is to position \( k \) centroids \( c_1, \dots, c_k \in \mathbb R^d \) 
so that the sum of distances between each point and its closest 
centroid is minimized. Each centroid represents a cluster that consists of all 
points to which this centroid is closest.
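The objective stated above can be written down directly. The following Python sketch is illustrative rather than MADlib's implementation: the function names are hypothetical, and squared Euclidean distance is assumed as the distance measure for the example.

```python
def squared_dist(p, c):
    """Squared Euclidean distance between a point and a centroid."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def assign(points, centroids, dist=squared_dist):
    """Index of the closest centroid for each point."""
    return [min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points]


def objective(points, centroids, dist=squared_dist):
    """Sum over all points of the distance to the closest centroid."""
    return sum(min(dist(p, c) for c in centroids) for p in points)
```

k-means searches for the centroid positions that make `objective` as small as possible; `assign` gives the cluster membership implied by a given set of centroids.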
+Training Function
+The k-means algorithm can be invoked in four ways, depending on the source 
of the initial set of centroids:
+
+Use the random centroid seeding method. 
+kmeans_random( rel_source,
+   expr_point,
+   k,
+   fn_dist,
+   agg_centroid,
+   max_num_iterations,
+   min_frac_reassigned
+ )
+
+Use the kmeans++ centroid seeding method. 
+kmeanspp( rel_source,
+  expr_point,
+  k,
+  fn_dist,
+  agg_centroid,
+  max_num_iterations,
+  min_frac_reassigned,
+  seeding_sample_ratio
+)
+
+Supply an initial centroid set in a relation identified by the 
rel_initial_centroids argument. 
+kmeans( rel_source,
+expr_point,
+rel_initial_centroids,
+expr_centroid,
+fn_dist,
+agg_centroid,
+max_num_iterations,
+min_frac_reassigned
+  )
+
+Provide an initial centroid set as an array expression in the 
initial_centroids argument. 
+kmeans( rel_source,
+expr_point,
+initial_centroids,
+fn_dist,
+agg_centroid,
+max_num_iterations,
+min_frac_reassigned
+  )
+ Arguments 
+rel_source 
+TEXT. The name of the table containing the input data 
points.
+Data points and predefined centroids (if used) are expected to be stored 
row-wise, in a column of type SVEC (or any type convertible to 
SVEC, like 
FLOAT[] or INTEGER[]). Data points with non-finite 
values (NULL, NaN, infinity) in any component are skipped during analysis. 
+
+
+expr_point 
+TEXT. The name of the column with point coordinates or 
an array expression.
+
+
+k 
+INTEGER. The number of centroids to calculate.
+
+
+fn_dist (optional) 
+TEXT, default: 'squared_dist_norm2'. The name of the 
function to use to calculate the distance from a data point to a centroid.
+The following distance functions can be used (computation of 
barycenter/mean in parentheses): 
+
+dist_norm1:
 1-norm/Manhattan (element-wise median [Note that MADlib does not provide a 
median 

[07/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting

2018-08-07 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__svec.html
--
diff --git a/docs/rc/group__grp__svec.html b/docs/rc/group__grp__svec.html
new file mode 100644
index 000..efdf975
--- /dev/null
+++ b/docs/rc/group__grp__svec.html
@@ -0,0 +1,455 @@
+MADlib: Sparse Vectors
+Sparse Vectors
+
+
+Contents 
+
+Using Sparse Vectors 
+
+Document Vectorization into Sparse Vectors 
+
+Examples 
+
+Related Topics 
+
+This module implements a sparse vector data type, named "svec", which 
provides compressed storage of vectors that have many duplicate elements.
+Arrays of floating point numbers for various calculations sometimes have 
long runs of zeros (or some other default value). This is common in 
applications like scientific computing, retail optimization, and text 
processing. Each floating point number takes 8 bytes of storage in memory 
and/or disk, so saving those zeros is often worthwhile. There are also many 
computations that can benefit from skipping over the zeros.
+Consider, for example, the following array of doubles stored as a 
Postgres/Greenplum "float8[]" data type:
+
+'{0, 33,...40,000 zeros..., 12, 22 }'::float8[]
+This array would occupy slightly more than 320KB of memory or disk, 
most of it zeros. Even if we were to exploit the null bitmap and store the 
zeros as nulls, we would still end up with a 5KB null bitmap, which is still 
not nearly as memory efficient as we'd like. Also, as we perform various 
operations on the array, we do work on 40,000 fields that turn out to be 
unimportant.
+To solve the problems associated with the processing of vectors discussed 
above, the svec type employs a simple Run Length Encoding (RLE) scheme to 
represent sparse vectors as pairs of count-value arrays. For example, the array 
above would be represented as
+
+'{1,1,40000,1,1}:{0,33,0,12,22}'::madlib.svec
+which says there is 1 occurrence of 0, followed by 1 occurrence of 
33, followed by 40,000 occurrences of 0, etc. This uses just 5 integers and 5 
floating point numbers to store the array. Further, it is easy to implement 
vector operations that can take advantage of the RLE representation to make 
computations faster. The SVEC module provides a library of such functions.
+The current version only supports sparse vectors of float8 values. Future 
versions will support other base types.
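The run-length encoding idea described above can be sketched in a few lines of Python. This is an illustration of the count-value representation only, not the svec implementation: the function names are hypothetical.

```python
def rle_encode(values):
    """Collapse consecutive equal values into (counts, values) arrays."""
    counts, vals = [], []
    for v in values:
        if vals and vals[-1] == v:
            counts[-1] += 1      # extend the current run
        else:
            counts.append(1)     # start a new run
            vals.append(v)
    return counts, vals


def rle_decode(counts, vals):
    """Expand (counts, values) back into the dense list."""
    out = []
    for n, v in zip(counts, vals):
        out.extend([v] * n)
    return out


def rle_sum(counts, vals):
    """Sum of all elements, using one multiply-add per run, not per element."""
    return sum(n * v for n, v in zip(counts, vals))
```

For the array discussed above, encoding yields five counts and five values, and `rle_sum` touches five runs instead of more than 40,000 elements, which is the kind of saving the text refers to.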
+Using Sparse Vectors
+An SVEC can be constructed directly with a constant expression, as follows: 

+SELECT '{n1,n2,...,nk}:{v1,v2,...vk}'::madlib.svec;
+ where n1,n2,...,nk specifies the counts for the values 
v1,v2,...,vk.
+A float array can be cast to an SVEC: 
+SELECT ('{v1,v2,...vk}'::float[])::madlib.svec;
+An SVEC can be created with an aggregation: 
+SELECT madlib.svec_agg(v1) FROM generate_series(1,k);
+An SVEC can be created using the 
madlib.svec_cast_positions_float8arr() function by supplying an 
array of positions and an array of values at those positions: 
+SELECT madlib.svec_cast_positions_float8arr(
+array[n1,n2,...nk],-- positions of values in vector
+

[49/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting

2018-08-07 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/array__ops_8sql__in.html
--
diff --git a/docs/rc/array__ops_8sql__in.html b/docs/rc/array__ops_8sql__in.html
new file mode 100644
index 000..c6140a5
--- /dev/null
+++ b/docs/rc/array__ops_8sql__in.html
@@ -0,0 +1,1275 @@
+MADlib: array_ops.sql_in File Reference
+Functions  
+  
+array_ops.sql_in File Reference  
+
+
+
+Implementation of array operations in SQL.  
+More...
+
+
+Functions
+anyarray array_add 
(anyarray x, anyarray y)
+Adds two arrays. It 
requires that all the values are NON-NULL. Return type is the same as the input 
type.  More...
+
+aggregate anyarray sum 
(anyarray)
+Aggregate, element-wise sum 
of arrays. It requires that all the values are NON-NULL. Return type is the 
same as the input type.  More...
+
+anyarray array_sub 
(anyarray x, anyarray y)
+Subtracts two arrays. It 
requires that all the values are NON-NULL. Return type is the same as the input 
type.  More...
+
+anyarray array_mult
 (anyarray x, anyarray y)
+Element-wise product of two 
arrays. It requires that all the values are NON-NULL. Return type is the same 
as the input type.  More...
+
+anyarray array_div 
(anyarray x, anyarray y)
+Element-wise division of 
two arrays. It requires that all the values are NON-NULL. Return type is the 
same as the input type.  More...
+
+float8array_dot 
(anyarray x, anyarray y)
+Dot-product of two arrays. 
It requires that all the values are NON-NULL. Return type is the same as the 
input type.  More...
+
+boolarray_contains
 (anyarray x, anyarray y)
+Checks whether one array 
contains the other. This function returns TRUE if each non-zero element in the 
right array equals to the element with the same index in the left array.  More...
+
+anyelementarray_max 
(anyarray x)
+This function finds the 
maximum value in the array. NULLs are ignored. Return type is the same as the 
input type.  More...
+
+float8 []array_max_index
 (anyarray x)
+This function finds the 
maximum value and corresponding index in the array. NULLs are ignored. Return 
type is the same as the input type.  More...
+
+anyelementarray_min 
(anyarray x)
+This function finds the 
minimum value in the array. NULLs are ignored. Return type is the same as the 
input type.  More...
+
+float8 []array_min_index
 (anyarray x)
+This function finds the 
minimum value and corresponding index in the array. NULLs are ignored. Return 
type is the same as the input type.  More...
+
+anyelementarray_sum 
(anyarray x)
+This function finds the sum 
of the values in the array. NULLs are ignored. Return type is the same as the 
input type.  More...
+
+float8array_sum_big
 (anyarray x)
+This function finds the sum 
of the values in the array. NULLs are ignored. Return type is always FLOAT8 
regardless of input. This function is meant to replace array_sum() in the cases when sum may overflow the 
element type.  More...
+
+anyelementarray_abs_sum
 (anyarray x)
+This function finds the sum 
of abs of the values in the array. NULLs are ignored. Return type is the same 
as the input type.  More...
+
+anyarrayarray_abs 
(anyarray 

[36/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting

2018-08-07 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/dir_ce4fa7aad06dd1bbca713eb50be7391b.html
--
diff --git a/docs/rc/dir_ce4fa7aad06dd1bbca713eb50be7391b.html b/docs/rc/dir_ce4fa7aad06dd1bbca713eb50be7391b.html
new file mode 100644
index 000..d149a74
--- /dev/null
+++ b/docs/rc/dir_ce4fa7aad06dd1bbca713eb50be7391b.html
@@ -0,0 +1,190 @@
+
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
+http://www.w3.org/1999/xhtml;>
+
+
+
+
+
+MADlib: modules Directory Reference
+modules Directory Reference  
+
+
+
+
+Directories
+directory assoc_rules
+
+directory bayes
+
+directory conjugate_gradient
+
+directory convex
+
+directory crf
+
+directory elastic_net
+
+directory glm
+
+directory graph
+
+directory kmeans
+
+directory knn
+
+directory lda
+
+directory linalg
+
+directory linear_systems
+
+directory pca
+
+directory pmml
+
+directory prob
+
+directory recursive_partitioning
+
+directory regress
+
+directory sample
+
+directory stats
+
+directory summary
+
+directory svm
+
+directory tsa
+
+directory utilities
+
+directory validation
+
+
+
+
+
+
+  
+Generated on Mon Aug 6 2018 21:55:39 for MADlib by Doxygen 1.8.14 (http://www.doxygen.org/index.html)
+  
+
+
+

http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/dir_d2a50e0bf8f1a4defa2e0ba0a193a480.html
--
diff --git a/docs/rc/dir_d2a50e0bf8f1a4defa2e0ba0a193a480.html b/docs/rc/dir_d2a50e0bf8f1a4defa2e0ba0a193a480.html
new file mode 100644
index 000..dc25b29
--- /dev/null
+++ b/docs/rc/dir_d2a50e0bf8f1a4defa2e0ba0a193a480.html
@@ -0,0 +1,142 @@
+
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
+http://www.w3.org/1999/xhtml;>
+
+
+
+
+
+MADlib: pmml Directory Reference

[04/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting

2018-08-07 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/hypothesis__tests_8sql__in.html
--
diff --git a/docs/rc/hypothesis__tests_8sql__in.html b/docs/rc/hypothesis__tests_8sql__in.html
new file mode 100644
index 000..e172197
--- /dev/null
+++ b/docs/rc/hypothesis__tests_8sql__in.html
@@ -0,0 +1,1262 @@
+
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
+http://www.w3.org/1999/xhtml;>
+
+
+
+
+
+MADlib: hypothesis_tests.sql_in File Reference
+Functions  
+  
+hypothesis_tests.sql_in File Reference  
+
+
+
+SQL functions for statistical hypothesis tests.  
+More...
+
+
+Functions
+float8[] t_test_one_transition (float8[] state, float8 value)
+
+float8[] t_test_merge_states (float8[] state1, float8[] state2)
+
+t_test_result t_test_one_final (float8[] state)
+
+f_test_result f_test_final (float8[] state)
+
+aggregate float8[] t_test_one (float8 value)
+Perform one-sample or dependent paired Student t-test.
+
+float8[] t_test_two_transition (float8[] state, boolean first, float8 value)
+
+t_test_result t_test_two_pooled_final (float8[] state)
+
+aggregate float8[] t_test_two_pooled (boolean first, float8 value)
+Perform two-sample pooled (i.e., equal variances) Student t-test.
+
+t_test_result t_test_two_unpooled_final (float8[] state)
+
+aggregate float8[] t_test_two_unpooled (boolean first, float8 value)
+Perform unpooled (i.e., unequal variances) t-test (also known as Welch's t-test)
+
+aggregate float8[] f_test (boolean first, float8 value)
+Perform Fisher F-test.
+
+float8[] chi2_gof_test_transition (float8[] state, bigint observed, float8 expected, bigint df)
+
+float8[] chi2_gof_test_transition (float8[] state, bigint observed, float8 expected)
+
+float8[] chi2_gof_test_transition (float8[] state, bigint observed)
+
+float8[] chi2_gof_test_merge_states (float8[] state1, float8[] state2)
+
+chi2_test_result chi2_gof_test_final (float8[] state)
+
+aggregate float8[] chi2_gof_test (bigint observed, float8 expected=1, bigint df=0)
+Perform Pearson's chi-squared goodness-of-fit test.
+
+aggregate float8[] chi2_gof_test (bigint observed, float8 expected)
+
+aggregate float8[] chi2_gof_test (bigint observed)
+
+float8[] ks_test_transition (float8[] state, boolean first, float8 value, bigint numFirst, bigint numSecond)
+
+ks_test_result ks_test_final (float8[] state)
+
+float8[] mw_test_transition (float8[] state, boolean first, float8 value)
+Perform Kolmogorov-Smirnov test.
+
+mw_test_result mw_test_final (float8[] state)
+
+float8[] wsr_test_transition (float8[] state, float8 value, float8 precision)
+Perform Mann-Whitney test.
+
+float8[] wsr_test_transition (float8[] state, float8 value)
+
+wsr_test_result wsr_test_final (float8[] state)
+
+float8[] one_way_anova_transition (float8[] state, integer group, float8 value)
+Perform Wilcoxon-Signed-Rank test.
+
+float8[] one_way_anova_merge_states (float8[] state1, float8[] state2)
+
+one_way_anova_result one_way_anova_final (float8[] state)
+
+aggregate float8[] one_way_anova (integer group, float8 value)
+Perform one-way analysis of variance.
+
+Detailed 

[29/51] [partial] madlib-site git commit: Add v1.15 RC1 docs for release voting

2018-08-07 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/9a2b301d/docs/rc/group__grp__balance__sampling.html
--
diff --git a/docs/rc/group__grp__balance__sampling.html b/docs/rc/group__grp__balance__sampling.html
new file mode 100644
index 000..20a971d
--- /dev/null
+++ b/docs/rc/group__grp__balance__sampling.html
@@ -0,0 +1,607 @@
+
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
+http://www.w3.org/1999/xhtml;>
+
+
+
+
+
+MADlib: Balanced Sampling
+Balanced Sampling
+
+Some classification algorithms only perform optimally when the number of samples in each class is roughly the same. Highly skewed datasets are common in many domains (e.g., fraud detection), so resampling to offset this imbalance can produce a better decision boundary.
+This module offers a number of resampling techniques including undersampling majority classes, oversampling minority classes, and combinations of the two.
+Balanced Sampling
+
+balance_sample( source_table,
+output_table,
+class_col,
+class_sizes,
+output_table_size,
+grouping_cols,
+with_replacement,
+keep_null
+  )
+Arguments
+source_table
+TEXT. Name of the table containing the input data.
+
+output_table
+TEXT. Name of output table that contains the sampled data. The output table contains all columns present in the source table, plus a new generated id called "__madlib_id__" added as the first column.
+
+class_col
+TEXT. Name of the column containing the class to be balanced.
+
+class_sizes (optional)
+VARCHAR, default 'uniform'. Parameter to define the size of the different class values. (Class values are sometimes also called levels.) Can be set to the following:
+
+'uniform': All class values will be resampled to have the same number of rows.
+
+'undersample': Undersample such that all class values end up with the same number of observations as the minority class. Done without replacement by default unless the parameter 'with_replacement' is set to TRUE.
+
+'oversample': Oversample with replacement such that all class values end up with the same number of observations as the majority class. Not affected by the parameter 'with_replacement' since oversampling is always done with replacement.
+
+Short forms of the above will work too, e.g., 'uni' works the same as 'uniform'.
+
+Alternatively, you can also explicitly set class size in a string containing a comma-delimited list. Order does not matter and not all class values need to be specified. Use the format "class_value_1=x, class_value_2=y, …" where each 'class_value' in the list must exist in the column 'class_col'. Set each to an integer representing the desired number of observations. E.g., 'red=3000, blue=4000' means you want to resample the dataset to result in exactly 3000 red and 4000 blue rows in the 'output_table'.
+Note: The allowed names for class values follow object naming rules in PostgreSQL [1]. Quoted identifiers are allowed and should be enclosed in

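The explicit `class_sizes` format described in the Balanced Sampling page above ('red=3000, blue=4000') can be illustrated with a minimal Python sketch. This is only an illustration of the string format under stated assumptions: the helper name `parse_class_sizes` is hypothetical and is not part of MADlib, and quoted identifiers and MADlib's own validation are not handled.

```python
def parse_class_sizes(spec):
    """Split 'a=1, b=2' into {'a': 1, 'b': 2}.

    Hypothetical helper for illustration only; not MADlib code.
    Quoted identifiers are NOT supported in this sketch.
    """
    sizes = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            # Tolerate a trailing comma or empty segment.
            continue
        name, sep, count = item.partition("=")
        if not sep:
            raise ValueError("expected class_value=count, got %r" % item)
        count = int(count.strip())  # raises ValueError for non-integers
        if count < 0:
            raise ValueError("count must be non-negative: %r" % item)
        sizes[name.strip()] = count
    return sizes


if __name__ == "__main__":
    # Order of the entries does not matter; unspecified classes are simply absent.
    print(parse_class_sizes("red=3000, blue=4000"))
```

Classes that are not named in the string are left out of the returned mapping, which mirrors the statement that not every class value has to be specified.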
svn commit: r28605 - /dev/madlib/1.15-RC1/

2018-08-07 Thread riyer
Author: riyer
Date: Tue Aug  7 21:03:59 2018
New Revision: 28605

Log:
Add 1.15 RC1 files

Added:
dev/madlib/1.15-RC1/
dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Darwin.dmg   (with props)
dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Darwin.dmg.asc
dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Darwin.dmg.sha512
dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Linux-GPDB43.rpm   (with props)
dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Linux-GPDB43.rpm.asc
dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Linux-GPDB43.rpm.sha512
dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Linux.rpm   (with props)
dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Linux.rpm.asc
dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Linux.rpm.sha512
dev/madlib/1.15-RC1/apache-madlib-1.15-src.tar.gz   (with props)
dev/madlib/1.15-RC1/apache-madlib-1.15-src.tar.gz.asc
dev/madlib/1.15-RC1/apache-madlib-1.15-src.tar.gz.sha512

Added: dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Darwin.dmg
==
Binary file - no diff available.

Propchange: dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Darwin.dmg
--
svn:mime-type = application/octet-stream

Added: dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Darwin.dmg.asc
==
--- dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Darwin.dmg.asc (added)
+++ dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Darwin.dmg.asc Tue Aug  7 21:03:59 2018
@@ -0,0 +1,16 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQIzBAABCAAdFiEE+pCcIAZjKnTgT/pQYwVq5BLE49cFAltpL1QACgkQYwVq5BLE
+49cDJA/+IhMzkgxv6zX1Omuo8ofNMCetHJC4RmB8rwxem7DnLVUgwYNn+xK7lpAU
+Yn9nm/XtFGXVqJ4CWGzaDL/iW2fsUqI5LX22CgeRaRD/iXasYB5TWMKvspaYY5RW
+23Y7lYv3ea/+Gxnjj3uG7BwqxJ5YvtNiWoKWpq8PhSgo1souBivMGLGVS1DK55Wy
+gnZuGULY9qq3cr0n5N7HDRS0e3bzKWqpm5xcGAtz2O5hW7tVDqT2FBrJmOG8mkPQ
+GZ7cRPbeIeAi+CQzuvm522DtqPepJJW99UAl+0oksHgB6ag+iS80bufF27Fr9P0n
+18Lq59/mJwdeUIxK95ak2AWjjmuuFzLY5QB06kJ5Mze96m4SA/VFJ9qdGljcDesX
+BkwKNboi/zQSrUY5xVWNPWn3Qe5v0FUH8H0K1laqkczkeN+TGh8BlmOUF9DGbZ3l
+L8spewzlbjuUAVUX9Q5Sren4qiliTj7UR4+hhggDvHIAAQQjCsOj78dOzce3Px8c
+BrYRHCHbzBS6vg75DRj3P2KItpeRvwdZfNBaG/F0cPpBP/Yuwma62SdGATLdg6Fj
++mMcYysmJLTrPsN0fu+Q7YasWgkPJthnaIkdxpbpEFkh74ZZaYcpDZZw7HW3FBB7
+qm8DQiMrL5wED9khZtvWNuqrMjlCuIN+/j8d8N7508DMtPtSkVE=
+=wEHu
+-----END PGP SIGNATURE-----

Added: dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Darwin.dmg.sha512
==
--- dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Darwin.dmg.sha512 (added)
+++ dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Darwin.dmg.sha512 Tue Aug  7 21:03:59 2018
@@ -0,0 +1 @@
+494c374d272ac707dd503b1c1e33900ca0cca56f48e7ad84a7bed4f01090dbc09155fb09998bfb8db2b448ab84b527e619fbfafc90e3369b4b49cc5a27d4d5aa  apache-madlib-1.15-bin-Darwin.dmg

Added: dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Linux-GPDB43.rpm
==
Binary file - no diff available.

Propchange: dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Linux-GPDB43.rpm
--
svn:mime-type = application/octet-stream

Added: dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Linux-GPDB43.rpm.asc
==
--- dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Linux-GPDB43.rpm.asc (added)
+++ dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Linux-GPDB43.rpm.asc Tue Aug  7 21:03:59 2018
@@ -0,0 +1,16 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQIzBAABCAAdFiEE+pCcIAZjKnTgT/pQYwVq5BLE49cFAltqAGsACgkQYwVq5BLE
+49eFQg//T9SFevZM1eqUwWoM8DhuRdHDNJRaLRuCqTn4RY6cwd4quMLbJ6tMrMcY
+Da0JFfKHUk916eDvgDAyVbbYNfLI6+Td2xdXRZKJdkf8ju1XeLK3hx196C/g+DF+
+ldHILlIoizcLFsypSqOxSwqqIzZ4V+ZdHLsoGILsTQKdok5AuLRYmcJFu7bxbLWI
+gx4tKTFhTJzzDC00Sq9eBIabsWUQhiR7WpmwswRtuOAcvJQH4rwjPjozeBqLGLt5
+/+554enRlTbQw+2URj5DybIYjEVba58sMN8cj83FPu0745e+2kTDW6oZ5TXXGc15
+Rh4PDkSd0+AoUWX64ccT1n/AINMwm1f3g7CWU1lrzXnwY9H9+eABFwtYNBsoPJQU
+bp8QhvjrJMupRKaD89l3JpaRgwb1dxl57V0wKAqpfPBcXS2iElfpq2IZ9DyWOskz
+/pIpgXNFt/JNkww6wxFVyPxZJMBpjDzKMY9UBBqtXcrwx7C6J6OlYWeZLFNpSS/+
+4oVoRJEncN25p9pR4mXlzLKnGQW0pjVrKZocAy55g0WXIilwGiauCO6cQO9cufnF
+6698eIdj5K0ytmdxSsOiLv75j3tynne55aDF8xQTPsa4IDycpc8t/WlQnBmxT2Cs
+y85kVrNoY05+57hxSE1entDMigjbqN0nSrUk2Cp3Mjd47rnrRPA=
+=pmUc
+-----END PGP SIGNATURE-----

Added: dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Linux-GPDB43.rpm.sha512
==
--- dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Linux-GPDB43.rpm.sha512 (added)
+++ dev/madlib/1.15-RC1/apache-madlib-1.15-bin-Linux-GPDB43.rpm.sha512 Tue Aug  7 21:03:59 2018
@@ -0,0 +1 @@
+0fceb36d99e1364bddb5398c08c5fc99bc0b9e897948a7ae259827f8b2924e3840d160cae2150a258db1a6361a095ccc4ae311ccfe8201c29d00ee329ae0ab4d
  apache

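The `.sha512` files added in the commit above hold one line of the form "hex digest, two spaces, file name". A minimal Python sketch of how a downloaded artifact could be checked against such a file follows. The function names are hypothetical and the paths are whatever the reader has locally; this does not replace verifying the accompanying `.asc` PGP signature.

```python
import hashlib


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_checksum_line(line):
    """Split '<hex digest>  <file name>' into (digest, name)."""
    parts = line.strip().split(None, 1)
    expected = parts[0].lower()
    name = parts[1].strip() if len(parts) > 1 else ""
    return expected, name


def verify_sha512(artifact_path, checksum_path):
    """True if the artifact's digest matches the first line of the checksum file."""
    with open(checksum_path) as fh:
        expected, _name = parse_checksum_line(fh.readline())
    return sha512_of_file(artifact_path) == expected
```

Reading in chunks keeps memory use flat for large installers such as the `.dmg` and `.rpm` packages listed above.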
svn commit: r28587 - /dev/madlib/KEYS

2018-08-06 Thread riyer
Author: riyer
Date: Tue Aug  7 05:42:30 2018
New Revision: 28587

Log:
Update KEYS with key for Rahul Iyer

Modified:
dev/madlib/KEYS

Modified: dev/madlib/KEYS
==
--- dev/madlib/KEYS (original)
+++ dev/madlib/KEYS Tue Aug  7 05:42:30 2018
@@ -178,64 +178,6 @@ oDf5rTpHlVJwfO8Trw==
 =Dqcb
 -----END PGP PUBLIC KEY BLOCK-----
 
-pub   4096R/6C725F40 2016-08-30 [expires: 2018-08-30]
-uid   [ultimate] Rahul Iyer 
-sig 36C725F40 2016-08-30  Rahul Iyer 
-sub   4096R/92694331 2016-08-30 [expires: 2018-08-30]
-sig  6C725F40 2016-08-30  Rahul Iyer 
-
------BEGIN PGP PUBLIC KEY BLOCK-----
-Version: GnuPG v2
-
-mQINBFfGHG0BEADwQsuVp75Msqp7z1qiRj1IPC+HVtVA/M8sljTrSGLixtrhtNW9
-Qgj8xISz5AEv7bz8r+qT1xIlKfkFujJkWsrngKKwN7/ausa5AaBTn2KzG8/2KL30
-08uNbBV2vZ901S/zcELe2X0aDU2V0v3LNG3mLMyTqB1/k/D8Y2dRMRYo6TaPdnXi
-2FyFPkWWRvG8TtlZCzUPBxxq/gGc7Xs8Dy2p2QwdII+TBLQBAfmAxbGkwMlDUO53
-VTig6BDIsd6wOYL+ZuV0dkNesksdRfpLlBUv8Q7AZfbs03HdpDCzjCOCY69kFOa6
-P4biGHWepbf6cE2GxtU8XY89cbN0Wt7pKzj6c5bAuGRwlrGJ6g8ZdVsGR9XZfe3N
-5Oe6gL/oDRCe49DPp8o92j58K6f2HY2wnr80LVeKBdKE5dmZ4Z3twL7aw9i+HeXQ
-tjlBbXUdoXR4ESVJvTP6/cYgL2wxsKUVqd7Dzj+Yoy1xfVyfI/DcHOdlQ8ztlZkn
-lkEDh9aQr+GulkplySgyQpIB3Xumc34hkDdg3LT0natG/+ZZEkbeUsEcPOrTTaTW
-0y3GDtK4g6EGiMD084yC8fct5B8J6ePmttIxZIIveHz2VgqKdtxnC6rSwVU9XG2F
-fZfr8SWMwHWX/QrJX5dDhilV4sild3UPt1kBrv2wJ1+RUt6H8dfCc0akEwARAQAB
-tB1SYWh1bCBJeWVyIDxyaXllckBhcGFjaGUub3JnPokCPwQTAQgAKQUCV8YcbQIb
-AwUJA8JnAAcLCQgHAwIBBhUIAgkKCwQWAgMBAh4BAheAAAoJEFK3lclscl9APrgP
-/iniiFZyU3Rs+9ORZGtNeXPzfmTleHwzs2tQt5DGaCx+V7TFa7lEnXd0JTcfO6aJ
-NskmKNLXLnhDJFWDB3p/BoboTtkU8HmnzffJdrECEpTZE49dMlpbmLNbWo0ZooWw
-UbEWxJFQ/RX3poQ09OAK+2F7rhxe9NNZwIW0z9RtcCr26ttoOxgsAstPuPGjjxyx
-QZKVSXA7ArBkt31YB3v2Vl7f6XrZfyVYDDO+dtUQxnwwJrxJi8LB+v8yMoOW7pkx
-LO5WMMouQYyCE9rsZjT5Q0a3ji+01pdMbcnc9tGBGBpS0ftKd9H5Q1JzfpFwPXYI
-CiYBefaSxej+3okulqLE4Mg7PEFEY2tWs3Vruz3ivfWxAiSwR8ggXWD0GpPiQt/X
-HiCh0HTshjW0G3irA13ji9iRtzquOdEAmXmtBRIUNgvhx97oxSOaFvWaz9bumZKW
-QDDVvvkGcAMwh4m22Kni9fiSATHP4xjxF+Dq87EEZFpG4hiDblyJFwVuzsPs3/lq
-fxAefh9BrcEoArxYQna20FP9lKEsnaTSIeQhPkd5x+8MTLTOH4ehBDdBoy2yVMqn
-F5yh9UQ52zdIqFOymUuqD7z1MZzPet27IrvpezgDiFaY2PXLtxL4ulUOfvDbQwV5
-u8y7s1JomFmFilOMu5dEqMtXWDZJOafMG1ZTDygEzpU1uQINBFfGHG0BEADcmAZN
-bYU+LbSuKW2MmeW+iVkjX21B+8lRKtSevssa5q5xd2ug9Aw8QWIJFSUl0IA7ZV3l
-esnV251gJUf0gDFcAMZs4zjxAA6Pfhh1+M7vpVNyZCp1g25eE1fc156miPVHOd+1
-xLTfuGY9fwfPhpyAW1nWRaM+ZYyfcU61fvZ8DxSgGnFTY7iPXUIy2bU84F/QyM7c
-+tPyHbIBnmn9CvrFP+1tAOPOOvomjsgGzCN2q681fKnzQVSfnGq4QKV9Bp98Myww
-yoVf54DqLTzoJgApeashCDwJwKDkfbhXCxhAv7yh3cco+vz6kadQlo244uGwF0ID
-WP9PfG/+PEFiZwyC+fQ6MOolnMHd2XrQ3T7nZ6su1mLNnuLC3fwBGO1dT05B5TPs
-mPGS6r3SGXNlbq2Db5A6CbnLxUZtcFtw4BGeH40JAMMhjkkKWPTVif0zkYBeeil3
-r6TWgm5Q7Y9x6Wa+VSQmCVkt6sdfV2oHVZBj/gtFdVS9shs54TP+neOexvvm8Th0
-DwcZCsBcxzQY4D+cmI/O2XwNp/bwUrD0yzDCET3MqJwl52rgMan7wi3/PEj3kIoz
-ncbYj8ja2rAPQjWKg4ffzx247q95KUKi4XGc4qWCUKDndxARHl7+bALNLpTIbjzR
-0ZlLtnAwusnY+Szi200z0lkq/2D6Xl5Ea5jm8QARAQABiQIlBBgBCAAPBQJXxhxt
-AhsMBQkDwmcAAAoJEFK3lclscl9A4G4P/j+lACZ5uInAz8qdN6UijYeMGQ5JWXe8
-Kt9Ja59KuR3e7Rc+6vSo9qOOCNrJgm+WEqeIMwQyLxwp9Oi4nkpTZCvB2zJRK48I
-hvckn8q/dWiBwWL+mnwYHI5JpM2bVsttfXhhRs5Y9XJxXATglFDG8sjZ/uU5xy/+
-R8zIrtVFXRlBD1faDEVVTDOrnHdxIgA9vV2THyls8HSiSpXNfoZorye5T/Srg7fn
-6gELbLQUplqhh1l5/8WRwJvkV1m+REK5UDJzj+vEWXdavArWLW0E1Sq3k3jdiU48
-sxkbEVj3HbIL7cMQRBTjb/mI3wYnAKr97lPpi+D6wP93GIwMubnzAaSIkZpV1z2y
-bcnaDwLF+FxH6Eli+3JjOc34Wfg0swUbHt3kJfRn0YTQnSoWV7wVGJE+mg1JhCTN
-a9DnMofVYNTt5IUqCI+xFa+N4ytIOOuTGno/qof36QYzjXNQ1Bx0i+Nb9+OdHwLT
-lgs1twXUptUKVH6OwWelydTSmUySuyuRDfqz/l+kSy6nPSO5Xnovsr1rUFcqOsiD
-zBKqOUGiTyapUmTLIXzYhumH8iwLCpAbek//d9mbnpGV2X+k1gPcWNlZbooX9zLV
-aRfQK1e3kTz70Q2hKGKLcKO+LHjXsG+OHZCYsS9eNea8gPPVtkppUT1jatFVkfmW
-LULzSorJ3Mjh
-=QHs0
------END PGP PUBLIC KEY BLOCK-----
 pub   4096R/28D2C789 2017-05-01
 uid  Rashmi Raghu (CODE SIGNING KEY) 
 sig 328D2C789 2017-05-01  Rashmi Raghu (CODE SIGNING KEY) 

@@ -412,3 +354,62 @@ MmpdxJaKiBbKOa7Yh58uhPPk0IKt0v7bkOonp+on
 52fY8qaQ7r/SZnc=
 =HiuU
 -----END PGP PUBLIC KEY BLOCK-----
+
+pub   rsa4096 2018-08-07 [SC]
+  FA909C2006632A74E04FFA5063056AE412C4E3D7
+uid   [ultimate] Rahul Iyer 
+sig 363056AE412C4E3D7 2018-08-07  Rahul Iyer 
+sub   rsa4096 2018-08-07 [E]
+sig  63056AE412C4E3D7 2018-08-07  Rahul Iyer 
+
+-----BEGIN PGP PUBLIC KEY BLOCK-----
+
+mQINBFtpLsABEACg7KSDhekwPSjcS9iuWRVSfQJFzQa1dBIGaoACDGAfpP4zF/OS
+zjYe+RCAtttNIdmXRxmPw4xRrm27Sl1CMpogfe98I+nIMDhSAV2z4kJMDZwRzWY9
+OR0bgOl26yQolnQ77MKaBImkR6i7p0SQMSyEEtM/qwt0y4X/3msXF260X9pZgUwl
+31ls5sdPPk+yttkvCJeubJ5kMo+oI+SebVe47OUmcT0h8zbAHr5oTAhaQ+lA8Tzf
+tLalJfdcYDe4LkCQ5XyOPtIQgnhGAQXBbVKc2c1kX4UqpMFIQv8DaOnyTAnrviWG
+chIx0Tk3a3RMDSMd9Inye5hjT8sRxT/4Ya4dFGMFXVDA+0gIElxaE2s1b8dgKyCN
+48THNnXLvsql7O+gM2CibIb1lmQ+H9alxJDg/hE58SI8lZcl9qdD+lvSydSEVXij
+0VrEhwgPDWY/3AulIW1XMR0Hsiy9OKqsvVGlbUyNkr8ndm5DQVICFBeCdhaO1xt9
+CNTsBWJSnd1aw0Q/Yk4YME0YhUGY5Z/NzB2vMP/MUbeU8VXK21qSRgPWPZM1qY/T

svn commit: r28586 - /release/madlib/KEYS

2018-08-06 Thread riyer
Author: riyer
Date: Tue Aug  7 05:42:14 2018
New Revision: 28586

Log:
Update KEYS with key for Rahul Iyer

Modified:
release/madlib/KEYS

Modified: release/madlib/KEYS
==
--- release/madlib/KEYS (original)
+++ release/madlib/KEYS Tue Aug  7 05:42:14 2018
@@ -178,64 +178,6 @@ oDf5rTpHlVJwfO8Trw==
 =Dqcb
 -----END PGP PUBLIC KEY BLOCK-----
 
-pub   4096R/6C725F40 2016-08-30 [expires: 2018-08-30]
-uid   [ultimate] Rahul Iyer 
-sig 36C725F40 2016-08-30  Rahul Iyer 
-sub   4096R/92694331 2016-08-30 [expires: 2018-08-30]
-sig  6C725F40 2016-08-30  Rahul Iyer 
-
------BEGIN PGP PUBLIC KEY BLOCK-----
-Version: GnuPG v2
-
-mQINBFfGHG0BEADwQsuVp75Msqp7z1qiRj1IPC+HVtVA/M8sljTrSGLixtrhtNW9
-Qgj8xISz5AEv7bz8r+qT1xIlKfkFujJkWsrngKKwN7/ausa5AaBTn2KzG8/2KL30
-08uNbBV2vZ901S/zcELe2X0aDU2V0v3LNG3mLMyTqB1/k/D8Y2dRMRYo6TaPdnXi
-2FyFPkWWRvG8TtlZCzUPBxxq/gGc7Xs8Dy2p2QwdII+TBLQBAfmAxbGkwMlDUO53
-VTig6BDIsd6wOYL+ZuV0dkNesksdRfpLlBUv8Q7AZfbs03HdpDCzjCOCY69kFOa6
-P4biGHWepbf6cE2GxtU8XY89cbN0Wt7pKzj6c5bAuGRwlrGJ6g8ZdVsGR9XZfe3N
-5Oe6gL/oDRCe49DPp8o92j58K6f2HY2wnr80LVeKBdKE5dmZ4Z3twL7aw9i+HeXQ
-tjlBbXUdoXR4ESVJvTP6/cYgL2wxsKUVqd7Dzj+Yoy1xfVyfI/DcHOdlQ8ztlZkn
-lkEDh9aQr+GulkplySgyQpIB3Xumc34hkDdg3LT0natG/+ZZEkbeUsEcPOrTTaTW
-0y3GDtK4g6EGiMD084yC8fct5B8J6ePmttIxZIIveHz2VgqKdtxnC6rSwVU9XG2F
-fZfr8SWMwHWX/QrJX5dDhilV4sild3UPt1kBrv2wJ1+RUt6H8dfCc0akEwARAQAB
-tB1SYWh1bCBJeWVyIDxyaXllckBhcGFjaGUub3JnPokCPwQTAQgAKQUCV8YcbQIb
-AwUJA8JnAAcLCQgHAwIBBhUIAgkKCwQWAgMBAh4BAheAAAoJEFK3lclscl9APrgP
-/iniiFZyU3Rs+9ORZGtNeXPzfmTleHwzs2tQt5DGaCx+V7TFa7lEnXd0JTcfO6aJ
-NskmKNLXLnhDJFWDB3p/BoboTtkU8HmnzffJdrECEpTZE49dMlpbmLNbWo0ZooWw
-UbEWxJFQ/RX3poQ09OAK+2F7rhxe9NNZwIW0z9RtcCr26ttoOxgsAstPuPGjjxyx
-QZKVSXA7ArBkt31YB3v2Vl7f6XrZfyVYDDO+dtUQxnwwJrxJi8LB+v8yMoOW7pkx
-LO5WMMouQYyCE9rsZjT5Q0a3ji+01pdMbcnc9tGBGBpS0ftKd9H5Q1JzfpFwPXYI
-CiYBefaSxej+3okulqLE4Mg7PEFEY2tWs3Vruz3ivfWxAiSwR8ggXWD0GpPiQt/X
-HiCh0HTshjW0G3irA13ji9iRtzquOdEAmXmtBRIUNgvhx97oxSOaFvWaz9bumZKW
-QDDVvvkGcAMwh4m22Kni9fiSATHP4xjxF+Dq87EEZFpG4hiDblyJFwVuzsPs3/lq
-fxAefh9BrcEoArxYQna20FP9lKEsnaTSIeQhPkd5x+8MTLTOH4ehBDdBoy2yVMqn
-F5yh9UQ52zdIqFOymUuqD7z1MZzPet27IrvpezgDiFaY2PXLtxL4ulUOfvDbQwV5
-u8y7s1JomFmFilOMu5dEqMtXWDZJOafMG1ZTDygEzpU1uQINBFfGHG0BEADcmAZN
-bYU+LbSuKW2MmeW+iVkjX21B+8lRKtSevssa5q5xd2ug9Aw8QWIJFSUl0IA7ZV3l
-esnV251gJUf0gDFcAMZs4zjxAA6Pfhh1+M7vpVNyZCp1g25eE1fc156miPVHOd+1
-xLTfuGY9fwfPhpyAW1nWRaM+ZYyfcU61fvZ8DxSgGnFTY7iPXUIy2bU84F/QyM7c
-+tPyHbIBnmn9CvrFP+1tAOPOOvomjsgGzCN2q681fKnzQVSfnGq4QKV9Bp98Myww
-yoVf54DqLTzoJgApeashCDwJwKDkfbhXCxhAv7yh3cco+vz6kadQlo244uGwF0ID
-WP9PfG/+PEFiZwyC+fQ6MOolnMHd2XrQ3T7nZ6su1mLNnuLC3fwBGO1dT05B5TPs
-mPGS6r3SGXNlbq2Db5A6CbnLxUZtcFtw4BGeH40JAMMhjkkKWPTVif0zkYBeeil3
-r6TWgm5Q7Y9x6Wa+VSQmCVkt6sdfV2oHVZBj/gtFdVS9shs54TP+neOexvvm8Th0
-DwcZCsBcxzQY4D+cmI/O2XwNp/bwUrD0yzDCET3MqJwl52rgMan7wi3/PEj3kIoz
-ncbYj8ja2rAPQjWKg4ffzx247q95KUKi4XGc4qWCUKDndxARHl7+bALNLpTIbjzR
-0ZlLtnAwusnY+Szi200z0lkq/2D6Xl5Ea5jm8QARAQABiQIlBBgBCAAPBQJXxhxt
-AhsMBQkDwmcAAAoJEFK3lclscl9A4G4P/j+lACZ5uInAz8qdN6UijYeMGQ5JWXe8
-Kt9Ja59KuR3e7Rc+6vSo9qOOCNrJgm+WEqeIMwQyLxwp9Oi4nkpTZCvB2zJRK48I
-hvckn8q/dWiBwWL+mnwYHI5JpM2bVsttfXhhRs5Y9XJxXATglFDG8sjZ/uU5xy/+
-R8zIrtVFXRlBD1faDEVVTDOrnHdxIgA9vV2THyls8HSiSpXNfoZorye5T/Srg7fn
-6gELbLQUplqhh1l5/8WRwJvkV1m+REK5UDJzj+vEWXdavArWLW0E1Sq3k3jdiU48
-sxkbEVj3HbIL7cMQRBTjb/mI3wYnAKr97lPpi+D6wP93GIwMubnzAaSIkZpV1z2y
-bcnaDwLF+FxH6Eli+3JjOc34Wfg0swUbHt3kJfRn0YTQnSoWV7wVGJE+mg1JhCTN
-a9DnMofVYNTt5IUqCI+xFa+N4ytIOOuTGno/qof36QYzjXNQ1Bx0i+Nb9+OdHwLT
-lgs1twXUptUKVH6OwWelydTSmUySuyuRDfqz/l+kSy6nPSO5Xnovsr1rUFcqOsiD
-zBKqOUGiTyapUmTLIXzYhumH8iwLCpAbek//d9mbnpGV2X+k1gPcWNlZbooX9zLV
-aRfQK1e3kTz70Q2hKGKLcKO+LHjXsG+OHZCYsS9eNea8gPPVtkppUT1jatFVkfmW
-LULzSorJ3Mjh
-=QHs0
------END PGP PUBLIC KEY BLOCK-----
 pub   4096R/28D2C789 2017-05-01
 uid  Rashmi Raghu (CODE SIGNING KEY) 
 sig 328D2C789 2017-05-01  Rashmi Raghu (CODE SIGNING KEY) 

@@ -412,3 +354,62 @@ MmpdxJaKiBbKOa7Yh58uhPPk0IKt0v7bkOonp+on
 52fY8qaQ7r/SZnc=
 =HiuU
 -----END PGP PUBLIC KEY BLOCK-----
+
+pub   rsa4096 2018-08-07 [SC]
+  FA909C2006632A74E04FFA5063056AE412C4E3D7
+uid   [ultimate] Rahul Iyer 
+sig 363056AE412C4E3D7 2018-08-07  Rahul Iyer 
+sub   rsa4096 2018-08-07 [E]
+sig  63056AE412C4E3D7 2018-08-07  Rahul Iyer 
+
+-----BEGIN PGP PUBLIC KEY BLOCK-----
+
+mQINBFtpLsABEACg7KSDhekwPSjcS9iuWRVSfQJFzQa1dBIGaoACDGAfpP4zF/OS
+zjYe+RCAtttNIdmXRxmPw4xRrm27Sl1CMpogfe98I+nIMDhSAV2z4kJMDZwRzWY9
+OR0bgOl26yQolnQ77MKaBImkR6i7p0SQMSyEEtM/qwt0y4X/3msXF260X9pZgUwl
+31ls5sdPPk+yttkvCJeubJ5kMo+oI+SebVe47OUmcT0h8zbAHr5oTAhaQ+lA8Tzf
+tLalJfdcYDe4LkCQ5XyOPtIQgnhGAQXBbVKc2c1kX4UqpMFIQv8DaOnyTAnrviWG
+chIx0Tk3a3RMDSMd9Inye5hjT8sRxT/4Ya4dFGMFXVDA+0gIElxaE2s1b8dgKyCN
+48THNnXLvsql7O+gM2CibIb1lmQ+H9alxJDg/hE58SI8lZcl9qdD+lvSydSEVXij
+0VrEhwgPDWY/3AulIW1XMR0Hsiy9OKqsvVGlbUyNkr8ndm5DQVICFBeCdhaO1xt9
+CNTsBWJSnd1aw0Q/Yk4YME0YhUGY5Z/NzB2vMP/MUbeU8VXK21qSRgPWPZM1qY/T

[1/2] madlib git commit: DT/RF: Add function to report importance scores

2018-08-01 Thread riyer
Repository: madlib
Updated Branches:
  refs/heads/master e2534e44e -> 186390f7c


DT/RF: Add function to report importance scores

JIRA: MADLIB-925

This commit adds a new MADlib function (get_var_importance) to report the
importance scores in decision tree and random forest by unnesting the
importance values along with corresponding features.

Closes #295

Co-authored-by: Rahul Iyer 
Co-authored-by: Jingyi Mei 
Co-authored-by: Orhan Kislal 


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/1aac377f
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/1aac377f
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/1aac377f

Branch: refs/heads/master
Commit: 1aac377f68d20290374c004a3a8bb2da82ab1fcc
Parents: e2534e4
Author: Nandish Jayaram 
Authored: Tue Jul 3 12:22:07 2018 -0700
Committer: Rahul Iyer 
Committed: Wed Aug 1 12:58:22 2018 -0700

--
 .../recursive_partitioning/decision_tree.cpp|  11 +-
 .../recursive_partitioning/decision_tree.hpp|   2 +-
 .../recursive_partitioning/random_forest.cpp|  15 ++
 .../recursive_partitioning/random_forest.hpp|   1 +
 .../recursive_partitioning/decision_tree.py_in  |  10 +-
 .../recursive_partitioning/decision_tree.sql_in | 102 +++---
 .../recursive_partitioning/random_forest.py_in  | 187 ++-
 .../recursive_partitioning/random_forest.sql_in | 168 +
 .../test/decision_tree.ic.sql_in|   3 +-
 .../test/decision_tree.sql_in   |  46 -
 .../test/random_forest.sql_in   |  20 +-
 .../test/unit_tests/plpy_mock.py_in |  43 +
 .../test/unit_tests/test_random_forest.py_in| 173 +
 13 files changed, 697 insertions(+), 84 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/1aac377f/src/modules/recursive_partitioning/decision_tree.cpp
--
diff --git a/src/modules/recursive_partitioning/decision_tree.cpp b/src/modules/recursive_partitioning/decision_tree.cpp
index d249946..0a7f7a5 100644
--- a/src/modules/recursive_partitioning/decision_tree.cpp
+++ b/src/modules/recursive_partitioning/decision_tree.cpp
@@ -488,7 +488,7 @@ print_decision_tree::run(AnyType &args){
 }
 
 AnyType
-get_variable_importance::run(AnyType &args){
+compute_variable_importance::run(AnyType &args){
 Tree dt = args[0].getAs();
 const int n_cat_features = args[1].getAs();
 const int n_con_features = args[2].getAs();
@@ -497,19 +497,12 @@ get_variable_importance::run(AnyType &args){
 ColumnVector con_var_importance = ColumnVector::Zero(n_con_features);
 dt.computeVariableImportance(cat_var_importance, con_var_importance);
 
-// Variable importance is scaled to represent a percentage. Even though
-// the importance values are split between categorical and continuous, the
-// percentages are relative to the combined set.
 ColumnVector combined_var_imp(n_cat_features + n_con_features);
 combined_var_imp << cat_var_importance, con_var_importance;
-
-// Avoid divide by zero by adding a small number
-double total_var_imp = combined_var_imp.sum();
-double VAR_IMP_EPSILON = 1e-6;
-combined_var_imp *=  (100.0 / (total_var_imp + VAR_IMP_EPSILON));
 return combined_var_imp;
 }
 
+
 AnyType
 display_text_tree::run(AnyType &args){
 Tree dt = args[0].getAs();

http://git-wip-us.apache.org/repos/asf/madlib/blob/1aac377f/src/modules/recursive_partitioning/decision_tree.hpp
--
diff --git a/src/modules/recursive_partitioning/decision_tree.hpp b/src/modules/recursive_partitioning/decision_tree.hpp
index ae62bfa..8cb6703 100644
--- a/src/modules/recursive_partitioning/decision_tree.hpp
+++ b/src/modules/recursive_partitioning/decision_tree.hpp
@@ -14,7 +14,7 @@ DECLARE_UDF(recursive_partitioning, compute_surr_stats_transition)
 DECLARE_UDF(recursive_partitioning, dt_surr_apply)
 
 DECLARE_UDF(recursive_partitioning, print_decision_tree)
-DECLARE_UDF(recursive_partitioning, get_variable_importance)
+DECLARE_UDF(recursive_partitioning, compute_variable_importance)
 DECLARE_UDF(recursive_partitioning, predict_dt_response)
 DECLARE_UDF(recursive_partitioning, predict_dt_prob)
 

http://git-wip-us.apache.org/repos/asf/madlib/blob/1aac377f/src/modules/recursive_partitioning/random_forest.cpp
--
diff --git a/src/modules/recursive_partitioning/random_forest.cpp b/src/modules/recursive_partitioning/random_forest.cpp
index 70ebbaa..a12f095 100644
--- a/src/modules/recursive_partitioning/random_forest.cpp
+++ b/src/modules/recursive_partitioning/random_forest.cpp
@@ -204,6 +204,21 @@ rf_con_imp_score::run(AnyType &args) {
 // 

[2/2] madlib git commit: DT/RF: Fix user doc examples

2018-08-01 Thread riyer
DT/RF: Fix user doc examples


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/186390f7
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/186390f7
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/186390f7

Branch: refs/heads/master
Commit: 186390f7c2af5ad886a4d5b77d0792b68cd3414d
Parents: 1aac377
Author: Frank McQuillan 
Authored: Wed Aug 1 12:49:10 2018 -0700
Committer: Rahul Iyer 
Committed: Wed Aug 1 12:58:44 2018 -0700

--
 .../recursive_partitioning/decision_tree.sql_in | 16 ++--
 .../recursive_partitioning/random_forest.sql_in | 12 +++-
 2 files changed, 17 insertions(+), 11 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/186390f7/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in
--
diff --git a/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in b/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in
index 469f1b2..5926152 100644
--- a/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in
+++ b/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in
@@ -284,14 +284,17 @@ tree_train(
   impurity_var_importance
   DOUBLE PRECISION[]. Impurity importance of each variable.
   The order of the variables is the same as
-  that of 'independent_varnames' column in the summary table (see below).
+  that of the 'independent_varnames' column in the summary table (see below).
 
   The impurity importance of any feature is the decrease in impurity by a
   node containing the feature as a primary split, summed over the whole
   tree. If surrogates are used, then the importance value includes the
   impurity decrease scaled by the adjusted surrogate agreement.
-  Reported importance values are normalized to sum to 100 across
-  all variables.
+  Importance values are displayed as raw values as per the 'split_criterion'
+  parameter.
+  To see importance values normalized to sum to 100 across
+  all variables, use the importance display helper function 
+  described later on this page. 
   Please refer to [1] for more information on variable importance.
   
   
@@ -727,7 +730,7 @@ independent_var_types   | text, boolean, double precision
 n_folds | 0
 null_proxy  |
 
-View the impurity importance table using the helper function:
+View the normalized impurity importance table using the helper function:
 
 \\x off
 DROP TABLE IF EXISTS imp_output;
@@ -,10 +1114,11 @@ which shows ordering of levels of categorical variables 'vs' and 'cyl':
 SELECT pruning_cp, cat_levels_in_text, cat_n_levels, impurity_var_importance, tree_depth FROM train_output;
 
 
+-[ RECORD 1 ]---+
 pruning_cp  | 0
 cat_levels_in_text  | {0,1,4,6,8}
 cat_n_levels| {2,3}
-impurity_var_importance | {0,51.8593201959496,10.976977929129,5.31897402755374,31.8447278473677}
+impurity_var_importance | {0,22.6309172500675,4.79024943310651,2.321153,13.8967382920111}
 tree_depth  | 4
 tree_depth  | 4
 
 View the summary table:
@@ -1147,7 +1151,7 @@ independent_var_types   | integer, integer, double precision, double precisi
 n_folds | 0
 null_proxy  |
 
-View the impurity importance table using the helper function:
+View the normalized impurity importance table using the helper function:
 
 \\x off
 DROP TABLE IF EXISTS imp_output;

http://git-wip-us.apache.org/repos/asf/madlib/blob/186390f7/src/ports/postgres/modules/recursive_partitioning/random_forest.sql_in
--
diff --git a/src/ports/postgres/modules/recursive_partitioning/random_forest.sql_in b/src/ports/postgres/modules/recursive_partitioning/random_forest.sql_in
index 39b6f5d..5b5a0f0 100644
--- a/src/ports/postgres/modules/recursive_partitioning/random_forest.sql_in
+++ b/src/ports/postgres/modules/recursive_partitioning/random_forest.sql_in
@@ -164,7 +164,9 @@ forest_train(training_table_name,
 Due to nature of permutation, the importance value can end up being
 negative if the number of levels for a categorical variable is small and is
 unbalanced. In such a scenario, the importance values are shifted to ensure
-that the lowest importance value is 0.
+that the lowest importance value is 0.  To see importance values normalized
+to sum to 100 across all variables, use the importance display helper function
+described later on this page.
 
   
 
@@ -758,7 +760,7 @@ the variables in 'independent_varnames'

madlib git commit: DT/RF: Don't eliminate single-level cat variable

2018-08-01 Thread riyer
Repository: madlib
Updated Branches:
  refs/heads/master 20f95b33b -> e2534e44e


DT/RF: Don't eliminate single-level cat variable

JIRA: MADLIB-1258

When DT/RF is run with grouping, a subset of the groups could eliminate
a categorical variable leading to multiple issues downstream, including
invalid importance values and incorrect prediction.

This commit keeps all categorical variables (even those that contain just
one level). The accumulator state would use additional space during
tree_train for this categorical variable, even though the variable is
never consumed by the tree. This inefficiency is still preferred since
it yields clean code and error-free prediction/importance reporting.
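
The problem described above can be pictured with a small Python sketch. It is a toy model under stated assumptions, not MADlib's implementation: the helper names and the sample rows are hypothetical. It shows how dropping single-level categorical variables per group leaves different groups with different feature lists, while keeping them gives every group the same list.

```python
from collections import defaultdict


def levels_per_group(rows, group_col, cat_cols):
    """Collect the distinct levels of each categorical column, per group."""
    levels = defaultdict(lambda: {col: set() for col in cat_cols})
    for row in rows:
        for col in cat_cols:
            levels[row[group_col]][col].add(row[col])
    return {g: {c: sorted(v) for c, v in cols.items()}
            for g, cols in levels.items()}


def kept_features(levels, drop_single_level):
    """Return the categorical features each group would train on."""
    return {
        g: [c for c, lv in cols.items()
            if not (drop_single_level and len(lv) < 2)]
        for g, cols in levels.items()
    }


rows = [
    {"grp": "a", "color": "red", "size": "S"},
    {"grp": "a", "color": "blue", "size": "L"},
    {"grp": "b", "color": "red", "size": "S"},
    {"grp": "b", "color": "red", "size": "L"},  # 'color' has one level in b
]
levels = levels_per_group(rows, "grp", ["color", "size"])
print(kept_features(levels, drop_single_level=True))
print(kept_features(levels, drop_single_level=False))
```

With consistent feature lists, a per-feature array such as the importance vector has the same length and ordering in every group, which is the property the downstream reporting and prediction code relies on.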

Additional changes:
- get_expr_type (validate_args.py) has been updated to return type for
multiple expressions at the same time. This prevents calling a separate
query for each expression, thus saving time.
- Cat features are not stored per tree (in the grouping case) anymore
since the features are now consistent across trees.

Closes #301

Co-authored-by: Nandish Jayaram 


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/e2534e44
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/e2534e44
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/e2534e44

Branch: refs/heads/master
Commit: e2534e44ea36aedec843a3a7c48236d0e1104e2c
Parents: 20f95b3
Author: Rahul Iyer 
Authored: Thu Jul 26 12:17:58 2018 -0700
Committer: Rahul Iyer 
Committed: Wed Aug 1 12:51:13 2018 -0700

--
 src/modules/recursive_partitioning/DT_impl.hpp  |  91 
 .../recursive_partitioning/decision_tree.cpp|  21 +-
 .../recursive_partitioning/decision_tree.py_in  | 217 +--
 .../recursive_partitioning/random_forest.py_in  | 120 +-
 .../test/decision_tree.sql_in   |  83 +++
 .../test/random_forest.sql_in   |  46 ++--
 .../modules/utilities/validate_args.py_in   |  49 +++--
 7 files changed, 319 insertions(+), 308 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/e2534e44/src/modules/recursive_partitioning/DT_impl.hpp
--
diff --git a/src/modules/recursive_partitioning/DT_impl.hpp b/src/modules/recursive_partitioning/DT_impl.hpp
index 69bdc88..75e4ce4 100644
--- a/src/modules/recursive_partitioning/DT_impl.hpp
+++ b/src/modules/recursive_partitioning/DT_impl.hpp
@@ -518,6 +518,7 @@ DecisionTree::expand(const Accumulator ,
 double gain = impurityGain(
 state.cat_stats.row(stats_i).
 segment(fv_index, sps * 2), sps);
+
 if (gain > max_impurity_gain){
 max_impurity_gain = gain;
 max_feat = f;
@@ -665,21 +666,29 @@ DecisionTree::pickSurrogates(
 
 // 1. Compute the max count and corresponding split threshold for
 // each categorical and continuous feature
+
 ColumnVector cat_max_thres = ColumnVector::Zero(n_cats);
 ColumnVector cat_max_count = ColumnVector::Zero(n_cats);
 IntegerVector cat_max_is_reverse = IntegerVector::Zero(n_cats);
 Index prev_cum_levels = 0;
 for (Index each_cat=0; each_cat < n_cats; each_cat++){
 Index n_levels = state.cat_levels_cumsum(each_cat) - prev_cum_levels;
-Index max_label;
-(cat_stats_counts.row(stats_i).segment(
-prev_cum_levels * 2, n_levels * 2)).maxCoeff(&max_label);
-cat_max_thres(each_cat) = static_cast(max_label / 2);
-cat_max_count(each_cat) =
-cat_stats_counts(stats_i, prev_cum_levels*2 + max_label);
-// every odd col is for reverse, hence i % 2 == 1 for reverse index i
-cat_max_is_reverse(each_cat) = (max_label % 2 == 1) ? 1 : 0;
-prev_cum_levels = state.cat_levels_cumsum(each_cat);
+if (n_levels > 0){
+Index max_label;
+(cat_stats_counts.row(stats_i).segment(
+prev_cum_levels * 2, n_levels * 2)).maxCoeff(&max_label);
+
+// For each split, there are two stats =>
+//  max_label / 2 gives the split index. A floor
+// operation is unnecessary since the threshold will yield
+// the same results for n and n+0.5.
+cat_max_thres(each_cat) = static_cast(max_label / 2);
+cat_max_count(each_cat) =
+cat_stats_counts(stats_i, prev_cum_levels*2 + max_label);
+// every odd col is for reverse, hence i % 2 == 1 for reverse index i
+ 

madlib git commit: Madpack: Fix missing test logs bug.

2018-07-26 Thread riyer
Repository: madlib
Updated Branches:
  refs/heads/master 836759e69 -> a0cfcf8f7


Madpack: Fix missing test logs bug.

Due to a recent commit, madpack cleaned log files of test operations as
well as the atomic operations. As a result, log files are missing even
after install/dev check fails. This commit fixes this issue.
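
The fix can be sketched in a few lines of Python. This is a minimal stand-alone model, not madpack's code: DB_CREATE_OBJECTS and cur_tmpdir are names that appear in the diff, but the constant values, the INSTALL_DEV_CHECK label, and the helper functions are assumptions made for illustration.

```python
import os
import shutil
import tempfile

# Hypothetical operation labels; madpack defines its own constants.
DB_CREATE_OBJECTS = "create_objects"
INSTALL_DEV_CHECK = "install_dev_check"


def finish_operation(calling_operation, cur_tmpdir):
    """Remove the temp directory only for object creation.

    Test operations keep their directory so the logs can be inspected.
    """
    if calling_operation == DB_CREATE_OBJECTS:
        shutil.rmtree(cur_tmpdir)


def run(calling_operation):
    """Create a temp directory with a log file, then finish the operation."""
    cur_tmpdir = tempfile.mkdtemp(prefix="madpack_sketch_")
    with open(os.path.join(cur_tmpdir, "module.log"), "w") as log:
        log.write("log output\n")
    finish_operation(calling_operation, cur_tmpdir)
    return cur_tmpdir


created = run(DB_CREATE_OBJECTS)
checked = run(INSTALL_DEV_CHECK)
print(os.path.exists(created), os.path.exists(checked))
shutil.rmtree(checked)  # tidy up the sketch's own leftovers
```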

Closes #300

Co-authored-by: Jingyi Mei 


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/a0cfcf8f
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/a0cfcf8f
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/a0cfcf8f

Branch: refs/heads/master
Commit: a0cfcf8f7fc31179ce0b22b18ca77bad2e65a0e4
Parents: 836759e
Author: Orhan Kislal 
Authored: Wed Jul 25 15:05:08 2018 -0700
Committer: Rahul Iyer 
Committed: Thu Jul 26 12:18:07 2018 -0700

--
 src/madpack/madpack.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/a0cfcf8f/src/madpack/madpack.py
--
diff --git a/src/madpack/madpack.py b/src/madpack/madpack.py
index 5382bd8..385ab36 100755
--- a/src/madpack/madpack.py
+++ b/src/madpack/madpack.py
@@ -712,8 +712,8 @@ def _process_py_sql_files_in_modules(modset, args_dict):
 cur_tmpdir)
 else:
 error_(this, "Something is wrong, shouldn't be here: %s" % src_file, True)
-shutil.rmtree(cur_tmpdir)
-
+if calling_operation == DB_CREATE_OBJECTS:
+shutil.rmtree(cur_tmpdir)
 # 
--
 def _execute_per_module_db_create_obj_algo(schema, maddir_mod_py, module,
sqlfile, algoname, cur_tmpdir,



madlib git commit: Multiple: Clean and update documentation

2018-07-26 Thread riyer
Repository: madlib
Updated Branches:
  refs/heads/master 2aac41897 -> 836759e69


Multiple: Clean and update documentation

Closes #298


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/836759e6
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/836759e6
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/836759e6

Branch: refs/heads/master
Commit: 836759e69ae617ffc0cd7640cb7ca76b25e69c1d
Parents: 2aac418
Author: Frank McQuillan 
Authored: Tue Jul 24 17:20:18 2018 -0700
Committer: Rahul Iyer 
Committed: Thu Jul 26 11:33:34 2018 -0700

--
 doc/mainpage.dox.in | 174 +--
 src/ports/postgres/modules/convex/mlp.sql_in|   3 +-
 .../modules/utilities/encode_categorical.sql_in |   3 +-
 .../postgres/modules/utilities/path.sql_in  |   3 +-
 .../postgres/modules/utilities/pivot.sql_in |   3 +-
 .../modules/utilities/sessionize.sql_in |   3 +-
 .../postgres/modules/utilities/utilities.sql_in |   5 +-
 7 files changed, 94 insertions(+), 100 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/836759e6/doc/mainpage.dox.in
--
diff --git a/doc/mainpage.dox.in b/doc/mainpage.dox.in
index c2c9a7a..8f97491 100644
--- a/doc/mainpage.dox.in
+++ b/doc/mainpage.dox.in
@@ -2,13 +2,9 @@
 @mainpage
 Apache MADlib is an open-source library for scalable
 in-database analytics. It provides data-parallel implementations of
-mathematical, statistical and machine learning methods for structured
+mathematical, statistical, graph and machine learning methods for structured
 and unstructured data.
 
-The MADlib mission: to foster widespread development of scalable analytic
-skills, by harnessing efforts from commercial practice, academic research,
-and open-source development.
-
 Useful links:
 
 http://madlib.apache.org;>MADlib web site
@@ -21,32 +17,22 @@ Useful links:
 v1.13,
 v1.12,
 v1.11,
-v1.10.0,
-v1.9.1,
-v1.9,
-v1.8,
-v1.7.1,
-v1.7,
-v1.6,
-v1.5,
-v1.4,
-v1.3,
-v1.2
+v1.10
 
 
 
 Please refer to the
 https://github.com/apache/madlib/blob/master/README.md;>ReadMe
 file for information about incorporated third-party material. License information
-regarding MADlib and included third-party libraries can be found inside the
+regarding MADlib and included third-party libraries can be found in the
 https://github.com/apache/madlib/blob/master/LICENSE;>
 License directory.
 
 @defgroup grp_datatrans Data Types and Transformations
-@{Data types and transformation operations @}
+@details Data types and operations that transform and shape data.
 @defgroup grp_arraysmatrix Arrays and Matrices
 @ingroup grp_datatrans
-@brief Mathematical operations for arrays and matrices
+@brief Mathematical operations for arrays and matrices.
 @details
 These modules provide basic mathematical operations to be run on array and matrices.
 
@@ -100,13 +86,14 @@ complete matrix stored as a distributed table.
 @defgroup grp_matrix Matrix Operations
 
 @defgroup grp_matrix_factorization Matrix Factorization
-@brief Matrix Factorization methods including Singular Value Decomposition and Low-rank Matrix Factorization
+@brief Linear algebra methods that factorize a matrix into a product of matrices.
+@details Linear algebra methods that factorize a matrix into a product of matrices.
 @{
 @defgroup grp_lmf Low-Rank Matrix Factorization
 @defgroup grp_svd Singular Value Decomposition
 @}
 
-@defgroup grp_linalg Norms and Distance functions
+@defgroup grp_linalg Norms and Distance Functions
 @defgroup grp_svec Sparse Vectors
 @}
 
@@ -126,49 +113,58 @@ complete matrix stored as a distributed table.
 @ingroup grp_datatrans
 
 @defgroup grp_graph Graph
-Contains graph algorithms.
+@brief Graph algorithms and measures associated with graphs.
+@details Graph algorithms and measures associated with graphs.
 @{
 @defgroup grp_apsp All Pairs Shortest Path
 @defgroup grp_bfs Breadth-First Search
 @defgroup grp_hits HITS
+
 @defgroup grp_graph_measures Measures
-Graph Measures
+@brief A collection of metrics computed on a graph.
+@details A collection of metrics computed on a graph.
 @{
 @defgroup grp_graph_avg_path_length Average Path Length
 @defgroup grp_graph_closeness Closeness
 @defgroup grp_graph_diameter Graph Diameter
 @defgroup grp_graph_vertex_degrees In-Out Degree
 @}
+
 @defgroup grp_pagerank PageRank
 @defgroup grp_sssp Single Source Shortest Path
 @defgroup grp_wcc Weakly Connected Components
 @}
 
 @defgroup grp_mdl Model 

madlib git commit: Cols2Vec: Add Apache License header

2018-07-18 Thread riyer
Repository: madlib
Updated Branches:
  refs/heads/master 4349e7722 -> ebd453cbb


Cols2Vec: Add Apache License header


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/ebd453cb
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/ebd453cb
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/ebd453cb

Branch: refs/heads/master
Commit: ebd453cbbfaaed1f06308d8f10f108337da5a783
Parents: 4349e77
Author: Rahul Iyer 
Authored: Wed Jul 18 16:30:57 2018 -0700
Committer: Rahul Iyer 
Committed: Wed Jul 18 16:30:57 2018 -0700

--
 .../postgres/modules/utilities/cols2vec.py_in| 19 +++
 1 file changed, 19 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/ebd453cb/src/ports/postgres/modules/utilities/cols2vec.py_in
--
diff --git a/src/ports/postgres/modules/utilities/cols2vec.py_in b/src/ports/postgres/modules/utilities/cols2vec.py_in
index b38b3d6..4f2b1c9 100644
--- a/src/ports/postgres/modules/utilities/cols2vec.py_in
+++ b/src/ports/postgres/modules/utilities/cols2vec.py_in
@@ -1,3 +1,22 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
 """
 @file cols2vec.py_in
 



[1/2] madlib git commit: Utilities: Add cols2vec() to convert columns to array

2018-07-16 Thread riyer
Repository: madlib
Updated Branches:
  refs/heads/master 3b527b82a -> 950114ccd


Utilities: Add cols2vec() to convert columns to array

JIRA: MADLIB-1239

This commit adds a new function called cols2vec that can be used to
convert features from multiple columns of an input table into a feature
array in a single column.
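
To make the idea concrete, the Python sketch below builds the kind of SQL statement that packs several columns into one array column. It is a hedged illustration rather than the cols2vec implementation: the function name build_cols2vec_query, the output column name feature_vector, and the table and column names in the usage lines are all hypothetical, and identifiers are assumed to be validated and quoted beforehand.

```python
def build_cols2vec_query(source_table, output_table, features,
                         exclude=(), keep=()):
    """Build a SQL string that packs feature columns into one array column.

    Hypothetical helper: identifiers are assumed to be already validated
    and quoted, and 'feature_vector' is an illustrative column name.
    """
    excluded = set(exclude)
    selected = [col for col in features if col not in excluded]
    if not selected:
        raise ValueError("no feature columns left after exclusion")
    select_list = list(keep) + [
        "ARRAY[{0}] AS feature_vector".format(", ".join(selected))]
    return "CREATE TABLE {out} AS SELECT {cols} FROM {src}".format(
        out=output_table, cols=", ".join(select_list), src=source_table)


print(build_cols2vec_query(
    "golf", "golf_vec",
    features=["temperature", "humidity", "windy"],
    exclude=["windy"],
    keep=["id"]))
```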


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/2828d86a
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/2828d86a
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/2828d86a

Branch: refs/heads/master
Commit: 2828d86a64bedddb6849913eaaf7734042922e6e
Parents: 3b527b8
Author: Himanshu Pandey 
Authored: Fri Jun 15 01:33:27 2018 -0700
Committer: Rahul Iyer 
Committed: Sun Jul 15 23:35:14 2018 -0700

--
 doc/mainpage.dox.in |   3 +
 .../postgres/modules/utilities/cols2vec.py_in   | 111 +++
 .../postgres/modules/utilities/cols2vec.sql_in  | 191 +++
 .../modules/utilities/test/cols2vec.sql_in  |  89 +
 4 files changed, 394 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/2828d86a/doc/mainpage.dox.in
--
diff --git a/doc/mainpage.dox.in b/doc/mainpage.dox.in
index e41e6c9..341f115 100644
--- a/doc/mainpage.dox.in
+++ b/doc/mainpage.dox.in
@@ -284,6 +284,9 @@ Contains graph algorithms.
 @defgroup @grp_utilities Utilities
 @ingroup grp_other_functions
 
+@defgroup grp_cols2vec Columns to Vector
+@ingroup grp_utility_functions
+
 @defgroup grp_early_stage Early Stage Development
 @brief A collection of implementations which are in early stage of development.
 There may be some issues that will be addressed in a future version.

http://git-wip-us.apache.org/repos/asf/madlib/blob/2828d86a/src/ports/postgres/modules/utilities/cols2vec.py_in
--
diff --git a/src/ports/postgres/modules/utilities/cols2vec.py_in b/src/ports/postgres/modules/utilities/cols2vec.py_in
new file mode 100644
index 000..ced53e9
--- /dev/null
+++ b/src/ports/postgres/modules/utilities/cols2vec.py_in
@@ -0,0 +1,111 @@
+"""
+@file cols2vec.py_in
+
+@brief Utility to convert Columns to array
+
+"""
+
+import plpy
+from utilities.control import MinWarning
+from utilities.utilities import split_quoted_delimited_str
+from utilities.utilities import _string_to_array
+from utilities.utilities import _assert
+from utilities.validate_args import columns_exist_in_table
+from utilities.validate_args import is_var_valid
+from utilities.validate_args import get_cols
+from utilities.validate_args import quote_ident
+from utilities.utilities import py_list_to_sql_string
+
+
+m4_changequote(`')
+
+
+def validate_cols2vec_args(source_table, output_table,
+   list_of_features, list_of_features_to_exclude, cols_to_output, **kwargs):
+"""
+Function to validate input parameters
+"""
+if list_of_features.strip() != '*':
+if not (list_of_features and list_of_features.strip()):
+plpy.error("Features to include is empty")
+_assert(
+columns_exist_in_table(
+source_table, split_quoted_delimited_str(list_of_features)),
+"Invalid columns to list of features {0}".format(list_of_features))
+
+if cols_to_output and cols_to_output.strip() != '*':
+_assert(
+columns_exist_in_table(
+source_table, _string_to_array(cols_to_output)),
+"Invalid columns to output list {0}".format(cols_to_output))
+
+
+def cols2vec(schema_madlib, source_table, output_table, list_of_features,
+ list_of_features_to_exclude=None, cols_to_output=None, **kwargs):
+"""
+Args:
+@param schema_madlib:   Name of MADlib schema
+@param model:   Name of table containing the tree model
+@param source_table:Name of table containing prediction data
+@param output_table:Name of table to output the results
+@param list_of_features:Comma-separated string of column names or
+expressions to put into feature array.
+Can also be a '*' implying all columns
+are to be put into feature array.
+@param list_of_features_to_exclude: Comma-separated string of column names
+to exclude from the feature array
+@param cols_to_output:  Comma-separated string of column names
+from the source table to keep in the output 

[2/2] madlib git commit: Utilities: Refactor and clean cols2vec from 2828d86

2018-07-16 Thread riyer
Utilities: Refactor and clean cols2vec from 2828d86

JIRA: MADLIB-1239

Closes #288


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/950114cc
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/950114cc
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/950114cc

Branch: refs/heads/master
Commit: 950114ccdbbdd81750624a41390d5a35d11c008a
Parents: 2828d86
Author: Rahul Iyer 
Authored: Thu Jul 12 16:44:57 2018 -0700
Committer: Rahul Iyer 
Committed: Sun Jul 15 23:36:01 2018 -0700

--
 doc/mainpage.dox.in |   5 +-
 .../postgres/modules/utilities/cols2vec.py_in   | 110 ++--
 .../postgres/modules/utilities/cols2vec.sql_in  | 173 ++-
 .../modules/utilities/test/cols2vec.sql_in  |  54 +++---
 .../postgres/modules/utilities/utilities.py_in  |   6 +-
 5 files changed, 183 insertions(+), 165 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/950114cc/doc/mainpage.dox.in
--
diff --git a/doc/mainpage.dox.in b/doc/mainpage.dox.in
index 341f115..c2c9a7a 100644
--- a/doc/mainpage.dox.in
+++ b/doc/mainpage.dox.in
@@ -262,6 +262,9 @@ Contains graph algorithms.
 
 
 @defgroup grp_other_functions Other Functions
+@defgroup grp_cols2vec Columns to Vector
+@ingroup grp_other_functions
+
 @defgroup grp_linear_solver Linear Solvers
 @ingroup grp_other_functions
 @{A collection of methods that implement solutions for systems of consistent linear equations. @}
@@ -284,8 +287,6 @@ Contains graph algorithms.
 @defgroup @grp_utilities Utilities
 @ingroup grp_other_functions
 
-@defgroup grp_cols2vec Columns to Vector
-@ingroup grp_utility_functions
 
 @defgroup grp_early_stage Early Stage Development
 @brief A collection of implementations which are in early stage of development.

http://git-wip-us.apache.org/repos/asf/madlib/blob/950114cc/src/ports/postgres/modules/utilities/cols2vec.py_in
--
diff --git a/src/ports/postgres/modules/utilities/cols2vec.py_in b/src/ports/postgres/modules/utilities/cols2vec.py_in
index ced53e9..b38b3d6 100644
--- a/src/ports/postgres/modules/utilities/cols2vec.py_in
+++ b/src/ports/postgres/modules/utilities/cols2vec.py_in
@@ -6,15 +6,17 @@
 """
 
 import plpy
-from utilities.control import MinWarning
-from utilities.utilities import split_quoted_delimited_str
-from utilities.utilities import _string_to_array
-from utilities.utilities import _assert
-from utilities.validate_args import columns_exist_in_table
-from utilities.validate_args import is_var_valid
-from utilities.validate_args import get_cols
-from utilities.validate_args import quote_ident
-from utilities.utilities import py_list_to_sql_string
+from control import MinWarning
+from internal.db_utils import quote_literal
+from utilities import split_quoted_delimited_str
+from utilities import _string_to_array
+from utilities import _assert
+from utilities import add_postfix
+from validate_args import columns_exist_in_table
+from validate_args import is_var_valid
+from validate_args import get_cols
+from validate_args import quote_ident
+from utilities import py_list_to_sql_string
 
 
 m4_changequote(`')
@@ -31,12 +33,12 @@ def validate_cols2vec_args(source_table, output_table,
 _assert(
 columns_exist_in_table(
 source_table, split_quoted_delimited_str(list_of_features)),
-"Invalid columns to list of features {0}".format(list_of_features))
+"Invalid columns in list_of_features {0}".format(list_of_features))
 
 if cols_to_output and cols_to_output.strip() != '*':
 _assert(
 columns_exist_in_table(
-source_table, _string_to_array(cols_to_output)),
+source_table, split_quoted_delimited_str(cols_to_output)),
 "Invalid columns to output list {0}".format(cols_to_output))
 
 
@@ -44,68 +46,64 @@ def cols2vec(schema_madlib, source_table, output_table, list_of_features,
  list_of_features_to_exclude=None, cols_to_output=None, **kwargs):
 """
 Args:
-@param schema_madlib:   Name of MADlib schema
-@param model:   Name of table containing the tree model
-@param source_table:Name of table containing prediction data
-@param output_table:Name of table to output the results
-@param list_of_features:Comma-separated string of column names or
-expressions to put into feature array.
-Can also be a '*' implying all columns
-

madlib git commit: Utils: Simplify proxy quote function

2018-07-13 Thread riyer
Repository: madlib
Updated Branches:
  refs/heads/master 5e47c8e4c -> 3b527b82a


Utils: Simplify proxy quote function

Commit 5e47c8e added a wrapper quote_literal function that called
plpy.quote_literal if available, else returned dollar-quoted string.
We can use Python's introspection to switch between these two
options at runtime instead of a compile-time preprocessor switch.
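
The runtime switch described above can be sketched as follows. To keep the example runnable outside a database, the plpy module is passed in as a parameter and replaced by two stand-in classes; that parameter, the class names, and the simplified quoting in NewPlpy are assumptions for illustration. The try/except AttributeError structure and the QUOTE_DELIMITER value mirror the db_utils.py_in diff in this commit.

```python
QUOTE_DELIMITER = "$__madlib__$"


def quote_literal(plpy_module, input_str):
    """Quote a string for SQL, preferring the platform's own function.

    'plpy_module' is passed in explicitly so the sketch runs anywhere;
    the real function uses the imported plpy module.
    """
    try:
        return plpy_module.quote_literal(input_str)
    except AttributeError:
        # Older platforms lack plpy.quote_literal: fall back to dollar quoting.
        return "{qd}{s}{qd}".format(qd=QUOTE_DELIMITER, s=input_str)


class NewPlpy(object):
    """Stand-in for a plpy that provides quote_literal (simplified quoting)."""

    @staticmethod
    def quote_literal(s):
        return "'" + s.replace("'", "''") + "'"


class OldPlpy(object):
    """Stand-in for a plpy without quote_literal."""


print(quote_literal(NewPlpy, "it's"))
print(quote_literal(OldPlpy, "it's"))
```

One trade-off of this style is that an AttributeError raised inside an available quote_literal would also trigger the fallback; checking with hasattr first would be a stricter alternative.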


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/3b527b82
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/3b527b82
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/3b527b82

Branch: refs/heads/master
Commit: 3b527b82a316fc893ad0695e8805307387351634
Parents: 5e47c8e
Author: Rahul Iyer 
Authored: Fri Jul 13 10:59:33 2018 -0700
Committer: Rahul Iyer 
Committed: Fri Jul 13 10:59:33 2018 -0700

--
 src/ports/greenplum/cmake/GreenplumUtils.cmake | 3 +--
 src/ports/postgres/cmake/PostgreSQLUtils.cmake | 4 
 src/ports/postgres/modules/internal/db_utils.py_in | 8 
 3 files changed, 5 insertions(+), 10 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/3b527b82/src/ports/greenplum/cmake/GreenplumUtils.cmake
--
diff --git a/src/ports/greenplum/cmake/GreenplumUtils.cmake b/src/ports/greenplum/cmake/GreenplumUtils.cmake
index 5ec271e..0fc1637 100644
--- a/src/ports/greenplum/cmake/GreenplumUtils.cmake
+++ b/src/ports/greenplum/cmake/GreenplumUtils.cmake
@@ -9,9 +9,8 @@ function(define_greenplum_features IN_VERSION OUT_FEATURES)
 list(APPEND ${OUT_FEATURES} __HAS_FUNCTION_PROPERTIES__)
 endif()
 
-if(NOT ${IN_VERSION} VERSION_LESS "6.0")
+if(${IN_VERSION} VERSION_GREATER "4.3")
 list(APPEND ${OUT_FEATURES} __HAS_BOOL_TO_TEXT_CAST__)
-list(APPEND ${OUT_FEATURES} __HAS_PLPY_QUOTE_FUNCTIONS__)
 endif()
 
 # Pass values to caller

http://git-wip-us.apache.org/repos/asf/madlib/blob/3b527b82/src/ports/postgres/cmake/PostgreSQLUtils.cmake
--
diff --git a/src/ports/postgres/cmake/PostgreSQLUtils.cmake b/src/ports/postgres/cmake/PostgreSQLUtils.cmake
index e08effe..0139015 100644
--- a/src/ports/postgres/cmake/PostgreSQLUtils.cmake
+++ b/src/ports/postgres/cmake/PostgreSQLUtils.cmake
@@ -6,10 +6,6 @@ function(define_postgresql_features IN_VERSION OUT_FEATURES)
 list(APPEND ${OUT_FEATURES} __HAS_BOOL_TO_TEXT_CAST__)
 endif()
 
-if(NOT ${IN_VERSION} VERSION_LESS "9.1")
-list(APPEND ${OUT_FEATURES} __HAS_PLPY_QUOTE_FUNCTIONS__)
-endif()
-
 # Pass values to caller
 set(${OUT_FEATURES} "${${OUT_FEATURES}}" PARENT_SCOPE)
 endfunction(define_postgresql_features)

http://git-wip-us.apache.org/repos/asf/madlib/blob/3b527b82/src/ports/postgres/modules/internal/db_utils.py_in
--
diff --git a/src/ports/postgres/modules/internal/db_utils.py_in b/src/ports/postgres/modules/internal/db_utils.py_in
index 4c41515..c75babf 100644
--- a/src/ports/postgres/modules/internal/db_utils.py_in
+++ b/src/ports/postgres/modules/internal/db_utils.py_in
@@ -24,8 +24,6 @@ from utilities.validate_args import get_expr_type
 m4_changequote(`')
 
 QUOTE_DELIMITER="$__madlib__$"
-HAS_PLPY_QUOTE_FUNCTIONS = m4_ifdef(,
-, );
 
 
 def get_distinct_col_levels(source_table, col_name, col_type=None):
@@ -73,9 +71,11 @@ def quote_literal(input_str):
 provided as a proxy for that platform. For all other platforms this
 function, forwards the argument to plpy.quote_literal.
 """
-if HAS_PLPY_QUOTE_FUNCTIONS:
+try:
 return plpy.quote_literal(input_str)
-else:
+except AttributeError:
+# plpy.quote_literal is not supported, we work around by returning
+# dollar-quoted string with obscure tag
 return "{qd}{input_str}{qd}".format(qd=QUOTE_DELIMITER,
 input_str=input_str)
 # 
--



madlib git commit: Utils: Add a Python quote_literal for GP platforms

2018-07-13 Thread riyer
Repository: madlib
Updated Branches:
  refs/heads/master e64dba4eb -> 5e47c8e4c


Utils: Add a Python quote_literal for GP platforms

Versions prior to GPDB 6 or PostgreSQL 9.1 do not provide
plpy.quote_literal, which is necessary for building a SQL text array from
a Python list of strings.  We work around this limitation by creating
our own quote_literal function that just returns plpy.quote_literal
output for platforms that provide the function. For other platforms, we
compromise by using dollar-quoting (with an obscure tag between the
dollars).
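
The fallback described above can be sketched as follows. This is only an
illustration, not the committed code: `plpy` exists only inside a PL/Python
function, so the two stand-in classes (`OldPlpy`, `NewPlpy`) and the helper
`to_sql_text_array` are hypothetical names introduced here, and the stand-in
quoting handles plain strings only. The delimiter value is the one shown in
db_utils.py_in.

```python
# Sketch of the quote_literal fallback. The plpy module is passed in as an
# argument so the logic can be exercised outside the database (assumption).
QUOTE_DELIMITER = "$__madlib__$"


def quote_literal(input_str, plpy_module):
    """Return input_str as a SQL literal, dollar-quoting when plpy cannot."""
    try:
        return plpy_module.quote_literal(input_str)
    except AttributeError:
        # Platform without plpy.quote_literal: wrap in an obscure dollar tag.
        return "{qd}{s}{qd}".format(qd=QUOTE_DELIMITER, s=input_str)


class OldPlpy(object):
    """Hypothetical stand-in for a platform lacking plpy.quote_literal."""


class NewPlpy(object):
    """Hypothetical stand-in; simplified quoting for plain strings only."""

    @staticmethod
    def quote_literal(s):
        return "'" + s.replace("'", "''") + "'"


def to_sql_text_array(values, plpy_module):
    """Build a SQL text array expression from a Python list of strings."""
    quoted = [quote_literal(v, plpy_module) for v in values]
    return "ARRAY[{0}]::text[]".format(", ".join(quoted))
```

Dollar-quoting avoids escaping single quotes inside the value, at the cost
of breaking if a value ever contains the tag itself, which is why the tag
is chosen to be obscure.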


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/5e47c8e4
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/5e47c8e4
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/5e47c8e4

Branch: refs/heads/master
Commit: 5e47c8e4cce205c5ecfda5e2e1d6bdc0a7330603
Parents: e64dba4
Author: Rahul Iyer 
Authored: Thu Jul 12 22:46:07 2018 -0700
Committer: Rahul Iyer 
Committed: Fri Jul 13 00:40:41 2018 -0700

--
 src/ports/greenplum/cmake/GreenplumUtils.cmake  |  3 +-
 src/ports/postgres/cmake/PostgreSQLUtils.cmake  |  4 +
 src/ports/postgres/modules/convex/mlp_igd.py_in |  4 +-
 .../postgres/modules/internal/db_utils.py_in| 77 +++-
 4 files changed, 50 insertions(+), 38 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/5e47c8e4/src/ports/greenplum/cmake/GreenplumUtils.cmake
--
diff --git a/src/ports/greenplum/cmake/GreenplumUtils.cmake b/src/ports/greenplum/cmake/GreenplumUtils.cmake
index 0fc1637..5ec271e 100644
--- a/src/ports/greenplum/cmake/GreenplumUtils.cmake
+++ b/src/ports/greenplum/cmake/GreenplumUtils.cmake
@@ -9,8 +9,9 @@ function(define_greenplum_features IN_VERSION OUT_FEATURES)
 list(APPEND ${OUT_FEATURES} __HAS_FUNCTION_PROPERTIES__)
 endif()
 
-if(${IN_VERSION} VERSION_GREATER "4.3")
+if(NOT ${IN_VERSION} VERSION_LESS "6.0")
 list(APPEND ${OUT_FEATURES} __HAS_BOOL_TO_TEXT_CAST__)
+list(APPEND ${OUT_FEATURES} __HAS_PLPY_QUOTE_FUNCTIONS__)
 endif()
 
 # Pass values to caller

http://git-wip-us.apache.org/repos/asf/madlib/blob/5e47c8e4/src/ports/postgres/cmake/PostgreSQLUtils.cmake
--
diff --git a/src/ports/postgres/cmake/PostgreSQLUtils.cmake b/src/ports/postgres/cmake/PostgreSQLUtils.cmake
index 0139015..e08effe 100644
--- a/src/ports/postgres/cmake/PostgreSQLUtils.cmake
+++ b/src/ports/postgres/cmake/PostgreSQLUtils.cmake
@@ -6,6 +6,10 @@ function(define_postgresql_features IN_VERSION OUT_FEATURES)
 list(APPEND ${OUT_FEATURES} __HAS_BOOL_TO_TEXT_CAST__)
 endif()
 
+if(NOT ${IN_VERSION} VERSION_LESS "9.1")
+list(APPEND ${OUT_FEATURES} __HAS_PLPY_QUOTE_FUNCTIONS__)
+endif()
+
 # Pass values to caller
 set(${OUT_FEATURES} "${${OUT_FEATURES}}" PARENT_SCOPE)
 endfunction(define_postgresql_features)

http://git-wip-us.apache.org/repos/asf/madlib/blob/5e47c8e4/src/ports/postgres/modules/convex/mlp_igd.py_in
--
diff --git a/src/ports/postgres/modules/convex/mlp_igd.py_in b/src/ports/postgres/modules/convex/mlp_igd.py_in
index 3ab7f45..7df44ec 100644
--- a/src/ports/postgres/modules/convex/mlp_igd.py_in
+++ b/src/ports/postgres/modules/convex/mlp_igd.py_in
@@ -33,7 +33,7 @@ from convex.utils_regularization import __utils_normalize_data_grouping
 
 from internal.db_utils import get_distinct_col_levels
 from internal.db_utils import get_one_hot_encoded_expr
-from internal.db_utils import quote_literal_python_list
+from internal.db_utils import quote_literal
 from utilities.control import MinWarning
 from utilities.in_mem_group_control import GroupIterationController
 from utilities.utilities import _array_to_string
@@ -145,7 +145,7 @@ def mlp(schema_madlib, source_table, output_table, independent_varname,
 dim=2)
 if is_classification:
 if pp_summary_dict["class_values"]:
-classes = quote_literal_python_list(pp_summary_dict["class_values"])
+classes = [quote_literal(c) for c in pp_summary_dict["class_values"]]
 num_output_nodes = len(classes)
 else:
 # Assume that the dependent variable is already one-hot-encoded

http://git-wip-us.apache.org/repos/asf/madlib/blob/5e47c8e4/src/ports/postgres/modules/internal/db_utils.py_in
--
diff --git a/src/ports/postgres/modules/internal/db_utils.py_in b/src/ports/postgres/modules/internal/db_utils.py_in
index e82ba91..4c41515 100644
--- a/src/ports/postgres/modules/internal/db_utils.py_in

madlib git commit: Madpack: Fix glob expansion for dev-check

2018-07-12 Thread riyer
Repository: madlib
Updated Branches:
  refs/heads/master a47cd1ff5 -> e64dba4eb


Madpack: Fix glob expansion for dev-check
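
A short sketch of why the old pattern misbehaved, judging from the diff in
this commit: `[!ic]` is a single-character class, so `*[!ic].sql_in` skips
every file whose name ends in `i` or `c` right before `.sql_in`, not just
the `.ic.sql_in` files. The file names below are hypothetical and
`fnmatch.fnmatchcase` is used in place of `glob.glob` so that no files are
needed on disk.

```python
import fnmatch

# Hypothetical test files for one module.
names = ["svm.sql_in", "svm.ic.sql_in", "logistic.sql_in", "logistic.ic.sql_in"]


def old_dev_check_files(file_names):
    """Old behaviour: one glob with a negated single-character class."""
    return [n for n in file_names if fnmatch.fnmatchcase(n, "*[!ic].sql_in")]


def new_dev_check_files(file_names):
    """New behaviour: all *.sql_in files minus the *.ic.sql_in files."""
    all_files = set(n for n in file_names if fnmatch.fnmatchcase(n, "*.sql_in"))
    ic_files = set(n for n in file_names if fnmatch.fnmatchcase(n, "*.ic.sql_in"))
    return sorted(all_files - ic_files)


# "logistic.sql_in" ends in "c" before ".sql_in", so the old pattern drops it.
old_result = old_dev_check_files(names)
new_result = new_dev_check_files(names)
```

The set difference expresses the intent directly (everything that is not an
install-check file) instead of approximating it with a character class.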


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/e64dba4e
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/e64dba4e
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/e64dba4e

Branch: refs/heads/master
Commit: e64dba4ebe2c2918a1c3a54cb83e55e1875a7261
Parents: a47cd1f
Author: Rahul Iyer 
Authored: Thu Jul 12 17:11:54 2018 -0700
Committer: Rahul Iyer 
Committed: Thu Jul 12 22:18:02 2018 -0700

--
 src/madpack/madpack.py | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/e64dba4e/src/madpack/madpack.py
--
diff --git a/src/madpack/madpack.py b/src/madpack/madpack.py
index f21f2c0..1444c26 100755
--- a/src/madpack/madpack.py
+++ b/src/madpack/madpack.py
@@ -329,10 +329,11 @@ def _parse_result_logfile(retval, logfile, sql_abspath,
 "|Time: %d milliseconds" % (milliseconds)
 
 if result == 'FAIL':
-info_(this, "Failed executing %s" % sql_abspath, True)
-info_(this, "Check the log at %s" % logfile, True)
+error_(this, "Failed executing %s" % sql_abspath, stop=False)
+error_(this, "Check the log at %s" % logfile, stop=False)
 return result
 
+
 def _check_db_port(portid):
 """
 Make sure we are connected to the expected DB platform
@@ -888,11 +889,13 @@ def run_install_check(args, testcase, madpack_cmd):
   % (test_user, test_schema, schema)
 
 # Loop through all test SQL files for this module
+ic_sql_files = set(glob.glob(maddir_mod_sql + '/' + module + '/test/*.ic.sql_in'))
 if is_install_check:
-sql_files = maddir_mod_sql + '/' + module + '/test/*.ic.sql_in'
+sql_files = ic_sql_files
 else:
-sql_files = maddir_mod_sql + '/' + module + '/test/*[!ic].sql_in'
-for sqlfile in sorted(glob.glob(sql_files), reverse=True):
+all_sql_files = set(glob.glob(maddir_mod_sql + '/' + module + '/test/*.sql_in'))
+sql_files = all_sql_files - ic_sql_files
+for sqlfile in sorted(sql_files):
 algoname = os.path.basename(sqlfile).split('.')[0]
 # run only algo specified
 if (modset and modset[module] and



[2/5] madlib git commit: SVM: Compute average loss per row instead of total loss

2018-07-12 Thread riyer
SVM: Compute average loss per row instead of total loss


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/ceab57f3
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/ceab57f3
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/ceab57f3

Branch: refs/heads/master
Commit: ceab57f31ddf15a1de8621a22633e052ba0028ff
Parents: ac4a51f
Author: Rahul Iyer 
Authored: Tue Jul 10 13:47:39 2018 -0700
Committer: Rahul Iyer 
Committed: Thu Jul 12 13:31:22 2018 -0700

--
 src/modules/convex/linear_svm_igd.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/ceab57f3/src/modules/convex/linear_svm_igd.cpp
--
diff --git a/src/modules/convex/linear_svm_igd.cpp b/src/modules/convex/linear_svm_igd.cpp
index f396250..79dc496 100644
--- a/src/modules/convex/linear_svm_igd.cpp
+++ b/src/modules/convex/linear_svm_igd.cpp
@@ -192,7 +192,7 @@ internal_linear_svm_igd_result::run(AnyType ) {
 
 AnyType tuple;
 tuple << state.task.model
-<< static_cast(state.algo.loss)
+<< static_cast(state.algo.loss / state.algo.numRows)
 << state.algo.gradient.norm()
 << static_cast(state.algo.numRows);
 



[4/5] madlib git commit: Multiple: Update docs related to CV

2018-07-12 Thread riyer
Multiple: Update docs related to CV

JIRA: MADLIB-1250

This commit updates documentation to reflect latest changes in cross
validation. An additional minor change is made to MLP docs to use 'AVG'
instead of 'SUM/COUNT'.
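
As a rough illustration of the MLP docs change (based on the diff in this
commit, where the query moves from SQRT(SUM(ABS(...))/COUNT(...)) to
SQRT(AVG(...*...))), the sketch below computes both quantities in Python.
The function names are hypothetical; only the second one is a
root-mean-square error.

```python
import math


def old_doc_formula(actual, predicted):
    """Square root of the mean absolute error (what the old docs computed)."""
    abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return math.sqrt(sum(abs_errors) / len(abs_errors))


def rms_error(actual, predicted):
    """Root-mean-square error: square root of the mean squared error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Every prediction is off by exactly 2, so the RMS error should be 2.
y = [0.0, 0.0, 0.0, 0.0]
y_hat = [2.0, 2.0, 2.0, 2.0]
old_value = old_doc_formula(y, y_hat)
new_value = rms_error(y, y_hat)
```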


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/11ecdc7e
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/11ecdc7e
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/11ecdc7e

Branch: refs/heads/master
Commit: 11ecdc7e6309c6ebdb070ffeda6ac2cbaafa18c2
Parents: 834f543
Author: Frank McQuillan 
Authored: Wed Jul 11 11:02:02 2018 -0700
Committer: Rahul Iyer 
Committed: Thu Jul 12 13:39:31 2018 -0700

--
 src/ports/postgres/modules/convex/mlp.sql_in|   4 +-
 .../modules/elastic_net/elastic_net.sql_in  |  34 +-
 src/ports/postgres/modules/svm/svm.sql_in   | 622 +--
 3 files changed, 453 insertions(+), 207 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/11ecdc7e/src/ports/postgres/modules/convex/mlp.sql_in
--
diff --git a/src/ports/postgres/modules/convex/mlp.sql_in b/src/ports/postgres/modules/convex/mlp.sql_in
index 13ae4a0..9fba404 100644
--- a/src/ports/postgres/modules/convex/mlp.sql_in
+++ b/src/ports/postgres/modules/convex/mlp.sql_in
@@ -1164,7 +1164,7 @@ SELECT * FROM lin_housing JOIN mlp_regress_prediction USING (id) ORDER BY id;
 
 RMS error:
 
-SELECT SQRT(SUM(ABS(y-estimated_y))/COUNT(y)) as rms_error FROM lin_housing
+SELECT SQRT(AVG((y-estimated_y)*(y-estimated_y))) as rms_error FROM lin_housing 
 JOIN mlp_regress_prediction USING (id);
 
 
@@ -1256,7 +1256,7 @@ SELECT *, ABS(y-estimated_y) as abs_diff FROM lin_housing JOIN mlp_regress_predi
 
 RMS error:
 
-SELECT SQRT(SUM(ABS(y-estimated_y))/COUNT(y)) as rms_error FROM lin_housing
+SELECT SQRT(AVG((y-estimated_y)*(y-estimated_y))) as rms_error FROM lin_housing 
 JOIN mlp_regress_prediction USING (id);
 
 

http://git-wip-us.apache.org/repos/asf/madlib/blob/11ecdc7e/src/ports/postgres/modules/elastic_net/elastic_net.sql_in
--
diff --git a/src/ports/postgres/modules/elastic_net/elastic_net.sql_in b/src/ports/postgres/modules/elastic_net/elastic_net.sql_in
index 5ea2efb..838a6bd 100644
--- a/src/ports/postgres/modules/elastic_net/elastic_net.sql_in
+++ b/src/ports/postgres/modules/elastic_net/elastic_net.sql_in
@@ -231,8 +231,14 @@ cross validation is used.  Also, cross validation is not supported if grouping i
 
 Hyperparameter optimization can be carried out using the built-in cross
 validation mechanism, which is activated by assigning a value greater than 1 to
-the parameter \e n_folds.  Negative misclassification error is used
-for classification and negative root mean squared error is used for regression.
+the parameter \e n_folds. 
+
+The cross validation scores are the mean and standard deviation
+of the accuracy when predicted on the validation fold, 
+averaged over all folds and all rows.  For classification, the accuracy
+metric used is the ratio of correct classifications.  For regression, the 
+accuracy metric used is the negative of mean squared error (negative to 
+make it a concave problem, thus selecting \e max means the highest accuracy). 
 
 The values of a parameter to cross validate should be provided in a list. For
 example, to regularize with the L1 norm and use a lambda value
@@ -775,20 +781,20 @@ iteration_run | 1
 
 -# Details of the cross validation:
 
-SELECT * FROM houses_en3_cv ORDER BY lambda_value DESC, alpha ASC;
+SELECT * FROM houses_en3_cv ORDER BY mean_neg_loss DESC;
 
 
- alpha | lambda_value |mean_neg_loss   |   std_neg_loss
+--++---
-   0.0 | 10.0 | -1.617365261170+55 | 1.26711815498+55
-   0.0 |100.0 | -63555.0502789 |3973.78527042
-   0.0 |  0.1 | -37136.5397256 |9022.78236248
-   0.1 | 10.0 | -3.260479720340+53 | 9.10745448826+53
-   0.1 |100.0 | -63445.8310011 |3965.83900962
-   0.1 |  0.1 | -37192.0390897 |9058.79757772
-   1.0 | 10.0 | -64569.8882099 | 4051.1856361
-   1.0 |100.0 | -38121.9154268 |9332.65800111
-   1.0 |  0.1 | -38117.5477067 |9384.36765881
+ alpha | lambda_value |  mean_neg_loss |   std_neg_loss
   
+---+--+--+
+   0.0 |  0.1 | -36094.4685768 |  10524.4473253
+   0.1 |  0.1 | -36136.2448004 |  10682.4136993
+   1.0 |100.0 | -37007.9496501 |  12679.3781975
+   1.0 |  0.1 | -37018.1019927 |  12716.7438015
+   0.1 |100.0 | 

[3/5] madlib git commit: CV: Fix incorrect dict index + change output columns

2018-07-12 Thread riyer
CV: Fix incorrect dict index + change output columns

JIRA: MADLIB-1250

Cross validation had a minor bug that didn't fully index into a two-level
nested dictionary. This led to a KeyError while writing CV results to an
output table. This has been fixed in this commit.

Additionally, the CV output table columns are now called 'mean_score' and
'std_dev_score' instead of 'mean_neg_loss' and 'std_neg_loss', to avoid
confusion with the loss function used in the primary modeling technique.
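
A minimal sketch of the indexing problem, assuming a history record shaped
like the one in cross_validation.py_in: the tuned parameters live one level
deeper, under `sub_args['params_dict']`. The helper names and the sample
values are hypothetical; only the key names `mean_score`, `std_dev_score`,
`sub_args` and `params_dict` come from the diff.

```python
def cv_header(cv_history):
    """Column names: leaf parameter names followed by the two score columns."""
    params = cv_history[0]['sub_args']['params_dict']
    return sorted(params.keys()) + ['mean_score', 'std_dev_score']


def cv_rows(cv_history):
    """One output row per record, in the same column order as cv_header."""
    param_names = sorted(cv_history[0]['sub_args']['params_dict'].keys())
    rows = []
    for record in cv_history:
        params = record['sub_args']['params_dict']  # second-level index
        row = [params[name] for name in param_names]
        row += [record['mean_score'], record['std_dev_score']]
        rows.append(row)
    return rows


# Hypothetical history with a single record.
history = [dict(mean_score=0.9, std_dev_score=0.05,
                sub_args={'params_dict': {'lambda': 0.1, 'init_stepsize': 0.01}})]


def lookup_without_second_level(record, name):
    """Reproduces the bug: indexing sub_args directly raises KeyError."""
    return record['sub_args'][name]
```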

Closes #287


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/834f543e
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/834f543e
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/834f543e

Branch: refs/heads/master
Commit: 834f543eefdbf321c2ce014d64f909138559c357
Parents: ceab57f
Author: Rahul Iyer 
Authored: Tue Jul 3 14:28:21 2018 -0700
Committer: Rahul Iyer 
Committed: Thu Jul 12 13:39:31 2018 -0700

--
 src/ports/postgres/modules/svm/test/svm.sql_in  |  4 +++-
 .../validation/internal/cross_validation.py_in  | 16 
 2 files changed, 11 insertions(+), 9 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/834f543e/src/ports/postgres/modules/svm/test/svm.sql_in
--
diff --git a/src/ports/postgres/modules/svm/test/svm.sql_in b/src/ports/postgres/modules/svm/test/svm.sql_in
index ad4b9ac..cba370c 100644
--- a/src/ports/postgres/modules/svm/test/svm.sql_in
+++ b/src/ports/postgres/modules/svm/test/svm.sql_in
@@ -581,7 +581,9 @@ SELECT svm_classification(
  'gaussian',
  'n_components=3, fit_intercept=true',
  NULL,
- 'max_iter=2, n_folds=3, lambda=[0.01, 0.1, 0.5]');
+ 'init_stepsize=[0.01, 0.1], max_iter=2, n_folds=3, lambda=[0.01, 0.1, 0.5], validation_result=m7_cv');
+
+SELECT * FROM m7_cv;
 
 SELECT svm_predict('m7','svm_test_data', 'id', 'svm_test_7');
 SELECT

http://git-wip-us.apache.org/repos/asf/madlib/blob/834f543e/src/ports/postgres/modules/validation/internal/cross_validation.py_in
--
diff --git a/src/ports/postgres/modules/validation/internal/cross_validation.py_in b/src/ports/postgres/modules/validation/internal/cross_validation.py_in
index 84e52e9..c173533 100644
--- a/src/ports/postgres/modules/validation/internal/cross_validation.py_in
+++ b/src/ports/postgres/modules/validation/internal/cross_validation.py_in
@@ -67,8 +67,8 @@ class ValidationResult(object):
  List of dictionaries.
  Each dictionary contains the following three keys:
 
- - mean_neg_loss: float, average of scores using sub_args
- - std_neg_loss: float, standard deviation of scores using sub_args
+ - mean_score: float, average of scores using sub_args
+ - std_dev_score: float, standard deviation of scores using sub_args
  - sub_args: dict, the values of arguments being validated
 """
 def __init__(self, cv_history=None):
@@ -98,12 +98,12 @@ class ValidationResult(object):
 
 def add_one(self, mean, std, sub_args):
 """Add one record to the history"""
-record = dict(mean_neg_loss=mean, std_neg_loss=std, sub_args=sub_args)
+record = dict(mean_score=mean, std_dev_score=std, sub_args=sub_args)
 self._cv_history.append(record)
 
 def sorted(self):
 """Sort the history w.r.t. mean value and return a new ValidationResult object"""
-ch = sorted(self._cv_history, reverse=True, key=itemgetter('mean_neg_loss'))
+ch = sorted(self._cv_history, reverse=True, key=itemgetter('mean_score'))
 return ValidationResult(ch)
 
 def first(self, attr=None):
@@ -112,7 +112,7 @@ class ValidationResult(object):
 Parameters
 ==
 attr : string, optional
-   Any string in {'mean_neg_loss', 'std_neg_loss', 'sub_args'} or None
+   Any string in {'mean_score', 'std_dev_score', 'sub_args'} or None
 
 Returns
 ===
@@ -133,13 +133,14 @@ class ValidationResult(object):
 def output_tbl(self, tbl_name):
 """Create a table tbl_name that contains the history
 
-The columns of tbl_name are mean_neg_loss, std_neg_loss and the leaf keys in sub_args.
+The columns of tbl_name are mean_score, std_dev_score and the leaf keys in sub_args.
 All column types are assumed to be double precision.
 """
 if not tbl_name or not str(tbl_name).strip():
 return
 
-header = self._cv_history[0]['sub_args'].keys() + ['mean_neg_loss', 'std_neg_loss']
+header = (self._cv_history[0]['sub_args']['params_dict'].keys() +
+  

[5/5] madlib git commit: Utilities: Add check for any array type

2018-07-12 Thread riyer
Utilities: Add check for any array type

Co-authored-by: Nikhil Kak 

Closes #293
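
The precedence of the three non-type flags can be sketched as below. This
follows the logic visible in the utilities.py_in diff, with two labelled
simplifications: the flag sets use fixed sentinel strings instead of
MADlib's `unique_string` helper, and the `INTEGER` set is an abbreviated,
assumed list of type names.

```python
# Assumed, abbreviated type sets for illustration only.
INTEGER = set(['integer', 'int', 'bigint', 'smallint'])
BOOLEAN = set(['boolean'])
# Fixed sentinels stand in for unique_string(...) (assumption).
INCLUDE_ARRAY = set(['__include_array__'])
ONLY_ARRAY = set(['__only_array__'])
ANY_ARRAY = set(['__any_array__'])


def is_valid_psql_type(arg, valid_types):
    """Check arg against valid_types; ANY_ARRAY > ONLY_ARRAY > INCLUDE_ARRAY."""
    if not arg or not valid_types:
        return False
    if ANY_ARRAY <= valid_types:
        # Any array type passes, regardless of its element type.
        return arg.rstrip().endswith('[]')
    if ONLY_ARRAY <= valid_types:
        # Only array forms of the listed types pass.
        return arg.rstrip().endswith('[]') and arg.rstrip('[] ') in valid_types
    if INCLUDE_ARRAY <= valid_types:
        # Scalar and array forms both pass; strip the trailing [] first.
        arg = arg.rstrip('[] ')
    return arg in valid_types
```

Because the flags are ordinary sets, callers combine them with type sets
using `|`, for example `BOOLEAN | ONLY_ARRAY`.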


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/a47cd1ff
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/a47cd1ff
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/a47cd1ff

Branch: refs/heads/master
Commit: a47cd1ff533a271e32470074986872e7bd278cbe
Parents: 11ecdc7
Author: Arvind Sridhar 
Authored: Mon Jul 9 16:14:48 2018 -0700
Committer: Rahul Iyer 
Committed: Thu Jul 12 14:03:08 2018 -0700

--
 .../test/unit_tests/test_utilities.py_in|  3 +++
 .../postgres/modules/utilities/utilities.py_in  | 22 +---
 2 files changed, 17 insertions(+), 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/a47cd1ff/src/ports/postgres/modules/utilities/test/unit_tests/test_utilities.py_in
--
diff --git a/src/ports/postgres/modules/utilities/test/unit_tests/test_utilities.py_in b/src/ports/postgres/modules/utilities/test/unit_tests/test_utilities.py_in
index 407a3c0..2d2c481 100644
--- a/src/ports/postgres/modules/utilities/test/unit_tests/test_utilities.py_in
+++ b/src/ports/postgres/modules/utilities/test/unit_tests/test_utilities.py_in
@@ -243,6 +243,9 @@ class UtilitiesTestCase(unittest.TestCase):
 self.assertFalse(s.is_valid_psql_type('boolean[]', s.INCLUDE_ARRAY | s.ONLY_ARRAY))
 self.assertFalse(s.is_valid_psql_type('boolean', s.ONLY_ARRAY))
 self.assertFalse(s.is_valid_psql_type('boolean[]', s.ONLY_ARRAY))
+self.assertTrue(s.is_valid_psql_type('boolean[]', s.ANY_ARRAY))
+self.assertTrue(s.is_valid_psql_type('boolean[]', s.INTEGER | s.ANY_ARRAY))
+self.assertFalse(s.is_valid_psql_type('boolean', s.ANY_ARRAY))
 
 if __name__ == '__main__':
 unittest.main()

http://git-wip-us.apache.org/repos/asf/madlib/blob/a47cd1ff/src/ports/postgres/modules/utilities/utilities.py_in
--
diff --git a/src/ports/postgres/modules/utilities/utilities.py_in b/src/ports/postgres/modules/utilities/utilities.py_in
index 55b6983..d571b40 100644
--- a/src/ports/postgres/modules/utilities/utilities.py_in
+++ b/src/ports/postgres/modules/utilities/utilities.py_in
@@ -175,34 +175,40 @@ TEXT = set(['text', 'varchar', 'character varying', 'char', 'character'])
 BOOLEAN = set(['boolean'])
 INCLUDE_ARRAY = set([unique_string('__include_array__')])
 ONLY_ARRAY = set([unique_string('__only_array__')])
+ANY_ARRAY = set([unique_string('__any_array__')])
+
 
 def is_valid_psql_type(arg, valid_types):
 """ Verify if argument is a valid type
 
 Args:
 @param arg: str. Name of the Postgres type to validate
-@param valid_types: set. Set of type names to look into.
-This is typically created using the global types
-created in this module.
-Two non-type flags are provided:
+@param valid_types: set. Set of valid type names to search.
+This is typically created using the global names
+in this module.
+Three non-type flags are provided
+(in descending order of precedence):
+- ANY_ARRAY: check if arg is any array type
 - ONLY_ARRAY: indicates that only array forms
 of the valid types should be checked
 - INCLUDE_ARRAY: indicates that array and scalar
 forms of the valid types should be checked
-If both ONLY_ARRAY and INCLUDE_ARRAY are present,
-then ONLY_ARRAY takes precedence
-
 Examples: 1. valid_types = BOOLEAN | INTEGER | TEXT
   2. valid_types = BOOLEAN | INTEGER | ONLY_ARRAY
   3. valid_types = NUMERIC | INCLUDE_ARRAY
 """
+if not arg or not valid_types:
+return False
+if ANY_ARRAY <= valid_types:
+return arg.rstrip().endswith('[]')
 if ONLY_ARRAY <= valid_types:
-return ('[]' in arg and arg.rstrip('[]') in valid_types)
+return (arg.rstrip().endswith('[]') and arg.rstrip('[] ') in valid_types)
 if INCLUDE_ARRAY <= valid_types:
 # Remove the [] from end of the arg type
 # The single space is needed to ensure trailing white space is stripped
 arg = arg.rstrip('[] ')
 return (arg in valid_types)
+# 
--
 
 
 def is_psql_numeric_type(arg, exclude=None):



[1/5] madlib git commit: Build: Remove symlinks during rpm uninstall

2018-07-12 Thread riyer
Repository: madlib
Updated Branches:
  refs/heads/master 5f80ba978 -> a47cd1ff5


Build: Remove symlinks during rpm uninstall

JIRA: MADLIB-1175

`rpm --install` creates three symlinks to `Versions/`, `.../bin`, and
`.../doc`. These symlinks should be deleted during `rpm --erase`.
We also delete `Versions/` if it is empty after the erase.

Closes #286

Co-Authored-by: Arvind Sridhar 


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/ac4a51f0
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/ac4a51f0
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/ac4a51f0

Branch: refs/heads/master
Commit: ac4a51f0a8aacd72f884f6cebb27f43b19948ccd
Parents: 5f80ba9
Author: Rahul Iyer 
Authored: Tue Jul 3 12:02:55 2018 -0700
Committer: Rahul Iyer 
Committed: Thu Jul 12 13:28:54 2018 -0700

--
 deploy/CMakeLists.txt|  1 +
 deploy/rpm_post_uninstall.sh | 26 ++
 2 files changed, 27 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/ac4a51f0/deploy/CMakeLists.txt
--
diff --git a/deploy/CMakeLists.txt b/deploy/CMakeLists.txt
index 32023bd..f8000df 100644
--- a/deploy/CMakeLists.txt
+++ b/deploy/CMakeLists.txt
@@ -54,6 +54,7 @@ add_subdirectory(gppkg)
 # -- Finally do the packaging! 
-
 
 set(CPACK_RPM_POST_INSTALL_SCRIPT_FILE "${CMAKE_CURRENT_SOURCE_DIR}/rpm_post.sh")
+set(CPACK_RPM_POST_UNINSTALL_SCRIPT_FILE "${CMAKE_CURRENT_SOURCE_DIR}/rpm_post_uninstall.sh")
 set(CPACK_PREFLIGHT_SCRIPT ${CMAKE_CURRENT_SOURCE_DIR}/preflight.sh)
 set(CPACK_POSTFLIGHT_SCRIPT ${CMAKE_CURRENT_SOURCE_DIR}/postflight.sh)
 set(CPACK_MONOLITHIC_INSTALL 1)

http://git-wip-us.apache.org/repos/asf/madlib/blob/ac4a51f0/deploy/rpm_post_uninstall.sh
--
diff --git a/deploy/rpm_post_uninstall.sh b/deploy/rpm_post_uninstall.sh
new file mode 100755
index 000..c67fd34
--- /dev/null
+++ b/deploy/rpm_post_uninstall.sh
@@ -0,0 +1,26 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# remove symlinks created during rpm install
+find $RPM_INSTALL_PREFIX/madlib/Current -depth -type l -exec rm {} \; 2>/dev/null
+find $RPM_INSTALL_PREFIX/madlib/bin -depth -type l -exec rm {} \; 2>/dev/null
+find $RPM_INSTALL_PREFIX/madlib/doc -depth -type l -exec rm {} \; 2>/dev/null
+
+# remove "Versions" directory if it's empty
+rmdir $RPM_INSTALL_PREFIX/madlib/Versions 2>/dev/null



madlib git commit: Utilities: Add CTAS while dropping some columns

2018-07-12 Thread riyer
Repository: madlib
Updated Branches:
  refs/heads/master 59ad96a04 -> 5f80ba978


Utilities: Add CTAS while dropping some columns

JIRA: MADLIB-1241

This commit adds a function to create a new table from an existing table
while dropping some of the columns of the original table.
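
The idea can be sketched as a small SQL builder: select every column except
the dropped ones into a new table. This is an assumption-laden
illustration, not the function added by the commit; the name
`build_ctas_without` and its arguments are hypothetical, the column list is
passed in rather than read from the catalog, and identifiers are not
quoted.

```python
def build_ctas_without(source_table, out_table, all_cols, cols_to_drop):
    """Return a CREATE TABLE AS statement that omits cols_to_drop."""
    drop = set(cols_to_drop)
    unknown = drop - set(all_cols)
    if unknown:
        raise ValueError("columns not in source table: {0}".format(sorted(unknown)))
    # Preserve the original column order for the remaining columns.
    keep = [c for c in all_cols if c not in drop]
    if not keep:
        raise ValueError("cannot drop every column of the source table")
    return "CREATE TABLE {out} AS SELECT {cols} FROM {src}".format(
        out=out_table, cols=", ".join(keep), src=source_table)


# Hypothetical table with three columns, one of which is dropped.
statement = build_ctas_without("t_src", "t_out", ["id", "x", "y"], ["x"])
```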

Closes #282


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/5f80ba97
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/5f80ba97
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/5f80ba97

Branch: refs/heads/master
Commit: 5f80ba9781efb3526e422a0867ae1d34b49c7ac8
Parents: 59ad96a
Author: Rahul Iyer 
Authored: Thu Jul 12 10:08:52 2018 -0700
Committer: Rahul Iyer 
Committed: Thu Jul 12 10:08:52 2018 -0700

--
 doc/mainpage.dox.in | 16 ++--
 .../utilities/test/drop_madlib_temp.ic.sql_in   | 23 --
 .../utilities/test/drop_madlib_temp.sql_in  | 16 
 .../modules/utilities/test/utilities.ic.sql_in  | 58 +
 .../modules/utilities/test/utilities.sql_in | 87 
 .../postgres/modules/utilities/utilities.py_in  | 37 -
 .../postgres/modules/utilities/utilities.sql_in | 43 +-
 7 files changed, 227 insertions(+), 53 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/5f80ba97/doc/mainpage.dox.in
--
diff --git a/doc/mainpage.dox.in b/doc/mainpage.dox.in
index 8681eb4..e41e6c9 100644
--- a/doc/mainpage.dox.in
+++ b/doc/mainpage.dox.in
@@ -261,12 +261,9 @@ Contains graph algorithms.
 @ingroup grp_topic_modelling
 
 
-@defgroup grp_utility_functions Utility Functions
-@defgroup @grp_utilities Developer Database Functions
-@ingroup grp_utility_functions
-
+@defgroup grp_other_functions Other Functions
 @defgroup grp_linear_solver Linear Solvers
-@ingroup grp_utility_functions
+@ingroup grp_other_functions
 @{A collection of methods that implement solutions for systems of consistent linear equations. @}
 
 @defgroup grp_dense_linear_solver Dense Linear Systems
@@ -276,13 +273,16 @@ Contains graph algorithms.
 @ingroup grp_linear_solver
 
 @defgroup grp_minibatch_preprocessing Mini-Batch Preprocessor
-@ingroup grp_utility_functions
+@ingroup grp_other_functions
 
 @defgroup grp_pmml PMML Export
-@ingroup grp_utility_functions
+@ingroup grp_other_functions
 
 @defgroup grp_text_utilities Term Frequency
-@ingroup grp_utility_functions
+@ingroup grp_other_functions
+
+@defgroup @grp_utilities Utilities
+@ingroup grp_other_functions
 
 @defgroup grp_early_stage Early Stage Development
 @brief A collection of implementations which are in early stage of development.

http://git-wip-us.apache.org/repos/asf/madlib/blob/5f80ba97/src/ports/postgres/modules/utilities/test/drop_madlib_temp.ic.sql_in
--
diff --git a/src/ports/postgres/modules/utilities/test/drop_madlib_temp.ic.sql_in b/src/ports/postgres/modules/utilities/test/drop_madlib_temp.ic.sql_in
deleted file mode 100644
index 7879385..000
--- a/src/ports/postgres/modules/utilities/test/drop_madlib_temp.ic.sql_in
+++ /dev/null
@@ -1,23 +0,0 @@
-/* --- 
*//**
- *
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *   http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied.  See the License for the
- * specific language governing permissions and limitations
- * under the License.
- *
- *//* --- 
*/
-
--- cleanup
-SELECT cleanup_madlib_temp_tables(quote_ident(current_schema()));

http://git-wip-us.apache.org/repos/asf/madlib/blob/5f80ba97/src/ports/postgres/modules/utilities/test/drop_madlib_temp.sql_in
--
diff --git a/src/ports/postgres/modules/utilities/test/drop_madlib_temp.sql_in b/src/ports/postgres/modules/utilities/test/drop_madlib_temp.sql_in
deleted file mode 100644
index f902361..000
--- a/src/ports/postgres/modules/utilities/test/drop_madlib_temp.sql_in
+++ 

[1/2] madlib git commit: Upgrade: Fix multiple bugs

2018-06-15 Thread riyer
Repository: madlib
Updated Branches:
  refs/heads/master 8e34f68d7 -> b88f60464


Upgrade: Fix multiple bugs

1. Appended schema_madlib to the mlp_igd_final return type. The missing
schema name caused the upgrade to fail from 1.12 to 1.x if there was a
dependency on mlp_igd_final.

2. A new changelist was created for changes from v1.14 to 1.15-dev.  We
will rename this at the 1.15 release from 1.14_1.15-dev.yaml to
1.14_1.15.yaml.

3. Commit 8e34f68 added a new function called `_write_to_file` that
takes 2 arguments.  Some of the calls to this function were not passing
the first file handle argument.

Closes #278

Co-authored-by: Orhan Kislal


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/89bcdb78
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/89bcdb78
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/89bcdb78

Branch: refs/heads/master
Commit: 89bcdb785816716f2ef6c5e4599edbf95584595d
Parents: 8e34f68
Author: Nikhil Kak 
Authored: Fri Jun 15 08:18:34 2018 -0700
Committer: Rahul Iyer 
Committed: Fri Jun 15 08:22:34 2018 -0700

--
 src/madpack/changelist_1.12_1.13.yaml |  2 +-
 src/madpack/changelist_1.14_1.15-dev.yaml | 58 ++
 src/madpack/upgrade_util.py   | 14 +++
 3 files changed, 66 insertions(+), 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/89bcdb78/src/madpack/changelist_1.12_1.13.yaml
--
diff --git a/src/madpack/changelist_1.12_1.13.yaml b/src/madpack/changelist_1.12_1.13.yaml
index 0e6c3df..49169c3 100644
--- a/src/madpack/changelist_1.12_1.13.yaml
+++ b/src/madpack/changelist_1.12_1.13.yaml
@@ -49,7 +49,7 @@ udf:
 rettype: void
 argument: character varying, character varying, character varying, character varying
 - mlp_igd_final:
-rettype: mlp_step_result
+rettype: schema_madlib.mlp_step_result
 argument: double precision[]
 - mlp_igd_transition:
 rettype: double precision[]

http://git-wip-us.apache.org/repos/asf/madlib/blob/89bcdb78/src/madpack/changelist_1.14_1.15-dev.yaml
--
diff --git a/src/madpack/changelist_1.14_1.15-dev.yaml b/src/madpack/changelist_1.14_1.15-dev.yaml
new file mode 100644
index 000..88bb886
--- /dev/null
+++ b/src/madpack/changelist_1.14_1.15-dev.yaml
@@ -0,0 +1,58 @@
+# 
--
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+# 
--
+
+# Changelist for MADlib version 1.14 to 1.15
+
+# This file contains all changes that were introduced in a new version of
+# MADlib. This changelist is used by the upgrade script to detect what objects
+# should be upgraded (while retaining all other objects from the previous version)
+
+# New modules (actually .sql_in files) added in upgrade version
+# For these files the sql_in code is retained as is with the functions in the
+# file installed on the upgrade version. All other files (that don't have
+# updates), are cleaned up to remove object replacements
+new module:
+# - Changes from 1.14 to 1.15 
+
+
+# Changes in the types (UDT) including removal and modification
+udt:
+
+# List of the UDF changes that affect the user externally. This includes change
+# in function name, return type, argument order or types, or removal of
+# the function. In each case, the original function is as good as removed and a
+# new function is created. In such cases, we should abort the upgrade if there
+# are user views dependent on this function, since the original function will
+# not be present in the upgraded version.
+udf:
+# - Changes from 1.14 to 1.15 --
+
+
+# Changes to aggregates (UDA) including removal and modification
+# Overloaded functions should be mentioned separately
+uda:
+
+# Casts (UDC) updated/removed
+udc:

[1/3] madlib git commit: DT: Don't use NULL value to get dep_var type

2018-05-31 Thread riyer
Repository: madlib
Updated Branches:
  refs/heads/master ccc3a1832 -> abef95ec9


DT: Don't use NULL value to get dep_var type

JIRA: MADLIB-1233

Function `_is_dep_categorical` is used to obtain the type of the
dependent variable expression. This function fetches an arbitrary value
using `LIMIT 1` and checks the type of that value in Python.
Because NULL values are not filtered out, the `LIMIT 1` can return
`None` in Python, leading to incorrect results.

This commit updates the type extraction to check the type in the
database instead of in Python, and it also filters out NULL values.
Additionally, it checks that at least one non-NULL value is obtained
and otherwise raises an appropriate error.
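
A minimal sketch of the failure mode described above, written as plain
Python without plpy. The first function mirrors the removed
isinstance-based check from the diff below; the second is only an
illustration of deciding from the SQL type name. The set
NON_INTEGER_NUMERIC_TYPES and both function names are assumptions made
for this example; the commit itself relies on get_expr_type,
is_psql_numeric_type, and is_psql_boolean_type.

```python
# Hypothetical list of non-integer numeric SQL type names (illustration only).
NON_INTEGER_NUMERIC_TYPES = {"real", "double precision", "numeric"}


def is_dep_categorical_by_sample(sample_value):
    """Old approach: infer (is_categorical, is_boolean) from one sampled value."""
    return (not isinstance(sample_value, float), isinstance(sample_value, bool))


def is_dep_categorical_by_type(sql_type_name):
    """Type-based approach: decide from the declared SQL type, not from data."""
    type_name = sql_type_name.strip().lower()
    # Integer types are deliberately absent from the numeric set, so they are
    # treated as categorical, matching the 'exclude' list in the diff.
    return (type_name not in NON_INTEGER_NUMERIC_TYPES, type_name == "boolean")


# A double precision column whose sampled row happens to be NULL (None in Python)
# is misreported as categorical by the sample-based check.
null_sample_result = is_dep_categorical_by_sample(None)
typed_result = is_dep_categorical_by_type("double precision")
print(null_sample_result, typed_result)
```

The sample-based check depends on which row `LIMIT 1` happens to
return, while the type-based check gives the same answer for every row
of the column.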


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/26f61e91
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/26f61e91
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/26f61e91

Branch: refs/heads/master
Commit: 26f61e9110f12804c76ca707f52f1774d8844a7c
Parents: ccc3a18
Author: Rahul Iyer 
Authored: Tue May 1 14:24:34 2018 -0700
Committer: Rahul Iyer 
Committed: Thu May 31 17:03:30 2018 -0700

--
 .../recursive_partitioning/decision_tree.py_in  |  18 +-
 .../recursive_partitioning/decision_tree.sql_in | 206 +--
 .../modules/utilities/validate_args.py_in   |  27 ++-
 3 files changed, 135 insertions(+), 116 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/26f61e91/src/ports/postgres/modules/recursive_partitioning/decision_tree.py_in
--
diff --git a/src/ports/postgres/modules/recursive_partitioning/decision_tree.py_in b/src/ports/postgres/modules/recursive_partitioning/decision_tree.py_in
index 6f64234..48b8fab 100644
--- a/src/ports/postgres/modules/recursive_partitioning/decision_tree.py_in
+++ b/src/ports/postgres/modules/recursive_partitioning/decision_tree.py_in
@@ -31,7 +31,7 @@ from utilities.utilities import _assert
 from utilities.utilities import extract_keyvalue_params
 from utilities.utilities import unique_string
 from utilities.utilities import add_postfix
-from utilities.utilities import is_psql_numeric_type
+from utilities.utilities import is_psql_numeric_type, is_psql_boolean_type
 from utilities.utilities import split_quoted_delimited_str
 from utilities.utilities import py_list_to_sql_string
 # 
@@ -56,6 +56,11 @@ def _tree_validate_args(
 "Decision tree error: Invalid data table.")
 _assert(table_exists(training_table_name),
 "Decision tree error: Data table is missing.")
+_assert(not table_is_empty(training_table_name,
+   _get_filter_str(dependent_variable, grouping_cols)),
+"Decision tree error: Data table ({0}) is empty "
+"(after filtering invalid tuples)".
+format(training_table_name))
 
 _assert(not table_exists(output_table_name, only_first_schema=True),
 "Decision tree error: Output table already exists.")
@@ -567,10 +572,13 @@ def _is_dep_categorical(training_table_name, dependent_variable):
 @brief Sample the dependent variable to check whether it is
 a categorical variable.
 """
-sample_dep = plpy.execute("SELECT " + dependent_variable +
-  " AS dep FROM " +
-  training_table_name + " LIMIT 1")[0]['dep']
-return (not isinstance(sample_dep, float), isinstance(sample_dep, bool))
+sample_dep = get_expr_type(dependent_variable, training_table_name)
+is_dep_numeric = is_psql_numeric_type(sample_dep,
+  exclude=['smallint',
+   'integer',
+   'bigint'])
+is_dep_bool = is_psql_boolean_type(sample_dep)
+return (not is_dep_numeric, is_dep_bool)
 # 
 
 

http://git-wip-us.apache.org/repos/asf/madlib/blob/26f61e91/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in
--
diff --git a/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in b/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in
index a3c4963..8e69d9b 100644
--- a/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in
+++ b/src/ports/postgres/modules/recursive_partitioning/decision_tree.sql_in
@@ -25,7 +25,7 @@ m4_include(`SQLCommon.m4')
 
 
 @brief
-Decision trees are tree-based supervised learning methods 
+Decision trees are tree-based supervised learning methods

[2/3] madlib git commit: DT: Ensure summary table has correct features

2018-05-31 Thread riyer
DT: Ensure summary table has correct features

JIRA: MADLIB-1236

If a cat_feature is dropped (because it has only a single level), that
feature should not be included in the summary table's feature list,
since tree_predict uses the features in the summary table while reading
the source table. This commit ensures that the right features are
populated in the summary table.
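
The idea can be sketched as follows, assuming a simplified in-memory
view of the categorical levels. The helper keep_multi_level_features
and the sample data are hypothetical; only the bins['cat_features'] and
bins['con_features'] keys come from the diff below.

```python
def keep_multi_level_features(cat_levels):
    """Return the categorical features that have more than one distinct level."""
    return [name for name, levels in cat_levels.items() if len(set(levels)) > 1]


# Hypothetical training data: 'country' has a single level and is dropped.
cat_levels = {
    "outlook": ["sunny", "rain", "overcast"],
    "country": ["US", "US", "US"],
}

# The binning step records the features that were actually used.
bins = {
    "cat_features": keep_multi_level_features(cat_levels),
    "con_features": ["temperature"],
}

# The summary table is built from the binned features, not from the
# user-supplied list, so prediction only reads columns the tree knows.
summary_features = bins["cat_features"] + bins["con_features"]
print(summary_features)
```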

Closes #268


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/ef52d871
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/ef52d871
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/ef52d871

Branch: refs/heads/master
Commit: ef52d87198d73db272ef033f5c7c0f26b2956a0b
Parents: 26f61e9
Author: Rahul Iyer 
Authored: Thu May 3 11:38:27 2018 -0700
Committer: Rahul Iyer 
Committed: Thu May 31 17:03:37 2018 -0700

--
 .../recursive_partitioning/decision_tree.py_in  | 51 
 1 file changed, 30 insertions(+), 21 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/ef52d871/src/ports/postgres/modules/recursive_partitioning/decision_tree.py_in
--
diff --git a/src/ports/postgres/modules/recursive_partitioning/decision_tree.py_in b/src/ports/postgres/modules/recursive_partitioning/decision_tree.py_in
index 48b8fab..04fde7e 100644
--- a/src/ports/postgres/modules/recursive_partitioning/decision_tree.py_in
+++ b/src/ports/postgres/modules/recursive_partitioning/decision_tree.py_in
@@ -56,11 +56,6 @@ def _tree_validate_args(
 "Decision tree error: Invalid data table.")
 _assert(table_exists(training_table_name),
 "Decision tree error: Data table is missing.")
-_assert(not table_is_empty(training_table_name,
-   _get_filter_str(dependent_variable, 
grouping_cols)),
-"Decision tree error: Data table ({0}) is empty "
-"(after filtering invalid tuples)".
-format(training_table_name))
 
 _assert(not table_exists(output_table_name, only_first_schema=True),
 "Decision tree error: Output table already exists.")
@@ -95,6 +90,12 @@ def _tree_validate_args(
 _assert(max_depth >= 0 and max_depth < 100,
 "Decision tree error: maximum tree depth must be positive and less 
than 100.")
 
+_assert(not table_is_empty(training_table_name,
+   _get_filter_str(dependent_variable, 
grouping_cols)),
+"Decision tree error: Data table ({0}) is empty "
+"(after filtering invalid tuples)".
+format(training_table_name))
+
 _assert(cp >= 0, "Decision tree error: cp must be non-negative.")
 _assert(min_split > 0, "Decision tree error: min_split must be positive.")
 _assert(min_bucket > 0, "Decision tree error: min_bucket must be 
positive.")
@@ -510,8 +511,7 @@ def tree_train(schema_madlib, training_table_name, 
output_table_name,
 
 
 def _create_output_tables(schema_madlib, training_table_name, 
output_table_name,
-  tree_states, bins,
-  split_criterion, cat_features, con_features,
+  tree_states, bins, split_criterion,
   id_col_name, dependent_variable, list_of_features,
   is_classification, n_all_rows, n_rows, dep_list, cp,
   all_cols_types, grouping_cols=None,
@@ -519,19 +519,19 @@ def _create_output_tables(schema_madlib, 
training_table_name, output_table_name,
   n_folds=0, null_proxy=None, **kwargs):
 if not grouping_cols:
 _create_result_table(schema_madlib, tree_states[0],
- bins['cat_origin'], bins['cat_n'], cat_features,
- con_features, output_table_name,
+ bins['cat_origin'], bins['cat_n'], 
bins['cat_features'],
+ bins['con_features'], output_table_name,
  use_existing_tables, running_cv, n_folds)
 else:
 _create_grp_result_table(
-schema_madlib, tree_states, bins, cat_features,
-con_features, output_table_name, grouping_cols, 
training_table_name,
-use_existing_tables, running_cv, n_folds)
+schema_madlib, tree_states, bins, bins['cat_features'],
+bins['con_features'], output_table_name, grouping_cols,
+training_table_name, use_existing_tables, running_cv, n_folds)
 
 failed_groups = sum(row['finished'] != 1 for row in tree_states)
 _create_summary_table(
 schema_madlib, split_criterion, training_table_name,
-output_table_name, id_col_name, cat_features, con_features,
+output_table_name, id_col_name, bins['cat_features'], 
bins['con_features'],
 

[3/3] madlib git commit: Logregr: Report error if output table is empty

2018-05-31 Thread riyer
Logregr: Report error if output table is empty

JIRA: MADLIB-1172

When the model cannot be generated due to ill-conditioned input data,
the output table doesn't get populated.  In this case, we report back an
error instead of creating the empty table.
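
A rough sketch of that behavior, assuming the model rows are available
as a Python list before the output table is created. The exception
class, function name, and error text are hypothetical and are not taken
from logistic.py_in.

```python
class ModelNotGeneratedError(Exception):
    """Raised when training produced no model rows (hypothetical name)."""


def finalize_output(model_rows):
    """Return the rows to persist, or fail instead of writing an empty table."""
    rows = list(model_rows)
    if not rows:
        # Hypothetical message; the real module reports its own error text.
        raise ModelNotGeneratedError(
            "Logregr error: no model could be generated from the input data.")
    return rows


try:
    finalize_output([])
    empty_input_failed = False
except ModelNotGeneratedError:
    empty_input_failed = True
print(empty_input_failed)
```

Failing early means a caller never sees an output table that exists but
contains no model.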

Closes #270


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/abef95ec
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/abef95ec
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/abef95ec

Branch: refs/heads/master
Commit: abef95ec99d2797fa7a51c8d4548d88a656d7364
Parents: ef52d87
Author: Himanshu Pandey 
Authored: Thu May 31 18:44:41 2018 -0700
Committer: Rahul Iyer 
Committed: Thu May 31 18:47:32 2018 -0700

--
 .../postgres/modules/regress/logistic.py_in | 157 +--
 .../modules/regress/test/logistic.sql_in|  69 ++--
 2 files changed, 92 insertions(+), 134 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/madlib/blob/abef95ec/src/ports/postgres/modules/regress/logistic.py_in
--
diff --git a/src/ports/postgres/modules/regress/logistic.py_in 
b/src/ports/postgres/modules/regress/logistic.py_in
index 76cbb6a..77ea465 100644
--- a/src/ports/postgres/modules/regress/logistic.py_in
+++ b/src/ports/postgres/modules/regress/logistic.py_in
@@ -153,7 +153,8 @@ def __logregr_validate_args(schema_madlib, tbl_source, 
tbl_output, dep_col,
 plpy.error("Logregr error: Invalid output table name!")
 
 if (table_exists(tbl_output, only_first_schema=True)):
-plpy.error("Output table name already exists. Drop the table before calling the function.")
+plpy.error("Output table name already exists. Drop the table before "
+   "calling the function.")
 
 if not dep_col or dep_col.strip().lower() in ('null', ''):
 plpy.error("Logregr error: Invalid dependent column name!")
@@ -164,7 +165,6 @@ def __logregr_validate_args(schema_madlib, tbl_source, 
tbl_output, dep_col,
 if not ind_col or ind_col.lower() in ('null', ''):
 plpy.error("Logregr error: Invalid independent column name!")
 
-
 if grouping_col is not None:
 if grouping_col == '':
 plpy.error("Logregr error: Invalid grouping columns name!")
@@ -173,14 +173,14 @@ def __logregr_validate_args(schema_madlib, tbl_source, 
tbl_output, dep_col,
 plpy.error("Logregr error: Grouping column does not exist!")
 
 intersect = frozenset(_string_to_array(grouping_col)).intersection(
-frozenset(('coef', 'log_likelihood', 'std_err', 'z_stats',
-   'p_values', 'odds_ratios', 'condition_no',
-   'num_processed', 'num_missing_rows_skipped',
-   'variance_covariance')))
+frozenset(('coef', 'log_likelihood', 'std_err', 'z_stats',
+   'p_values', 'odds_ratios', 'condition_no',
+   'num_processed', 'num_missing_rows_skipped',
+   'variance_covariance')))
 if len(intersect) > 0:
 plpy.error("Logregr error: Conflicted grouping column name.\n"
"Predefined name(s) {0} are not allow!".format(
-', '.join(intersect)))
+   ', '.join(intersect)))
 
 if max_iter <= 0:
 plpy.error("Logregr error: Maximum number of iterations must be 
positive!")
@@ -231,12 +231,12 @@ def __logregr_train_compute(schema_madlib, tbl_source, 
tbl_output, dep_col,
 'cg': "__logregr_cg_result",
 'igd': "__logregr_igd_result"}
 
-plpy.execute("select 
{schema_madlib}.create_schema_pg_temp()".format(**args))
-plpy.execute(
-"""
-drop table if exists pg_temp.{tbl_logregr_args};
-create table pg_temp.{tbl_logregr_args} as
-select
+plpy.execute("SELECT {schema_madlib}.create_schema_pg_temp()".
+ format(**args))
+plpy.execute("""
+DROP TABLE IF EXISTS pg_temp.{tbl_logregr_args};
+CREATE TABLE pg_temp.{tbl_logregr_args} as
+SELECT
 {max_iter} as max_iter,
 {tolerance} as tolerance
 """.format(**args))
@@ -257,7 +257,8 @@ def __logregr_train_compute(schema_madlib, tbl_source, 
tbl_output, dep_col,
   dep_col, ind_col, optimizer,
   grouping_col=grouping_col,
   grouping_str=grouping_str,
-  
col_grp_iteration=args["col_grp_iteration"],
+  col_grp_iteration=args[
+  "col_grp_iteration"],
 

[3/3] madlib git commit: Multiple: Remove support for HAWQ from all modules

2018-05-11 Thread riyer
Multiple: Remove support for HAWQ from all modules

With HAWQ support removed for the past few versions, we can eliminate
all the code that was specifically written for that port. This
includes madpack changes for upgrade and reinstall, workarounds in
multiple modules for table updates, and special consideration in
Iteration Controllers.

Closes #267


Project: http://git-wip-us.apache.org/repos/asf/madlib/repo
Commit: http://git-wip-us.apache.org/repos/asf/madlib/commit/34ca6188
Tree: http://git-wip-us.apache.org/repos/asf/madlib/tree/34ca6188
Diff: http://git-wip-us.apache.org/repos/asf/madlib/diff/34ca6188

Branch: refs/heads/master
Commit: 34ca6188ecb51577d5994699636a231d2615c548
Parents: 0b8507d
Author: Rahul Iyer 
Authored: Sun Apr 29 21:18:35 2018 -0700
Committer: Rahul Iyer 
Committed: Fri May 11 10:30:36 2018 -0700

--
 HAWQ_Install.txt|  70 
 RELEASE_NOTES   |  21 +-
 ReadMe_Build.txt|  22 +-
 deploy/hawq_install.sh  | 199 
 methods/array_ops/src/pg_gp/array_ops.sql_in|   2 -
 .../svec_util/src/pg_gp/sql/svec_test.sql_in|   5 -
 pom.xml |  13 -
 src/config/Ports.yml|   5 +-
 src/madpack/madpack.py  | 153 +++--
 src/madpack/upgrade_util.py |  21 +-
 src/madpack/utilities.py|   2 -
 .../elastic_net/elastic_net_optimizer_fista.hpp |   8 +-
 src/modules/linalg/metric.cpp   |  94 --
 src/modules/linalg/metric.hpp   |   7 -
 src/ports/CMakeLists.txt|   1 -
 src/ports/hawq/1.2/CMakeLists.txt   |   5 -
 src/ports/hawq/1.2/config/CMakeLists.txt|  19 --
 src/ports/hawq/1.2/config/Modules.yml   |  47 ---
 src/ports/hawq/1.3/CMakeLists.txt   |   5 -
 src/ports/hawq/1.3/config/CMakeLists.txt|  19 --
 src/ports/hawq/1.3/config/Modules.yml   |  46 ---
 src/ports/hawq/2/CMakeLists.txt |  19 --
 src/ports/hawq/CMakeLists.txt   | 309 --
 src/ports/hawq/cmake/FindHAWQ.cmake |  26 --
 src/ports/hawq/cmake/FindHAWQ_1_2.cmake |   2 -
 src/ports/hawq/cmake/FindHAWQ_1_3.cmake |   2 -
 src/ports/hawq/cmake/FindHAWQ_2.cmake   |  21 --
 src/ports/hawq/cmake/HAWQUtils.cmake|  16 -
 src/ports/postgres/cmake/PostgreSQLUtils.cmake  |   6 -
 .../modules/assoc_rules/assoc_rules.sql_in  |  18 +-
 src/ports/postgres/modules/bayes/bayes.py_in|  15 +-
 src/ports/postgres/modules/convex/lmf.sql_in|   5 -
 src/ports/postgres/modules/convex/lmf_igd.py_in |   4 +-
 src/ports/postgres/modules/crf/crf.py_in|   3 +-
 .../postgres/modules/crf/crf_feature_gen.py_in  |   2 -
 .../modules/elastic_net/elastic_net.sql_in  |  57 ++--
 src/ports/postgres/modules/graph/apsp.py_in | 234 --
 src/ports/postgres/modules/graph/pagerank.py_in |   2 +-
 src/ports/postgres/modules/graph/sssp.py_in |  91 +-
 src/ports/postgres/modules/graph/wcc.py_in  |  55 +---
 src/ports/postgres/modules/kmeans/kmeans.py_in  |  74 ++---
 src/ports/postgres/modules/kmeans/kmeans.sql_in |   8 +-
 .../postgres/modules/kmeans/test/kmeans.sql_in  |   4 +-
 src/ports/postgres/modules/lda/lda.py_in|   2 +-
 src/ports/postgres/modules/linalg/linalg.sql_in |  27 --
 src/ports/postgres/modules/linalg/svd.py_in |  19 +-
 src/ports/postgres/modules/pca/pca.py_in|  20 +-
 .../modules/regress/clustered_variance.py_in|  76 +
 .../postgres/modules/regress/marginal.py_in |  39 ---
 .../modules/regress/multilogistic.py_in |  84 +
 .../postgres/modules/regress/test/linear.sql_in |   6 -
 .../modules/regress/test/logistic.sql_in|   6 +-
 .../modules/regress/test/marginal.sql_in|   2 -
 .../modules/regress/test/multilogistic.sql_in   |   6 +-
 .../postgres/modules/regress/test/robust.sql_in |   4 -
 .../modules/stats/cox_prop_hazards.py_in|  70 +---
 .../modules/stats/cox_prop_hazards.sql_in   |  10 -
 .../modules/stats/test/cox_prop_hazards.sql_in  |   4 +-
 src/ports/postgres/modules/tsa/arima.py_in  |   6 +-
 .../postgres/modules/utilities/control.py_in| 234 --
 .../modules/utilities/control_composite.py_in   | 127 ++--
 .../modules/utilities/group_control.py_in   | 321 +--
 .../postgres/modules/utilities/utilities.py_in  |  30 +-
 .../modules/utilities/validate_args.py_in   |   2 +-
 .../validation/test/cross_validation.sql_in |   3 -
 65 files changed, 460 insertions(+), 2375 deletions(-)
--



[2/3] madlib git commit: Multiple: Remove support for HAWQ from all modules

2018-05-11 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib/blob/34ca6188/src/ports/postgres/modules/convex/lmf_igd.py_in
--
diff --git a/src/ports/postgres/modules/convex/lmf_igd.py_in 
b/src/ports/postgres/modules/convex/lmf_igd.py_in
index a2d42f6..300b6d6 100644
--- a/src/ports/postgres/modules/convex/lmf_igd.py_in
+++ b/src/ports/postgres/modules/convex/lmf_igd.py_in
@@ -58,9 +58,8 @@ def compute_lmf_igd(schema_madlib, rel_args, rel_state, 
rel_source,
 (_src.{col_row})::integer,
 (_src.{col_column})::integer,
 (_src.{col_value})::integer,
-m4_ifdef(`__HAWQ__', `{{__state__}}', `
 (SELECT _state FROM {rel_state}
-WHERE _iteration = {iteration})'),
+WHERE _iteration = {iteration}),
 (_args.row_dim)::integer,
 (_args.column_dim)::integer,
 (_args.max_rank)::integer,
@@ -75,4 +74,3 @@ def compute_lmf_igd(schema_madlib, rel_args, rel_state, 
rel_source,
 """):
 break
 return iterationCtrl.iteration
-

http://git-wip-us.apache.org/repos/asf/madlib/blob/34ca6188/src/ports/postgres/modules/crf/crf.py_in
--
diff --git a/src/ports/postgres/modules/crf/crf.py_in 
b/src/ports/postgres/modules/crf/crf.py_in
index dfd4754..2eaa1e8 100644
--- a/src/ports/postgres/modules/crf/crf.py_in
+++ b/src/ports/postgres/modules/crf/crf.py_in
@@ -80,8 +80,7 @@ def __runIterativeAlg(stateType, initialState, source, 
updateExpr,
 SET client_min_messages = error;
 DROP TABLE IF EXISTS _madlib_iterative_alg;
 CREATE TEMPORARY TABLE _madlib_iterative_alg (
-_madlib_iteration INTEGER
-m4_ifdef(`__HAWQ__', `', ` PRIMARY KEY'),
+_madlib_iteration INTEGER PRIMARY KEY,
 _madlib_state {stateType}
 )
 m4_ifdef(`__POSTGRESQL__', `', `DISTRIBUTED BY (_madlib_iteration)');

http://git-wip-us.apache.org/repos/asf/madlib/blob/34ca6188/src/ports/postgres/modules/crf/crf_feature_gen.py_in
--
diff --git a/src/ports/postgres/modules/crf/crf_feature_gen.py_in 
b/src/ports/postgres/modules/crf/crf_feature_gen.py_in
index 3fa6d40..f55037f 100644
--- a/src/ports/postgres/modules/crf/crf_feature_gen.py_in
+++ b/src/ports/postgres/modules/crf/crf_feature_gen.py_in
@@ -238,12 +238,10 @@ def generate_test_features(schema_madlib, 
test_segment_tbl,
 
 rtbl_name_idx = add_postfix(rtbl_name, "_idx")
 
-m4_ifdef(`__HAWQ__', `', `
 plpy.execute("""
 CREATE INDEX {rtbl_name_idx} ON {viterbi_rtbl} (seg_text)
 """.format(rtbl_name_idx = rtbl_name_idx,
 viterbi_rtbl = viterbi_rtbl))
-')
 
 origClientMinMessages =  plpy.execute("""SELECT setting AS setting
  FROM pg_settings WHERE name = 
\'client_min_messages\';""")

http://git-wip-us.apache.org/repos/asf/madlib/blob/34ca6188/src/ports/postgres/modules/elastic_net/elastic_net.sql_in
--
diff --git a/src/ports/postgres/modules/elastic_net/elastic_net.sql_in 
b/src/ports/postgres/modules/elastic_net/elastic_net.sql_in
index f367774..5ea2efb 100644
--- a/src/ports/postgres/modules/elastic_net/elastic_net.sql_in
+++ b/src/ports/postgres/modules/elastic_net/elastic_net.sql_in
@@ -129,7 +129,7 @@ A value of 1 means L1 regularization, and a value of 0 
means L2 regularization.<
 FLOAT8. Regularization parameter (must be positive).
 
 standardize (optional)
-BOOLEAN, default: TRUE. Whether to normalize the data or not. 
+BOOLEAN, default: TRUE. Whether to normalize the data or not.
 Setting to TRUE usually yields better results and faster convergence.
 
 grouping_col (optional)
@@ -141,14 +141,14 @@ a single model is generated for all data.
 @note Expressions are not currently supported for 'grouping_col'.
 
 optimizer (optional)
-TEXT, default: 'fista'. Name of optimizer, either 'fista' or 'igd'.  
-FISTA [2] is an algorithm with a fast global rate of convergence for 
+TEXT, default: 'fista'. Name of optimizer, either 'fista' or 'igd'.
+FISTA [2] is an algorithm with a fast global rate of convergence for
 solving linear inverse problems. Incremental gradient descent (IGD)
 is a stochastic approach to minimizing an objective function [4].
 
 optimizer_params (optional)
-TEXT, default: NULL. Optimizer parameters, delimited with commas. 
-These parameters differ depending on the value of \e optimizer parameter. 
+TEXT, default: NULL. Optimizer parameters, delimited with commas.
+These parameters differ depending on the value of \e optimizer parameter.
 See the 

[42/51] [partial] madlib-site git commit: Doc: Add v1.14 documentation

2018-05-02 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/dense__linear__systems_8sql__in.html
--
diff --git a/docs/v1.14/dense__linear__systems_8sql__in.html 
b/docs/v1.14/dense__linear__systems_8sql__in.html
new file mode 100644
index 000..c47dffe
--- /dev/null
+++ b/docs/v1.14/dense__linear__systems_8sql__in.html
@@ -0,0 +1,640 @@
+
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
+http://www.w3.org/1999/xhtml;>
+
+
+
+
+
+MADlib: dense_linear_systems.sql_in File Reference
+
+
+
+
+
+
+
+
+  $(document).ready(initResizable);
+
+
+
+
+
+  $(document).ready(function() { init_search(); });
+
+
+  MathJax.Hub.Config({
+extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
+jax: ["input/TeX","output/HTML-CSS"],
+});
+http://cdn.mathjax.org/mathjax/latest/MathJax.js";>
+
+
+
+
+
+
+  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
+  
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+  ga('create', 'UA-45382226-1', 'madlib.apache.org');
+  ga('send', 'pageview');
+
+
+
+
+
+
+ 
+ 
+  http://madlib.apache.org;>
+  
+   
+   1.14
+   
+   User Documentation for Apache MADlib
+  
+   
+
+  
+  
+  
+
+  
+
+
+ 
+ 
+
+
+
+
+
+var searchBox = new SearchBox("searchBox", "search",false,'Search');
+
+
+
+  
+
+  
+
+  
+  
+  
+
+
+$(document).ready(function(){initNavTree('dense__linear__systems_8sql__in.html','');});
+
+
+
+
+
+
+
+
+
+
+
+
+
+  
+Functions  
+  
+dense_linear_systems.sql_in File Reference  
+
+
+
+SQL functions for linear systems.  
+More...
+
+
+Functions
+bytea8 dense_residual_norm_transition (bytea8 state, float8[] a, float8 b, float8[] x)
+
+bytea8 dense_residual_norm_merge_states (bytea8 state1, bytea8 state2)
+
+residual_norm_result dense_residual_norm_final (bytea8 state)
+
+aggregate residual_norm_result dense_residual_norm (float8[] left_hand_side, float8 right_hand_side, float8[] solution)
+Compute the residual after solving the dense linear systems.  More...
+
+float8 [] dense_direct_linear_system_transition (float8[] state, integer row_id, float8[] a, float8 b, integer num_rows, integer algorithm)
+
+float8 [] dense_direct_linear_system_merge_states (float8[] state1, float8[] state2)
+
+dense_linear_solver_result dense_direct_linear_system_final (float8[] state)
+
+aggregate dense_linear_solver_result dense_direct_linear_system (integer row_id, float8[] left_hand_side, float8 right_hand_side, integer numEquations, integer algorithm)
+Solve a system of linear equations using the direct method.  More...
+
+varchar linear_solver_dense (varchar input_string)
+Help function, to print out the supported families.  More...
+
+varchar linear_solver_dense ()
+
+void linear_solver_dense (varchar source_table, varchar out_table, varchar row_id, varchar left_hand_side, varchar right_hand_side, varchar grouping_cols, varchar optimizer, varchar optimizer_options)
+A wrapper function for the various marginal linear_systemsion analyzes.  More...
+
+void linear_solver_dense (varchar source_table, varchar out_table, varchar row_id, varchar left_hand_side, varchar right_hand_side)
+Marginal effects with default variables.  More...
+
+
+Detailed Description
+Date: July 2013
+See also: Computes the solution of a consistent linear system, for more details see the module description at Dense Linear Systems
+Function Documentation
+
+dense_direct_linear_system()
+
+
+
+  
+
+  aggregate dense_linear_solver_result 
dense_direct_linear_system 
+  (
+  integer
+  row_id, 
+
+
+  
+  
+  float8 []
+  left_hand_side, 
+
+
+  
+  
+  float8
+  right_hand_side, 
+
+
+  
+  
+  integer
+  numEquations, 
+
+
+  
+  
+  integer
+  algorithm
+
+
+  
+  )
+  
+
+  
+
+Parameters
+  
+row_id: Column containing the row_id
+left_hand_side: Column containing the left hand side of the system
+right_hand_side: Column containing the right hand side of the system
+numEquations: Number of equations
+algorithm: Algorithm used for the dense linear solver
+  
+  
+
+ReturnsA composite value:
+solution FLOAT8[]  - Array of marginal effects
+residual_norm FLOAT8 - Norm of the residual
+iters INTEGER - Iterations taken
+
+
+Usage
+Get all the diagnostic statistics:
+  SELECT linear_system_dense(row_id,
+  left_hand_side,
+   

[47/51] [partial] madlib-site git commit: Doc: Add v1.14 documentation

2018-05-02 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/clustered__variance__coxph_8sql__in.html
--
diff --git a/docs/v1.14/clustered__variance__coxph_8sql__in.html 
b/docs/v1.14/clustered__variance__coxph_8sql__in.html
new file mode 100644
index 000..3cf0e4f
--- /dev/null
+++ b/docs/v1.14/clustered__variance__coxph_8sql__in.html
@@ -0,0 +1,489 @@
+
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
+http://www.w3.org/1999/xhtml;>
+
+
+
+
+
+MADlib: clustered_variance_coxph.sql_in File Reference
+
+
+
+
+
+
+
+
+  $(document).ready(initResizable);
+
+
+
+
+
+  $(document).ready(function() { init_search(); });
+
+
+  MathJax.Hub.Config({
+extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
+jax: ["input/TeX","output/HTML-CSS"],
+});
+http://cdn.mathjax.org/mathjax/latest/MathJax.js";>
+
+
+
+
+
+
+  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
+  
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+  ga('create', 'UA-45382226-1', 'madlib.apache.org');
+  ga('send', 'pageview');
+
+
+
+
+
+
+ 
+ 
+  http://madlib.apache.org;>
+  
+   
+   1.14
+   
+   User Documentation for Apache MADlib
+  
+   
+
+  
+  
+  
+
+  
+
+
+ 
+ 
+
+
+
+
+
+var searchBox = new SearchBox("searchBox", "search",false,'Search');
+
+
+
+  
+
+  
+
+  
+  
+  
+
+
+$(document).ready(function(){initNavTree('clustered__variance__coxph_8sql__in.html','');});
+
+
+
+
+
+
+
+
+
+
+
+
+
+  
+Functions  
+  
+clustered_variance_coxph.sql_in File Reference  
+
+
+
+SQL functions for clustered robust cox proportional hazards regression.  
+More...
+
+
+Functions
+varchar clustered_variance_coxph ()
+
+varchar clustered_variance_coxph (varchar message)
+
+void clustered_variance_coxph (text model_table, text output_table, text clustervar)
+
+float8 [] coxph_a_b_transition (float8[], integer, boolean, float8[], float8)
+
+float8 [] coxph_a_b_merge (float8[], float8[])
+
+__coxph_a_b_result coxph_a_b_final (float8[])
+
+aggregate __coxph_a_b_result coxph_a_b (integer, boolean, float8[], float8)
+
+float8 [] coxph_compute_w (float8[] x, boolean status, float8[] coef, float8[] h, float8 s, float8 a, float8[] b)
+
+__coxph_cl_var_result coxph_compute_clustered_stats (float8[] coef, float8[] hessian, float8[] a)
+
+void robust_variance_coxph (varchar model_table, varchar output_table, varchar clustervar)
+
+
+Detailed Description
+Date: Oct 2013
+See also: For a brief introduction to clustered robust cox regression, see the module description Clustered Variance
+Function Documentation
+
+clustered_variance_coxph()
 [1/3]
+
+
+
+  
+
+  varchar clustered_variance_coxph 
+  (
+  )
+  
+
+  
+
+
+
+
+
+clustered_variance_coxph()
 [2/3]
+
+
+
+  
+
+  varchar clustered_variance_coxph 
+  (
+  varchar
+  message)
+  
+
+  
+
+
+
+
+
+clustered_variance_coxph()
 [3/3]
+
+
+
+  
+
+  void clustered_variance_coxph 
+  (
+  text
+  model_table, 
+
+
+  
+  
+  text
+  output_table, 
+
+
+  
+  
+  text
+  clustervar
+
+
+  
+  )
+  
+
+  
+
+
+
+
+
+coxph_a_b()
+
+
+
+  
+
+  aggregate __coxph_a_b_result coxph_a_b 
+  (
+  integer
+  , 
+
+
+  
+  
+  boolean
+  , 
+
+
+  
+  
+  float8
+  [], 
+
+
+  
+  
+  float8
+  
+
+
+  
+  )
+  
+
+  
+
+
+
+
+
+coxph_a_b_final()
+
+
+
+  
+
+  __coxph_a_b_result coxph_a_b_final 
+  (
+  float8
+  [])
+  
+
+  
+
+
+
+
+
+coxph_a_b_merge()
+
+
+
+  
+
+  float8 [] coxph_a_b_merge 
+  (
+  float8
+  [], 
+
+
+  
+  
+  float8
+  []
+
+
+  
+  )
+  
+
+  
+
+
+
+
+
+coxph_a_b_transition()
+
+
+
+  
+
+  float8 [] coxph_a_b_transition 
+  (
+  float8
+  [], 
+
+
+  
+  
+  integer
+  , 
+
+
+  
+  
+  boolean
+  , 
+
+
+  
+  
+  float8
+  [], 
+
+   

[31/51] [partial] madlib-site git commit: Doc: Add v1.14 documentation

2018-05-02 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/group__grp__bayes.html
--
diff --git a/docs/v1.14/group__grp__bayes.html 
b/docs/v1.14/group__grp__bayes.html
new file mode 100644
index 000..5cfa013
--- /dev/null
+++ b/docs/v1.14/group__grp__bayes.html
@@ -0,0 +1,488 @@
+
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
+http://www.w3.org/1999/xhtml;>
+
+
+
+
+
+MADlib: Naive Bayes Classification
+
+
+
+
+
+
+
+
+  $(document).ready(initResizable);
+
+
+
+
+
+  $(document).ready(function() { init_search(); });
+
+
+  MathJax.Hub.Config({
+extensions: ["tex2jax.js", "TeX/AMSmath.js", "TeX/AMSsymbols.js"],
+jax: ["input/TeX","output/HTML-CSS"],
+});
+http://cdn.mathjax.org/mathjax/latest/MathJax.js";>
+
+
+
+
+
+
+  (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+  (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
Date();a=s.createElement(o),
+  
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+  })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+  ga('create', 'UA-45382226-1', 'madlib.apache.org');
+  ga('send', 'pageview');
+
+
+
+
+
+
+ 
+ 
+  http://madlib.apache.org;>
+  
+   
+   1.14
+   
+   User Documentation for Apache MADlib
+  
+   
+
+  
+  
+  
+
+  
+
+
+ 
+ 
+
+
+
+
+
+var searchBox = new SearchBox("searchBox", "search",false,'Search');
+
+
+
+  
+
+  
+
+  
+  
+  
+
+
+$(document).ready(function(){initNavTree('group__grp__bayes.html','');});
+
+
+
+
+
+
+
+
+
+
+
+
+
+  
+Naive Bayes Classification (Early Stage Development)
+
+
+Contents 
+
+Training Function(s) 
+
+Classify Function(s) 
+
+Probabilities Function(s) 
+
+Ad Hoc Computation 
+
+Implementation Notes 
+
+Examples 
+
+Technical Background 
+
+Related Topics 
+
+Warning: This MADlib method is still in early stage development. There may be
some issues that will be addressed in a future version. The interface and
implementation are subject to change.
+Naive Bayes refers to a stochastic model where all independent variables \(
a_1, \dots, a_n \) (often referred to as attributes in this context)
independently contribute to the probability that a data point belongs to a
certain class \( c \).
+Naive Bayes classification estimates feature probabilities and class
priors using maximum likelihood or Laplacian smoothing. For numeric attributes,
Gaussian smoothing can be used to estimate the feature probabilities. These
parameters are then used to classify new data.
+Training 
Function(s)
+For data with only categorical attributes, precompute feature probabilities 
and class priors using the following function:
+
+create_nb_prepared_data_tables ( trainingSource,
+ trainingClassColumn,
+ trainingAttrColumn,
+ numAttrs,
+ featureProbsName,
+ classPriorsName
+   )
+For data containing both categorical and numeric attributes, use the 
following form to precompute the Gaussian parameters (mean and variance) for 
numeric attributes alongside the feature probabilities for categorical 
attributes and class priors.
+
+create_nb_prepared_data_tables ( trainingSource,
+ trainingClassColumn,
+ trainingAttrColumn,
+ numericAttrsColumnIndices,
+ numAttrs,
+ featureProbsName,
+ numericAttrParamsName,
+ classPriorsName
+   )
+The trainingSource is expected to be of the following form:
+{TABLE|VIEW} trainingSource (
+...
+trainingClassColumn INTEGER,
+trainingAttrColumn INTEGER[] OR NUMERIC[] OR FLOAT8[],
+...
+)
+numericAttrsColumnIndices should be of type TEXT, specified
as an array of indices (starting from 1) in the trainingAttrColumn
attributes-array that correspond to numeric attributes.
+The two output tables are:
+featureProbsName  stores feature probabilities
+classPriorsName  stores the class priors
+
+In addition to the above, if the function specifying numeric attributes is 
used, an additional table numericAttrParamsName is created which 
stores the Gaussian parameters for the numeric attributes.
+Classify Function(s)
+Perform Naive Bayes classification: 
+create_nb_classify_view ( featureProbsName,
+  classPriorsName,
+  classifySource,
+  classifyKeyColumn,
+  classifyAttrColumn,
+  numAttrs,
+  destName
+)
+For data with numeric 
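
Editor's sketch (not part of the commit): the documentation above says that training precomputes class priors and feature probabilities, optionally with Laplacian smoothing, and that these are then used to classify new data. The Python below illustrates that idea for categorical attributes only. The function names `train_naive_bayes` and `classify`, the `smoothing` parameter, and the in-memory data layout are assumptions made for illustration; they are not MADlib's API, and MADlib's actual SQL implementation may differ.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, num_attrs, smoothing=1.0):
    """rows: iterable of (class_label, [attr_1, ..., attr_n]), categorical attrs.

    Returns (class_priors, feature_prob). These play roughly the role of the
    classPriorsName and featureProbsName tables (hypothetical mapping).
    """
    class_counts = Counter()
    feature_counts = Counter()          # key: (class, attr index, value)
    values_per_attr = defaultdict(set)  # distinct values seen per attribute
    for cls, attrs in rows:
        class_counts[cls] += 1
        for i in range(num_attrs):
            feature_counts[(cls, i, attrs[i])] += 1
            values_per_attr[i].add(attrs[i])

    total = sum(class_counts.values())
    class_priors = {cls: n / total for cls, n in class_counts.items()}

    def feature_prob(cls, i, value):
        # Laplace (add-`smoothing`) estimate of P(attr_i = value | class).
        numerator = feature_counts[(cls, i, value)] + smoothing
        denominator = class_counts[cls] + smoothing * len(values_per_attr[i])
        return numerator / denominator

    return class_priors, feature_prob


def classify(class_priors, feature_prob, attrs):
    """Return the class with the highest posterior (computed in log space)."""
    def log_score(cls):
        return math.log(class_priors[cls]) + sum(
            math.log(feature_prob(cls, i, v)) for i, v in enumerate(attrs))
    return max(class_priors, key=log_score)


if __name__ == "__main__":
    training = [(1, [1, 1]), (1, [1, 2]), (2, [2, 2]), (2, [2, 1])]
    priors, fprob = train_naive_bayes(training, num_attrs=2)
    print(classify(priors, fprob, [1, 1]))
```

With `smoothing=0` the estimate reduces to plain maximum likelihood, in which case an attribute value never seen with a class gives probability zero and the log-space score is undefined; that is one reason a smoothed estimate is commonly preferred.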

[44/51] [partial] madlib-site git commit: Doc: Add v1.14 documentation

2018-05-02 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/cross__validation_8sql__in.html
--
diff --git a/docs/v1.14/cross__validation_8sql__in.html 
b/docs/v1.14/cross__validation_8sql__in.html
new file mode 100644
index 000..5d0c8b1
--- /dev/null
+++ b/docs/v1.14/cross__validation_8sql__in.html
@@ -0,0 +1,710 @@
+
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
+http://www.w3.org/1999/xhtml;>
+
+
+
+
+
+MADlib: cross_validation.sql_in File Reference
+Functions  
+  
+cross_validation.sql_in File Reference  
+
+
+
+SQL functions for cross validation.  
+More...
+
+
+Functions
+void cross_validation_general (varchar modelling_func, varchar[]
modelling_params, varchar[] modelling_params_type, varchar param_explored,
varchar[] explore_values, varchar predict_func, varchar[] predict_params,
varchar[] predict_params_type, varchar metric_func, varchar[] metric_params,
varchar[] metric_params_type, varchar data_tbl, varchar data_id, boolean
id_is_random, varchar validation_result, varchar[] data_cols, integer n_folds)
+
+void cross_validation_general (varchar modelling_func, varchar[]
modelling_params, varchar[] modelling_params_type, varchar param_explored,
varchar[] explore_values, varchar predict_func, varchar[] predict_params,
varchar[] predict_params_type, varchar metric_func, varchar[] metric_params,
varchar[] metric_params_type, varchar data_tbl, varchar data_id, boolean
id_is_random, varchar validation_result, varchar[] data_cols)
+
+void cv_linregr_train (varchar tbl_source, varchar col_ind_var, varchar
col_dep_var, varchar tbl_result)
+A wrapper for linear regression.
+
+void cv_linregr_predict (varchar tbl_model, varchar tbl_newdata, varchar
col_ind_var, varchar col_id, varchar tbl_predict)
+A wrapper for linear regression prediction.
+
+void mse_error (varchar tbl_prediction, varchar tbl_actual, varchar id_actual,
varchar values_actual, varchar tbl_error)
+
+void misclassification_avg (varchar tbl_prediction, varchar tbl_actual,
varchar id_actual, varchar values_actual, varchar tbl_error)
+
+void cv_logregr_predict (varchar tbl_model, varchar tbl_newdata, varchar
col_ind_var, varchar col_id, varchar tbl_predict)
+A prediction function for logistic regression. The result is stored in the
table tbl_predict.
+
+integer logregr_accuracy (float8[] coef, float8[] col_ind, boolean col_dep)
+Metric function for logistic regression.
+
+void cv_logregr_accuracy (varchar tbl_predict, varchar tbl_source, varchar
col_id, varchar col_dep_var, varchar tbl_accuracy)
+Metric function for logistic regression.
+
+
+Detailed Description
+Date: January 2011
+See also: For a brief introduction to the usage of cross validation, see the
module description Cross Validation.
+Function Documentation
+
+cross_validation_general()
 [1/2]
+
+
+
+  
+
+  void cross_validation_general (
+    varchar    modelling_func,
+    varchar[]  modelling_params,
+    varchar[]  modelling_params_type,
+    varchar    param_explored,
+    varchar[]  explore_values,
+    varchar    predict_func,
+    varchar[]  predict_params,
+    varchar[]  predict_params_type,
+    varchar
+  
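
Editor's sketch (not part of the commit): the signature above takes a modelling function, one explored parameter with a list of candidate values, a prediction function, a metric function, and a fold count. The Python below shows the general k-fold pattern those arguments suggest. Everything here is an assumption for illustration: the names `k_fold_indices` and `cross_validate`, the modulo-based fold assignment, and the in-memory callables stand in for the SQL function names and tables that MADlib actually uses, whose internals are not shown in this message.

```python
def k_fold_indices(n_rows, n_folds):
    """Yield (train_ids, validation_ids); row i goes to fold i % n_folds."""
    for fold in range(n_folds):
        validation = [i for i in range(n_rows) if i % n_folds == fold]
        training = [i for i in range(n_rows) if i % n_folds != fold]
        yield training, validation


def cross_validate(data, explore_values, modelling_func, predict_func,
                   metric_func, n_folds):
    """Return {explored value: metric averaged over the folds}."""
    results = {}
    for value in explore_values:
        scores = []
        for train_ids, val_ids in k_fold_indices(len(data), n_folds):
            train = [data[i] for i in train_ids]
            validation = [data[i] for i in val_ids]
            model = modelling_func(train, value)         # fit on k-1 folds
            predictions = predict_func(model, validation)
            scores.append(metric_func(predictions, validation))
        results[value] = sum(scores) / len(scores)
    return results


if __name__ == "__main__":
    # Toy problem: every target is 2.0; the "model" is a scaled training mean.
    data = [(float(i), 2.0) for i in range(10)]

    def fit(train, scale):
        return scale * sum(y for _, y in train) / len(train)

    def predict(model, rows):
        return [model for _ in rows]

    def mse(predictions, rows):
        return sum((p - y) ** 2 for p, (_, y) in zip(predictions, rows)) / len(rows)

    print(cross_validate(data, [1.0, 0.5], fit, predict, mse, n_folds=5))
```

The value with the best averaged metric would then be chosen for the explored parameter; how MADlib reports this in the validation_result table is not covered by this sketch.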

[40/51] [partial] madlib-site git commit: Doc: Add v1.14 documentation

2018-05-02 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/dir_71a41f8b7207fbbc465a4e4d95589314.html
--
diff --git a/docs/v1.14/dir_71a41f8b7207fbbc465a4e4d95589314.html 
b/docs/v1.14/dir_71a41f8b7207fbbc465a4e4d95589314.html
new file mode 100644
index 000..22fd7cd
--- /dev/null
+++ b/docs/v1.14/dir_71a41f8b7207fbbc465a4e4d95589314.html
@@ -0,0 +1,135 @@
+
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
+http://www.w3.org/1999/xhtml;>
+
+
+
+
+
+MADlib: ports Directory Reference
+ports Directory Reference  
+
+
+
+
+Directories
+directory postgres
+
+
+
+
+
+
+  
+incubator-madlib / src / ports
+Generated on Wed May 2 2018 13:00:12 for MADlib by doxygen 1.8.13
+  
+
+
+

http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/dir_745a5b6eaaef3a7f811e3c789eb52f97.html
--
diff --git a/docs/v1.14/dir_745a5b6eaaef3a7f811e3c789eb52f97.html 
b/docs/v1.14/dir_745a5b6eaaef3a7f811e3c789eb52f97.html
new file mode 100644
index 000..51697ac
--- /dev/null
+++ b/docs/v1.14/dir_745a5b6eaaef3a7f811e3c789eb52f97.html
@@ -0,0 +1,135 @@
+
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
+http://www.w3.org/1999/xhtml;>
+
+
+
+
+
+MADlib: src Directory Reference
+src Directory Reference  
+
+
+
+
+Directories
+directory pg_gp
+
+
+
+
+
+
+  
+incubator-madlib / methods / svec / src
+Generated on Wed May 2 2018 13:00:12 for MADlib by doxygen 1.8.13
+  
+
+
+

http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/dir_7826d1d18040ad5cc29c8c0a0584577d.html
--
diff --git a/docs/v1.14/dir_7826d1d18040ad5cc29c8c0a0584577d.html 
b/docs/v1.14/dir_7826d1d18040ad5cc29c8c0a0584577d.html
new file mode 100644
index 000..01ecb91
--- /dev/null
+++ b/docs/v1.14/dir_7826d1d18040ad5cc29c8c0a0584577d.html
@@ -0,0 +1,135 @@
+
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
+http://www.w3.org/1999/xhtml;>
+
+
+
+
+
+MADlib: src Directory Reference

[37/51] [partial] madlib-site git commit: Doc: Add v1.14 documentation

2018-05-02 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/eigen_navtree_hacks.js
--
diff --git a/docs/v1.14/eigen_navtree_hacks.js 
b/docs/v1.14/eigen_navtree_hacks.js
new file mode 100644
index 000..ee72246
--- /dev/null
+++ b/docs/v1.14/eigen_navtree_hacks.js
@@ -0,0 +1,236 @@
+var arrowRight = '';
+
+// generate a table of contents in the side-nav based on the h1/h2 tags of the 
current page.
+function generate_autotoc() {
+  var headers = $("h1, h2");
+  if(headers.length > 1) {
+var toc = $("#side-nav").append('Table 
of contents');
+toc = $("#nav-toc");
+var footerHeight = footer.height();
+toc = toc.append('');
+toc = toc.find('ul');
+var indices = new Array();
+indices[0] = 0;
+indices[1] = 0;
+
+var h1counts = $("h1").length;
+headers.each(function(i) {
+  var current = $(this);
+  var levelTag = current[0].tagName.charAt(1);
+  if(h1counts==0)
+levelTag--;
+  var cur_id = current.attr("id");
+
+  indices[levelTag-1]+=1;
+  var prefix = indices[0];
+  if (levelTag >1) {
+prefix+="."+indices[1];
+  }
+
+  // Uncomment to add number prefixes
+  // current.html(prefix + "   " + current.html());
+  for(var l = levelTag; l < 2; ++l){
+  indices[l] = 0;
+  }
+
+  if(cur_id == undefined) {
+current.attr('id', 'title' + i);
+current.addClass('anchor');
+toc.append("" + 
current.text() + "");
+  } else {
+toc.append("" + 
current.text() + "");
+  }
+});
+resizeHeight();
+  }
+}
+
+
+var global_navtree_object;
+
+// Overloaded to remove links to sections/subsections
+function getNode(o, po)
+{
+  po.childrenVisited = true;
+  var l = po.childrenData.length-1;
+  for (var i in po.childrenData) {
+var nodeData = po.childrenData[i];
+if((!nodeData[1]) ||  (nodeData[1].indexOf('#')==-1)) // <- we added this 
line
+  po.children[i] = newNode(o, po, nodeData[0], nodeData[1], nodeData[2], 
i==l);
+  }
+}
+
+// Overloaded to adjust the size of the navtree wrt the toc
+function resizeHeight()
+{
+  var toc = $("#nav-toc");
+  var tocHeight = toc.height();  // <- we added this line
+  var headerHeight = header.height();
+  var footerHeight = footer.height();
+  var windowHeight = $(window).height() - headerHeight - footerHeight;
+  content.css({height:windowHeight + "px"});
+  navtree.css({height:(windowHeight-tocHeight) + "px"}); // <- we modified 
this line
+  sidenav.css({height:(windowHeight) + "px",top: headerHeight+"px"});
+}
+
+// Overloaded to save the root node into global_navtree_object
+function initNavTree(toroot,relpath)
+{
+  var o = new Object();
+  global_navtree_object = o; // <- we added this line
+  o.toroot = toroot;
+  o.node = new Object();
+  o.node.li = document.getElementById("nav-tree-contents");
+  o.node.childrenData = NAVTREE;
+  o.node.children = new Array();
+  o.node.childrenUL = document.createElement("ul");
+  o.node.getChildrenUL = function() { return o.node.childrenUL; };
+  o.node.li.appendChild(o.node.childrenUL);
+  o.node.depth = 0;
+  o.node.relpath = relpath;
+  o.node.expanded = false;
+  o.node.isLast = true;
+  o.node.plus_img = document.createElement("span");
+  o.node.plus_img.className = 'arrow';
+  o.node.plus_img.innerHTML = arrowRight;
+
+  if (localStorageSupported()) {
+var navSync = $('#nav-sync');
+if (cachedLink()) {
+  showSyncOff(navSync,relpath);
+  navSync.removeClass('sync');
+} else {
+  showSyncOn(navSync,relpath);
+}
+navSync.click(function(){ toggleSyncButton(relpath); });
+  }
+
+  navTo(o,toroot,window.location.hash,relpath);
+
+  $(window).bind('hashchange', function(){
+ if (window.location.hash && window.location.hash.length>1){
+   var a;
+   if ($(location).attr('hash')){
+ var clslink=stripPath($(location).attr('pathname'))+':'+
+   $(location).attr('hash').substring(1);
+ a=$('.item a[class$="'+clslink+'"]');
+   }
+   if (a==null || !$(a).parent().parent().hasClass('selected')){
+ $('.item').removeClass('selected');
+ $('.item').removeAttr('id');
+   }
+   var link=stripPath2($(location).attr('pathname'));
+   navTo(o,link,$(location).attr('hash'),relpath);
+ } else if (!animationInProgress) {
+   $('#doc-content').scrollTop(0);
+   $('.item').removeClass('selected');
+   $('.item').removeAttr('id');
+   navTo(o,toroot,window.location.hash,relpath);
+ }
+  })
+
+  $(window).load(showRoot);
+}
+
+// return false if the node has no children at all, or has only
section/subsection children
+function checkChildrenData(node) {
+  if (!(typeof(node.childrenData)==='string')) {
+for (var i in node.childrenData) {
+  var url = node.childrenData[i][1];
+  if(url.indexOf("#")==-1)
+return true;
+}
+return false;
+  

[34/51] [partial] madlib-site git commit: Doc: Add v1.14 documentation

2018-05-02 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/glm_8sql__in.html
--
diff --git a/docs/v1.14/glm_8sql__in.html b/docs/v1.14/glm_8sql__in.html
new file mode 100644
index 000..c9194c6
--- /dev/null
+++ b/docs/v1.14/glm_8sql__in.html
@@ -0,0 +1,1919 @@
+
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
+http://www.w3.org/1999/xhtml;>
+
+
+
+
+
+MADlib: glm.sql_in File Reference
+Functions  
+  
+glm.sql_in File Reference  
+
+
+
+SQL functions for GLM (Poisson)  
+More...
+
+
+Functions
+bytea8 __glm_merge_states (bytea8 state1, bytea8 state2)
+
+bytea8 __glm_final (bytea8 state)
+
+bytea8 __glm_poisson_log_transition (bytea8, float8, float8[], bytea8)
+
+aggregate bytea8 __glm_poisson_log_agg (float8 y, float8[] x, bytea8
previous_state)
+
+bytea8 __glm_poisson_identity_transition (bytea8, float8, float8[], bytea8)
+
+aggregate bytea8 __glm_poisson_identity_agg (float8 y, float8[] x, bytea8
previous_state)
+
+bytea8 __glm_poisson_sqrt_transition (bytea8, float8, float8[], bytea8)
+
+aggregate bytea8 __glm_poisson_sqrt_agg (float8 y, float8[] x, bytea8
previous_state)
+
+bytea8 __glm_gaussian_identity_transition (bytea8, float8, float8[], bytea8)
+
+aggregate bytea8 __glm_gaussian_identity_agg (float8 y, float8[] x, bytea8
previous_state)
+
+bytea8 __glm_gaussian_log_transition (bytea8, float8, float8[], bytea8)
+
+aggregate bytea8 __glm_gaussian_log_agg (float8 y, float8[] x, bytea8
previous_state)
+
+bytea8 __glm_gaussian_inverse_transition (bytea8, float8, float8[], bytea8)
+
+aggregate bytea8 __glm_gaussian_inverse_agg (float8 y, float8[] x, bytea8
previous_state)
+
+bytea8 __glm_gamma_log_transition (bytea8, float8, float8[], bytea8)
+
+aggregate bytea8 __glm_gamma_log_agg (float8 y, float8[] x, bytea8
previous_state)
+
+bytea8 __glm_gamma_inverse_transition (bytea8, float8, float8[], bytea8)
+
+aggregate bytea8 __glm_gamma_inverse_agg (float8 y, float8[] x, bytea8
previous_state)
+
+bytea8 __glm_gamma_identity_transition (bytea8, float8, float8[], bytea8)
+
+aggregate bytea8 __glm_gamma_identity_agg (float8 y, float8[] x, bytea8
previous_state)
+
+bytea8 __glm_binomial_probit_transition (bytea8, float8, float8[], bytea8)
+
+aggregate bytea8 __glm_binomial_probit_agg (float8 y, float8[] x, bytea8
previous_state)
+
+bytea8 __glm_inverse_gaussian_identity_transition (bytea8, float8, float8[],
bytea8)
+
+bytea8 __glm_binomial_logit_transition (bytea8, float8, float8[], bytea8)
+
+aggregate bytea8 __glm_binomial_logit_agg (float8 y, float8[] x, bytea8
previous_state)
+
+__glm_result_type __glm_result_z_stats (bytea8 state)
+
+aggregate bytea8 __glm_inverse_gaussian_identity_agg (float8 y, float8[] x,
bytea8 previous_state)
+
+bytea8 __glm_inverse_gaussian_log_transition (bytea8, float8, float8[],
bytea8)
+
+aggregate bytea8 __glm_inverse_gaussian_log_agg (float8 y, float8[] x, bytea8
previous_state)
+
+bytea8 __glm_inverse_gaussian_inverse_transition (bytea8, float8, float8[],
bytea8)
+
+aggregate bytea8 __glm_inverse_gaussian_inverse_agg (float8 y, float8[] x,
bytea8 previous_state)
+
+bytea8 __glm_inverse_gaussian_sqr_inverse_transition (bytea8, float8,
float8[], bytea8)
+
+aggregate bytea8 __glm_inverse_gaussian_sqr_inverse_agg (float8 y, float8[] x,
bytea8 previous_state)
+
+__glm_result_type __glm_result_t_stats (bytea8 state)
+
+float8 __glm_loglik_diff (bytea8 state1, bytea8 state2)
+
+void glm (varchar source_table, varchar model_table, varchar
dependent_varname, varchar independent_varname, varchar family_params, varchar
grouping_col, varchar optim_params, boolean verbose)
+
+void glm (varchar source_table, varchar model_table, varchar
dependent_varname, varchar independent_varname, varchar family_params, varchar

[33/51] [partial] madlib-site git commit: Doc: Add v1.14 documentation

2018-05-02 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/graph_legend.html
--
diff --git a/docs/v1.14/graph_legend.html b/docs/v1.14/graph_legend.html
new file mode 100644
index 000..65dbfe7
--- /dev/null
+++ b/docs/v1.14/graph_legend.html
@@ -0,0 +1,154 @@
+
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
+http://www.w3.org/1999/xhtml;>
+
+
+
+
+
+MADlib: Graph Legend
+Graph Legend  
+
+
+This page explains how to interpret the graphs that are generated by doxygen.
+Consider the following example:
+/*! Invisible class because of truncation */
+class Invisible { };
+
+/*! Truncated class, inheritance relation is hidden */
+class Truncated : public Invisible { };
+
+/* Class not documented with doxygen comments */
+class Undocumented { };
+
+/*! Class that is inherited using public inheritance */
+class PublicBase : public Truncated { };
+
+/*! A template class */
+template<class T> class Templ { };
+
+/*! Class that is inherited using protected inheritance */
+class ProtectedBase { };
+
+/*! Class that is inherited using private inheritance */
+class PrivateBase { };
+
+/*! Class that is used by the Inherited class */
+class Used { };
+
+/*! Super class that inherits a number of other classes */
+class Inherited : public PublicBase,
+                  protected ProtectedBase,
+                  private PrivateBase,
+                  public Undocumented,
+                  public Templ<int>
+{
+  private:
+    Used *m_usedClass;
+};
+This will result in the following graph:
+This browser is not able to show SVG: try Firefox, Chrome, Safari, or Opera
instead.
+The boxes in the above graph have the following meaning:
+
+
+A filled gray box represents the struct or class for which the graph is 
generated. 
+
+A box with a black border denotes a documented struct or class. 
+
+A box with a gray border denotes an undocumented struct or class. 
+
+A box with a red border denotes a documented struct or class for which not all
inheritance/containment relations are shown. A graph is truncated if it does
not fit within the specified boundaries.
+
+The arrows have the following meaning: 
+
+
+A dark blue arrow is used to visualize a public inheritance relation between 
two classes. 
+
+A dark green arrow is used for protected inheritance. 
+
+A dark red arrow is used for private inheritance. 
+
+A purple dashed arrow is used if a class is contained or used by another 
class. The arrow is labelled with the variable(s) through which the pointed 
class or struct is accessible. 
+
+A yellow dashed arrow denotes a relation between a template instance and the 
template class it was instantiated from. The arrow is labelled with the 
template parameters of the instance. 
+
+
+
+
+
+  
+Generated on Wed May 2 2018 13:00:12 for MADlib by doxygen 1.8.13
+  
+
+
+

http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/graph_legend.md5
--
diff --git a/docs/v1.14/graph_legend.md5 b/docs/v1.14/graph_legend.md5
new file mode 100644
index 000..a06ed05
--- /dev/null
+++ b/docs/v1.14/graph_legend.md5
@@ -0,0 +1 @@
+387ff8eb65306fa251338d3c9bd7bfff
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/graph_legend.svg
--
diff --git a/docs/v1.14/graph_legend.svg b/docs/v1.14/graph_legend.svg
new file mode 100644
index 000..273f5fd
--- /dev/null
+++ b/docs/v1.14/graph_legend.svg
@@ -0,0 +1,138 @@
+
+http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd;>
+
+
+http://www.w3.org/2000/svg; 
xmlns:xlink="http://www.w3.org/1999/xlink;>
+
+Graph Legend
+
+
+
+Node9
+

[36/51] [partial] madlib-site git commit: Doc: Add v1.14 documentation

2018-05-02 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/elastic__net_8sql__in.html
--
diff --git a/docs/v1.14/elastic__net_8sql__in.html 
b/docs/v1.14/elastic__net_8sql__in.html
new file mode 100644
index 000..168da54
--- /dev/null
+++ b/docs/v1.14/elastic__net_8sql__in.html
@@ -0,0 +1,2476 @@
+
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
+http://www.w3.org/1999/xhtml;>
+
+
+
+
+
+MADlib: elastic_net.sql_in File Reference
+Functions  
+  
+elastic_net.sql_in File Reference  
+
+
+
+SQL functions for elastic net regularization.  
+More...
+
+
+Functions
+void elastic_net_train (text tbl_source, text tbl_result, text col_dep_var,
text col_ind_var, text regress_family, float8 alpha, float8 lambda_value,
boolean standardize, text grouping_col, text optimizer, text optimizer_params,
text excluded, integer max_iter, float8 tolerance)
+Interface for elastic net.
+
+void elastic_net_train (text tbl_source, text tbl_result, text col_ind_var,
text col_dep_var, text regress_family, float8 alpha, float8 lambda_value,
boolean standardization, text grouping_columns, text optimizer, text
optimizer_params, text excluded, integer max_iter)
+
+void elastic_net_train (text tbl_source, text tbl_result, text col_ind_var,
text col_dep_var, text regress_family, float8 alpha, float8 lambda_value,
boolean standardization, text grouping_columns, text optimizer, text
optimizer_params, text excluded)
+
+void elastic_net_train (text tbl_source, text tbl_result, text col_ind_var,
text col_dep_var, text regress_family, float8 alpha, float8 lambda_value,
boolean standardization, text grouping_columns, text optimizer, text
optimizer_params)
+
+void elastic_net_train (text tbl_source, text tbl_result, text col_ind_var,
text col_dep_var, text regress_family, float8 alpha, float8 lambda_value,
boolean standardization, text grouping_columns, text optimizer)
+
+void elastic_net_train (text tbl_source, text tbl_result, text col_ind_var,
text col_dep_var, text regress_family, float8 alpha, float8 lambda_value,
boolean standardization, text grouping_columns)
+
+void elastic_net_train (text tbl_source, text tbl_result, text col_ind_var,
text col_dep_var, text regress_family, float8 alpha, float8 lambda_value,
boolean standardization)
+
+void elastic_net_train (text tbl_source, text tbl_result, text col_ind_var,
text col_dep_var, text regress_family, float8 alpha, float8 lambda_value)
+
+text elastic_net_train ()
+Help function that prints the supported families.
+
+text elastic_net_train (text family_or_optimizer)
+Help function that prints the supported optimizers for a family, or the
parameter list for an optimizer.
+
+void elastic_net_predict (text tbl_model, text tbl_new_source, text col_id,
text tbl_predict)
+Predicts and stores the result in a table; can be used together with
General-CV.
+
+float8 elastic_net_predict (text regress_family, float8[] coefficients, float8
intercept, float8[] ind_var)
+Prediction using learned coefficients for a given example.
+
+float8 elastic_net_gaussian_predict (float8[] coefficients, float8 intercept,
float8[] ind_var)
+Prediction for linear models using learned coefficients for a given example.
+
+boolean elastic_net_binomial_predict (float8[] coefficients, float8 intercept,
float8[] ind_var)
+Prediction for logistic models using learned coefficients for a given example.
+
+float8 elastic_net_binomial_prob (float8[] coefficients, float8 intercept,
float8[] ind_var)
+Computes the probability of belonging to the True class for a given
observation.
+
+float8 __elastic_net_binomial_loglikelihood
 (float8[] coefficients, float8 
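
Editor's sketch (not part of the commit): the prediction functions listed above each take learned coefficients, an intercept, and one example's independent variables. The Python below shows the standard linear predictor and logistic function that such signatures usually correspond to. It is an assumption that MADlib computes exactly this; the documentation in this message gives only the signatures and one-line descriptions. The 0.5 decision threshold is likewise an assumption, and the Python function names are shortened stand-ins rather than MADlib's.

```python
import math


def gaussian_predict(coefficients, intercept, ind_var):
    """Linear-model prediction: intercept + coefficients . ind_var."""
    return intercept + sum(c * x for c, x in zip(coefficients, ind_var))


def binomial_prob(coefficients, intercept, ind_var):
    """Probability of the True class under a logistic model."""
    z = gaussian_predict(coefficients, intercept, ind_var)
    # Evaluate the sigmoid in a form that cannot overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binomial_predict(coefficients, intercept, ind_var, threshold=0.5):
    """Boolean class prediction; the 0.5 threshold is assumed, not sourced."""
    return binomial_prob(coefficients, intercept, ind_var) > threshold


if __name__ == "__main__":
    coef, b = [1.0, 2.0], 0.5
    print(gaussian_predict(coef, b, [3.0, 4.0]))
    print(binomial_prob(coef, b, [3.0, 4.0]))
    print(binomial_predict(coef, b, [3.0, 4.0]))
```

Because elastic net tends to drive many coefficients to exactly zero, the dot product effectively runs over only the selected features.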

[38/51] [partial] madlib-site git commit: Doc: Add v1.14 documentation

2018-05-02 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/dir_d0ff1bc8be395d65672549993d82a3c0.html
--
diff --git a/docs/v1.14/dir_d0ff1bc8be395d65672549993d82a3c0.html 
b/docs/v1.14/dir_d0ff1bc8be395d65672549993d82a3c0.html
new file mode 100644
index 000..161bc3c
--- /dev/null
+++ b/docs/v1.14/dir_d0ff1bc8be395d65672549993d82a3c0.html
@@ -0,0 +1,136 @@
+
+http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd;>
+http://www.w3.org/1999/xhtml;>
+
+
+
+
+
+MADlib: pg_gp Directory Reference
+   1.14
+   
+   User Documentation for Apache MADlib
+pg_gp Directory Reference  
+
+
+
+
+Files
+file porter_stemmer.sql_in
+Implementation of Porter stemmer operations in SQL.
+
+
+
+
+
+
+  
+incubator-madlib/methods/stemmer/src/pg_gp
+Generated on Wed May 2 2018 13:00:12 for MADlib by Doxygen 1.8.13
+  
+
+
+

http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/dir_d79f036e19ca50f1361675a4687317bc.html
--
diff --git a/docs/v1.14/dir_d79f036e19ca50f1361675a4687317bc.html 
b/docs/v1.14/dir_d79f036e19ca50f1361675a4687317bc.html
new file mode 100644
index 000..dce28a4
--- /dev/null
+++ b/docs/v1.14/dir_d79f036e19ca50f1361675a4687317bc.html
@@ -0,0 +1,139 @@
+MADlib: linear_systems Directory Reference
+linear_systems Directory Reference  
+
+
+
+
+Files
+file dense_linear_systems.sql_in
+SQL functions for linear systems.
+
+file sparse_linear_systems.sql_in
+SQL functions for linear systems.
+
+
+
+
+
+
+  
+incubator-madlib/src/ports/postgres/modules/linear_systems
+  
+
+
+

http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/dir_e2aaed6e1ab0079c9b997d45b783e833.html
--
diff --git a/docs/v1.14/dir_e2aaed6e1ab0079c9b997d45b783e833.html 
b/docs/v1.14/dir_e2aaed6e1ab0079c9b997d45b783e833.html
new file mode 100644
index 000..bcdfcf3
--- /dev/null
+++ b/docs/v1.14/dir_e2aaed6e1ab0079c9b997d45b783e833.html
@@ -0,0 +1,136 @@
+MADlib: pg_gp Directory Reference

[25/51] [partial] madlib-site git commit: Doc: Add v1.14 documentation

2018-05-02 Thread riyer
http://git-wip-us.apache.org/repos/asf/madlib-site/blob/e283664c/docs/v1.14/group__grp__graph__vertex__degrees.html
--
diff --git a/docs/v1.14/group__grp__graph__vertex__degrees.html 
b/docs/v1.14/group__grp__graph__vertex__degrees.html
new file mode 100644
index 000..4f4a14a
--- /dev/null
+++ b/docs/v1.14/group__grp__graph__vertex__degrees.html
@@ -0,0 +1,266 @@
+MADlib: In-Out Degree
+In-Out Degree
+Graph » Measures
+
+
+Contents 
+
+In-out degrees 
+
+Examples 
+
+This function computes the degree of each node. The node degree is the number of edges incident to that node. The node in-degree is the number of edges pointing into the node, and the node out-degree is the number of edges pointing out of the node.
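To make the definition above concrete, here is a minimal Python sketch that counts in- and out-degrees from a directed edge list. It is not MADlib's implementation: the helper name `vertex_degrees` is hypothetical, and the edge list is copied from the Examples section of this page.

```python
from collections import Counter


def vertex_degrees(vertices, edges):
    """Return {vertex: (indegree, outdegree)} for a directed edge list.

    Hypothetical helper for illustration only; each edge is a
    (src, dest, weight) tuple and the weight is ignored.
    """
    indegree = Counter(dest for _src, dest, _weight in edges)
    outdegree = Counter(src for src, _dest, _weight in edges)
    # Counter returns 0 for vertices that never appear on that side.
    return {v: (indegree[v], outdegree[v]) for v in vertices}


# Vertex ids and edges taken from the Examples section of this page.
vertices = list(range(8))
edges = [
    (0, 1, 1.0), (0, 2, 1.0), (0, 4, 10.0),
    (1, 2, 2.0), (1, 3, 10.0),
    (2, 3, 1.0), (2, 5, 1.0), (2, 6, 3.0),
    (3, 0, 1.0), (4, 0, -2.0), (5, 6, 1.0), (6, 7, 1.0),
]

degrees = vertex_degrees(vertices, edges)
for v in vertices:
    print(v, *degrees[v])
```

In this graph, vertex 0 has in-degree 2 (edges from 3 and 4) and out-degree 3, while vertex 7 has in-degree 1 and out-degree 0. Every edge contributes exactly one in-degree and one out-degree, so both columns sum to the number of edges.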
+In-out degrees
+graph_vertex_degrees(
+vertex_table,
+vertex_id,
+edge_table,
+edge_args,
+out_table,
+grouping_cols
+)
+
+Arguments
+vertex_table
+TEXT. Name of the table containing the vertex data for the graph. Must contain the column specified in the 'vertex_id' parameter below.
+
+vertex_id
+TEXT, default = 'id'. Name of the column in 'vertex_table' containing vertex ids. The vertex ids are of type INTEGER with no duplicates. They do not need to be contiguous.
+
+edge_table
+TEXT. Name of the table containing the edge data. The edge table must contain columns for source vertex, destination vertex and edge weight. Column naming convention is described below in the 'edge_args' parameter.
+
+edge_args
+TEXT. A comma-delimited string containing multiple named arguments of the form "name=value". The following parameters are supported for this string argument:
+src (INTEGER): Name of the column containing the source vertex ids in the edge table. Default column name is 'src'.
+dest (INTEGER): Name of the column containing the destination vertex ids in the edge table. Default column name is 'dest'.
+weight (FLOAT8): Name of the column containing the edge weights in the edge table. Default column name is 'weight'.
+
+out_table
+TEXT. Name of the table to store the result. It contains a row for every vertex of every group and has the following columns (in addition to the grouping columns):
+vertex: The id for the source vertex. Will use the input vertex column 'id' for column naming.
+indegree: Number of incoming edges to the vertex.
+outdegree: Number of outgoing edges from the vertex.
+
+grouping_cols
+TEXT, default = NULL. List of columns used to group the input into discrete subgraphs. These columns must exist in the edge table. When this value is null, no grouping is used and a single result is generated.
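The "name=value" format described for 'edge_args' can be illustrated with a small parser. This is a sketch under stated assumptions, not MADlib's own parsing code: `parse_edge_args` is a hypothetical helper, the defaults are the ones listed above, and rejecting unknown names is a choice made here for illustration only.

```python
# Default column names as listed in the edge_args description above.
EDGE_ARG_DEFAULTS = {"src": "src", "dest": "dest", "weight": "weight"}


def parse_edge_args(edge_args):
    """Parse a comma-delimited "name=value" string into a dict.

    Hypothetical helper: names that are omitted keep their default
    column name; unknown names or empty values raise ValueError
    (an assumption, the real function may behave differently).
    """
    parsed = dict(EDGE_ARG_DEFAULTS)
    for item in edge_args.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty input and trailing commas
        name, sep, value = item.partition("=")
        name, value = name.strip(), value.strip()
        if not sep or name not in EDGE_ARG_DEFAULTS or not value:
            raise ValueError("invalid edge argument: %r" % item)
        parsed[name] = value
    return parsed


columns = parse_edge_args("src=src_id, dest=dest_id, weight=edge_weight")
print(columns)
```

With the string used in the Examples section, the three names map to the columns `src_id`, `dest_id` and `edge_weight`; an argument string that sets only `src` leaves `dest` and `weight` at their default column names.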
+
+Examples
+
+Create vertex and edge tables to represent the graph: 
+DROP TABLE IF EXISTS vertex, edge;
+CREATE TABLE vertex(
+id INTEGER,
+name TEXT
+);
+CREATE TABLE edge(
+src_id INTEGER,
+dest_id INTEGER,
+edge_weight FLOAT8
+);
+INSERT INTO vertex VALUES
+(0, 'A'),
+(1, 'B'),
+(2, 'C'),
+(3, 'D'),
+(4, 'E'),
+(5, 'F'),
+(6, 'G'),
+(7, 'H');
+INSERT INTO edge VALUES
+(0, 1, 1.0),
+(0, 2, 1.0),
+(0, 4, 10.0),
+(1, 2, 2.0),
+(1, 3, 10.0),
+(2, 3, 1.0),
+(2, 5, 1.0),
+(2, 6, 3.0),
+(3, 0, 1.0),
+(4, 0, -2.0),
+(5, 6, 1.0),
+(6, 7, 1.0);
+
+Calculate the in-out degrees for each node: 
+DROP TABLE IF EXISTS degrees;
+SELECT madlib.graph_vertex_degrees(
+'vertex',  -- Vertex table
+'id',      -- Vertex id column (NULL means use default naming)
+'edge',    -- Edge table
+'src=src_id, dest=dest_id, weight=edge_weight',  -- Edge arguments (column names)
+'degrees');  -- Output table of vertex degrees
+SELECT * FROM 
