Re: Apache MADlib v1.12 status

2017-08-14 Thread Frank McQuillan
Hi Ed,

We have not been able to reproduce
https://issues.apache.org/jira/browse/MADLIB-1091
so it may move out.

I still have some docs updates to do, so expect a PR for those, probably
Tues or Wed.

Frank

On Mon, Aug 14, 2017 at 3:30 PM, Ed Espino wrote:

> MADlib dev,
>
> We are winding down the number of outstanding issues for the Apache MADlib
> v1.12 release. The one outstanding issue is
> https://issues.apache.org/jira/browse/MADLIB-1091. Once this is resolved,
> I'm hoping to start the release process.
>
> Regards,
> -=e
>
> --
> *Ed Espino*
>


[GitHub] incubator-madlib issue #167: Update RELEASE_NOTES for v1.12 release

2017-08-14 Thread fmcquillan99
Github user fmcquillan99 commented on the issue:

https://github.com/apache/incubator-madlib/pull/167
  
Here are some suggested changes/additions:

1) Change release date to Fri Aug 18 which might be a better estimate.

2) MLP
Change
New Module: Multilayer Perceptron (MADLIB-413)
to
New Module: Multilayer Perceptron (MADLIB-413, MADLIB-1134)

3) APSP
Change
New module: Graph - All Pairs Shortest Path (MADLIB-1099)
to
New module: Graph - All Pairs Shortest Path (MADLIB-1072, MADLIB-1099, MADLIB-1106)

4) WCC
Change
New module: Graph - Weakly Connected Components (MADLIB-1071, MADLIB-1083)
to 
New module: Graph - Weakly Connected Components (MADLIB-1071, MADLIB-1083, MADLIB-1101)

5) Summary
Change
Summary: Allow user to determine the number of columns per run (MADLIB-1117)
to
Summary: 
  - Allow user to determine the number of columns per run (MADLIB-1117)
  - Improve efficiency of computation time by ~35% (MADLIB-1104)

6) TLP
Updates for Apache Top Level Project readiness (MADLIB-1130, MADLIB-1133)
* What about MADLIB-1132 and MADLIB-1142?
* Also add the epic MADLIB-1112.

7) Train-test split
Add:
 New Module: Sample - Train-test split (MADLIB-1119)

8) under the bugs section:
Change
- Fix the data scaling bug with normalization
to
- Fix the data scaling bug with normalization (MADLIB-1094)

9) under the bugs section:
change:
Update 'optimizer' GUC only if editable
to
Update 'optimizer' GUC only if editable (MADLIB-1109)

10) Change
Promote cardinality estimators to top level module from early stage 
to
Promote cardinality estimators to top level module from early stage (MADLIB-1120)

11) Under bugs section:
change
Graph: Quoted output table name does not work for some modules
to
Graph: Quoted output table name does not work for some modules (MADLIB-1137)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: MADLIB-1103 --> v2.0 (thoughts)

2017-08-14 Thread Roman Shaposhnik
On Mon, Aug 14, 2017 at 12:12 PM, Ed Espino wrote:
> https://issues.apache.org/jira/browse/MADLIB-1103 (Remove pyxb GPL
> workaround) is dependent on the release of PyXB 1.2.6 (which is currently
> not scheduled). I'm inclined to move it to v2.0 and we can revisit at a
> later point. Thoughts?

Makes sense to me!

Thanks,
Roman.


[GitHub] incubator-madlib pull request #166: Sample: test_train_split

2017-08-14 Thread orhankislal
Github user orhankislal commented on a diff in the pull request:

https://github.com/apache/incubator-madlib/pull/166#discussion_r133094253
  
--- Diff: src/ports/postgres/modules/sample/test_train_split.py_in ---
@@ -0,0 +1,311 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import plpy
+from utilities.control import MinWarning
+from utilities.utilities import _assert
+from utilities.utilities import extract_keyvalue_params
+from utilities.utilities import add_postfix
+from utilities.utilities import unique_string
+from utilities.utilities import split_quoted_delimited_str
+from utilities.validate_args import table_exists
+from utilities.validate_args import columns_exist_in_table
+from utilities.validate_args import table_is_empty
+from utilities.validate_args import get_expr_type
+from utilities.validate_args import get_cols
+from graph.graph_utils import _check_groups
+from graph.graph_utils import _grp_from_table
+
+m4_changequote(` ')
+
+
+def _get_sql_string(str):
+    if str:
+        return "'" + str + "'"
+    return "NULL"
+
+def test_train_split(schema_madlib, source_table, output_table, train_proportion,
+                     test_proportion, grouping_cols, target_cols, with_replacement,
+                     separate_output_tables, **kwargs):
+    """
+    test train split function
+    Args:
+        @param source_table     Input table name.
+        @param output_table     Output table name.
+        @param proportion       The ratio of sample size to the number of
--- End diff --

proportion should be replaced by train_proportion and test_proportion




[GitHub] incubator-madlib pull request #166: Sample: test_train_split

2017-08-14 Thread orhankislal
Github user orhankislal commented on a diff in the pull request:

https://github.com/apache/incubator-madlib/pull/166#discussion_r133094303
  
--- Diff: src/ports/postgres/modules/sample/test_train_split.py_in ---
@@ -0,0 +1,311 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import plpy
+from utilities.control import MinWarning
+from utilities.utilities import _assert
+from utilities.utilities import extract_keyvalue_params
+from utilities.utilities import add_postfix
+from utilities.utilities import unique_string
+from utilities.utilities import split_quoted_delimited_str
+from utilities.validate_args import table_exists
+from utilities.validate_args import columns_exist_in_table
+from utilities.validate_args import table_is_empty
+from utilities.validate_args import get_expr_type
+from utilities.validate_args import get_cols
+from graph.graph_utils import _check_groups
+from graph.graph_utils import _grp_from_table
+
+m4_changequote(` ')
+
+
+def _get_sql_string(str):
+    if str:
+        return "'" + str + "'"
+    return "NULL"
+
+def test_train_split(schema_madlib, source_table, output_table, train_proportion,
+                     test_proportion, grouping_cols, target_cols, with_replacement,
+                     separate_output_tables, **kwargs):
+    """
+    test train split function
+    Args:
+        @param source_table     Input table name.
+        @param output_table     Output table name.
+        @param proportion       The ratio of sample size to the number of
+                                records.
+        @param grouping_cols    (Default: NULL) The columns to distinguish
+                                each strata.
+        @param target_cols      (Default: NULL) The columns to include in
+                                the output.
+        @param with_replacement (Default: FALSE) The sampling method.
+
--- End diff --

Missing parameter: separate_output_tables
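
Docstring gaps like this one, and the stale proportion entry noted in the
earlier comment on this PR, could also be caught mechanically. Below is a
minimal sketch, assuming plain Python rather than the m4-preprocessed py_in
file. The helper undocumented_params and the stub function are hypothetical
names used only for illustration; the stub copies the signature and @param
entries from the diff and omits the body.

```python
import inspect
import re


def undocumented_params(func, ignore=("schema_madlib", "kwargs")):
    """Return parameters of func that have no '@param <name>' docstring entry."""
    doc = inspect.getdoc(func) or ""
    documented = set(re.findall(r"@param\s+(\w+)", doc))
    return [name for name in inspect.signature(func).parameters
            if name not in ignore and name not in documented]


# Hypothetical stand-in: same signature and @param entries as the diff,
# with the function body omitted.
def _test_train_split_stub(schema_madlib, source_table, output_table,
                           train_proportion, test_proportion, grouping_cols,
                           target_cols, with_replacement,
                           separate_output_tables, **kwargs):
    """
    test train split function
    Args:
        @param source_table     Input table name.
        @param output_table     Output table name.
        @param proportion       The ratio of sample size to the number of
                                records.
        @param grouping_cols    (Default: NULL) The columns to distinguish
                                each strata.
        @param target_cols      (Default: NULL) The columns to include in
                                the output.
        @param with_replacement (Default: FALSE) The sampling method.
    """


missing = undocumented_params(_test_train_split_stub)
print(missing)
# ['train_proportion', 'test_proportion', 'separate_output_tables']
```

The check only reports missing entries; it does not flag documented names
such as proportion that no longer appear in the signature.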




[GitHub] incubator-madlib pull request #166: Sample: test_train_split

2017-08-14 Thread orhankislal
Github user orhankislal commented on a diff in the pull request:

https://github.com/apache/incubator-madlib/pull/166#discussion_r133093991
  
--- Diff: src/ports/postgres/modules/sample/test_train_split.sql_in ---
@@ -0,0 +1,321 @@
+/* --- *//**
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ *
+ * @file test_train_split.sql_in
+ *
+ * @brief SQL functions for test train split.
+ * @date 07/19/2017
+ *
+ * @sa Given a table, test train split returns a proportion of records
+ * for each group (strata).
+ *
+ *//* --- */
+
+m4_include(`SQLCommon.m4')
+
+
+/**
+@addtogroup grp_test_train_split
+
+Contents
+
+test train split
+Examples
+
+
+
+@brief A method for independently sampling subpopulations (strata).
+
+test train split is a method for independently sampling
--- End diff --

The explanation should be modified for the test-train functionality.
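
For reference, the behavior a revised explanation would need to describe can
be sketched in plain Python. This is only an illustration, not the PR's
implementation: the function name train_test_split_rows, the rounding rule,
the default of test_proportion = 1 - train_proportion, and sampling without
replacement are all assumptions made for the sketch.

```python
import random
from collections import defaultdict


def train_test_split_rows(rows, train_proportion, test_proportion=None,
                          group_key=None, seed=None):
    """Split rows into disjoint (train, test) lists, group by group."""
    # Assumption: when no test proportion is given, use the remainder.
    if test_proportion is None:
        test_proportion = 1.0 - train_proportion
    if min(train_proportion, test_proportion) < 0 or \
            train_proportion + test_proportion > 1.0 + 1e-9:
        raise ValueError("proportions must be non-negative and sum to at most 1")

    rng = random.Random(seed)

    # Bucket rows by group so each stratum is sampled independently.
    groups = defaultdict(list)
    for row in rows:
        groups[group_key(row) if group_key else None].append(row)

    train, test = [], []
    for members in groups.values():
        shuffled = list(members)
        rng.shuffle(shuffled)
        n_train = round(len(shuffled) * train_proportion)
        # Never take more test rows than remain after the train rows.
        n_test = min(round(len(shuffled) * test_proportion),
                     len(shuffled) - n_train)
        train.extend(shuffled[:n_train])
        test.extend(shuffled[n_train:n_train + n_test])
    return train, test


rows = [(i, "a" if i < 10 else "b") for i in range(20)]
train, test = train_test_split_rows(rows, 0.8, group_key=lambda r: r[1], seed=0)
print(len(train), len(test))  # 16 4
```

The point of difference from plain stratified sampling is that each group
yields two disjoint samples (train and test) rather than one.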




JIRA for migrating repos following MADlib's TLP graduation

2017-08-14 Thread Nandish Jayaram
Hi All,

I have opened an Apache Infrastructure ticket to migrate MADlib's
git repos, distribution server, and other common tasks associated
with the move from incubator to TLP. The ticket is:
https://issues.apache.org/jira/browse/INFRA-14872

Please do have a look at it and let me know if I have missed something,
or if something is to be changed. I followed the instructions at
http://www.apache.org/dev/infra-contact#requesting-graduation to open
the ticket, and used the template used by Apache Flex's TLP ticket
https://issues.apache.org/jira/browse/INFRA-5688.

I will keep you posted on the status of the ticket. We might still need to
change some settings in MADlib's Jenkins build, once the git repo move
is finished. I thought that was something we could control and might not
need Infra's help for that (please correct me if I am wrong).

NJ


Apache MADlib v1.12 status

2017-08-14 Thread Ed Espino
MADlib dev,

We are winding down the number of outstanding issues for the Apache MADlib
v1.12 release. The one outstanding issue is
https://issues.apache.org/jira/browse/MADLIB-1091. Once this is resolved,
I'm hoping to start the release process.

Regards,
-=e

-- 
*Ed Espino*


MADLIB-1103 --> v2.0 (thoughts)

2017-08-14 Thread Ed Espino
https://issues.apache.org/jira/browse/MADLIB-1103 (Remove pyxb GPL
workaround) is dependent on the release of PyXB 1.2.6 (which is currently
not scheduled). I'm inclined to move it to v2.0 and we can revisit at a
later point. Thoughts?

-=e

-- 
*Ed Espino*


Re: Jira post v1.12 version?

2017-08-14 Thread Ed Espino
Thanks Frank.

I have moved them to v2.0. The main reason why I am interested in these
issues is IMHO they tie directly to easing the dev user community adoption
(lowers bar of entry - newer gcc versions supported).

-=e

On Mon, Aug 14, 2017 at 12:04 PM, Frank McQuillan wrote:

> Ed,
>
> I would suggest v2.0 for the next version, so you can add those 2 JIRAs to
> v2.0
>
> Once we get v1.12 out the door I was going to solicit comments from the
> community on v2.0 features so we can get that backlog going.
>
> Frank
>
> On Mon, Aug 14, 2017 at 11:30 AM, Ed Espino wrote:
>
> > Dev,
> >
> > What are we setting the Jira Fix Version/s for issues to be addressed in
> > the next release (post v1.12)? I noticed a v2.0 version (06/Oct/17)
> > available in Jira.
> >
> > The two issues I'd like to set to the next release are the following:
> >
> > https://issues.apache.org/jira/browse/MADLIB-1025 - MADlib does not compile
> > with gcc 6.2
> > https://issues.apache.org/jira/browse/MADLIB-1145 - Ubuntu 16.04 - Using
> > GCC 5 (default gcc) causes Postgres 9.6 crash
> >
> > Any guidance is greatly appreciated.
> >
> > Regards
> > -=e
> >
> > --
> > *Ed Espino*
> >
>



-- 
*Ed Espino*


[GitHub] incubator-madlib issue #167: Update RELEASE_NOTES for v1.12 release

2017-08-14 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/incubator-madlib/pull/167
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/155/





Re: Jira post v1.12 version?

2017-08-14 Thread Frank McQuillan
Ed,

I would suggest v2.0 for the next version, so you can add those 2 JIRAs to
v2.0

Once we get v1.12 out the door I was going to solicit comments from the
community on v2.0 features so we can get that backlog going.

Frank

On Mon, Aug 14, 2017 at 11:30 AM, Ed Espino wrote:

> Dev,
>
> What are we setting the Jira Fix Version/s for issues to be addressed in
> the next release (post v1.12)? I noticed a v2.0 version (06/Oct/17)
> available in Jira.
>
> The two issues I'd like to set to the next release are the following:
>
> https://issues.apache.org/jira/browse/MADLIB-1025 - MADlib does not compile
> with gcc 6.2
> https://issues.apache.org/jira/browse/MADLIB-1145 - Ubuntu 16.04 - Using
> GCC 5 (default gcc) causes Postgres 9.6 crash
>
> Any guidance is greatly appreciated.
>
> Regards
> -=e
>
> --
> *Ed Espino*
>


Re: [VOTE]: MADlib repo(s) migration

2017-08-14 Thread Frank McQuillan
1

On Fri, Aug 11, 2017 at 10:16 AM, Nandish Jayaram wrote:

> Hi All,
>
> A gentle reminder to vote if you'd like. I was thinking of opening the
> Apache Infra ticket for the move sometime today if there are no more votes
> to come.
>
> NJ
>
> On Thu, Aug 10, 2017 at 3:39 AM, ChenLiang Wang wrote:
>
> > 1
> >
> > On 08/10/2017 05:47 AM, Orhan Kislal wrote:
> > > 1
> > >
> > > Orhan Kislal
> > >
> > > On Wed, Aug 9, 2017 at 2:32 PM, Nandish Jayaram wrote:
> > >
> > >> Hi All,
> > >>
> > >> With MADlib's graduation to TLP, it's time to migrate its github
> > >> repos from `*incubator-madlib*` to `*madlib*`. We will have to open
> > >> an Apache Infrastructure ticket to request this move for the following
> > >> repos (along with other stuff like wiki, jenkins etc):
> > >> https://git1-us-west.apache.org/repos/asf?p=incubator-madlib.git
> > >>  (Read/Write)
> > >> https://github.com/apache/incubator-madlib (Github mirror- read only)
> > >> https://git1-us-west.apache.org/repos/asf?p=incubator-madlib-site.git
> > >> https://github.com/apache/incubator-madlib-site (GitHub mirror)
> > >>
> > >> There are two ways to go about this, and the Infra ticket has to be
> > >> raised accordingly.
> > >> 1) Just maintain the current set-up, but have the repos renamed from
> > >> incubator-madlib to madlib.
> > >> 2) Use Gitbox to enable github repo as a R/W repo and not just read-only.
> > >> Check this email (
> > >> https://mail-archives.apache.org/mod_mbox/incubator-madlib-
> > >> dev/201708.mbox/%3cCA+ULb+vP0ViWH4Nc=4eaXvbT0KOmeFtQzp4eAa3p0fKPP7c
> > >> 8...@mail.gmail.com%3e)
> > >> for further information.
> > >>
> > >> Please vote your preference and we can decide to move accordingly.
> > >>
> > >> NJ
> > >>
> > >
> >
>


[GitHub] incubator-madlib pull request #167: Update RELEASE_NOTES for v1.12 release

2017-08-14 Thread orhankislal
GitHub user orhankislal opened a pull request:

https://github.com/apache/incubator-madlib/pull/167

Update RELEASE_NOTES for v1.12 release



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/orhankislal/incubator-madlib release/rel_notes_1.12

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-madlib/pull/167.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #167


commit 3965e6cbbd0bff312c116a80c87dba0214e6d876
Author: Orhan Kislal 
Date:   2017-08-14T18:43:29Z

Update RELEASE_NOTES for v1.12 release






Jira post v1.12 version?

2017-08-14 Thread Ed Espino
Dev,

What are we setting the Jira Fix Version/s for issues to be addressed in
the next release (post v1.12)? I noticed a v2.0 version (06/Oct/17)
available in Jira.

The two issues I'd like to set to the next release are the following:

https://issues.apache.org/jira/browse/MADLIB-1025 - MADlib does not compile
with gcc 6.2
https://issues.apache.org/jira/browse/MADLIB-1145 - Ubuntu 16.04 - Using
GCC 5 (default gcc) causes Postgres 9.6 crash

Any guidance is greatly appreciated.

Regards
-=e

-- 
*Ed Espino*


[GitHub] incubator-madlib issue #166: Sample: test_train_split

2017-08-14 Thread asfgit
Github user asfgit commented on the issue:

https://github.com/apache/incubator-madlib/pull/166
  

Refer to this link for build results (access rights to CI server needed): 
https://builds.apache.org/job/madlib-pr-build/154/





[GitHub] incubator-madlib pull request #162: MLP: Multilayer Perceptron Phase 2

2017-08-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-madlib/pull/162

