[GitHub] madlib pull request #291: Feature: Vector to Columns

2018-07-19 Thread njayaram2
Github user njayaram2 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/291#discussion_r203890181
  
--- Diff: src/ports/postgres/modules/utilities/transform_vec_cols.py_in ---
@@ -0,0 +1,492 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import plpy
+from control import MinWarning
+from internal.db_utils import is_col_1d_array
+from internal.db_utils import quote_literal
+from utilities import _assert
+from utilities import add_postfix
+from utilities import ANY_ARRAY
+from utilities import is_valid_psql_type
+from utilities import py_list_to_sql_string
+from utilities import split_quoted_delimited_str
+from validate_args import is_var_valid
+from validate_args import get_cols
+from validate_args import get_expr_type
+from validate_args import input_tbl_valid
+from validate_args import output_tbl_valid
+from validate_args import table_exists
+
+class vec_cols_helper:
+    def __init__(self):
+        self.all_cols = None
+
+    def get_cols_as_list(self, cols_to_process, source_table=None, exclude_cols=None):
+        """
+        Get a list of columns based on the value of cols_to_process
+        Args:
+            @param cols_to_process: str, Either a * or a comma-separated list of col names
+            @param source_table: str, optional. Source table name
+            @param exclude_cols: str, optional. Comma-separated list of the col(s) to exclude
+                                 from the source table, only used if cols_to_process is *
+        Returns:
+            A list of column names (or an empty list)
+        """
+        # If cols_to_process is empty/None, return empty list
+        if not cols_to_process:
+            return []
+        if cols_to_process.strip() != "*":
+            # If cols_to_process is a comma separated list of names, return list
+            # of column names in cols_to_process.
+            return [col for col in split_quoted_delimited_str(cols_to_process)
+                    if col not in split_quoted_delimited_str(exclude_cols)]
+        if source_table:
+            if not self.all_cols:
+                self.all_cols = get_cols(source_table)
+            return [col for col in self.all_cols
+                    if col not in split_quoted_delimited_str(exclude_cols)]
+        return []
+
+class vec2cols:
+    def __init__(self):
+        self.get_cols_helper = vec_cols_helper()
+        self.module_name = self.__class__.__name__
+
+    def validate_args(self, source_table, output_table, vector_col, feature_names,
+                      cols_to_output):
+        """
+        Validate args for vec2cols
+        """
+        input_tbl_valid(source_table, self.module_name)
+        output_tbl_valid(output_table, self.module_name)
+        # cols_to_validate = self.get_cols_helper.get_cols_as_list(cols_to_output) + [vector_col]
--- End diff --

I guess we can remove this commented-out line.
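
For readers following the diff, here is a minimal standalone sketch of the column-selection logic in get_cols_as_list. It is not the MADlib code: split_names is a hypothetical stand-in for split_quoted_delimited_str (no quote handling, and it is assumed to return an empty list for None), and the source table lookup is replaced by an all_cols argument.

```python
def split_names(names):
    """Hypothetical stand-in for split_quoted_delimited_str (no quote handling)."""
    if not names:
        return []
    return [name.strip() for name in names.split(",") if name.strip()]


def cols_as_list(cols_to_process, all_cols=None, exclude_cols=None):
    """Mirror the branching of get_cols_as_list without touching a database.

    all_cols stands in for the result of get_cols(source_table).
    """
    # Empty or None input yields no columns.
    if not cols_to_process:
        return []
    excluded = set(split_names(exclude_cols))
    if cols_to_process.strip() != "*":
        # Explicit comma-separated list: keep the names that are not excluded.
        return [col for col in split_names(cols_to_process) if col not in excluded]
    # "*": take every known column of the table, minus the excluded ones.
    return [col for col in (all_cols or []) if col not in excluded]


print(cols_as_list("a, b, c", exclude_cols="b"))                            # ['a', 'c']
print(cols_as_list("*", all_cols=["id", "vec", "label"], exclude_cols="vec"))  # ['id', 'label']
print(cols_as_list(None))                                                   # []
```

As in the diff, exclusions are applied in both branches, even though the docstring says exclude_cols is only used when cols_to_process is *.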


---


[GitHub] madlib pull request #291: Feature: Vector to Columns

2018-07-12 Thread ArvindSridhar
Github user ArvindSridhar commented on a diff in the pull request:

https://github.com/apache/madlib/pull/291#discussion_r202140243
  
--- Diff: src/ports/postgres/modules/utilities/test/unit_tests/test_utilities.py_in ---
@@ -243,6 +243,9 @@ class UtilitiesTestCase(unittest.TestCase):
         self.assertFalse(s.is_valid_psql_type('boolean[]', s.INCLUDE_ARRAY | s.ONLY_ARRAY))
         self.assertFalse(s.is_valid_psql_type('boolean', s.ONLY_ARRAY))
         self.assertFalse(s.is_valid_psql_type('boolean[]', s.ONLY_ARRAY))
+        self.assertTrue(s.is_valid_psql_type('boolean[]', s.ANY_ARRAY))
--- End diff --

Done; this should now be in PRs 292 and 293.


---


[GitHub] madlib pull request #291: Feature: Vector to Columns

2018-07-11 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/291#discussion_r201876969
  
--- Diff: src/ports/postgres/modules/internal/test/unit_tests/test_db_utils.py_in ---
@@ -0,0 +1,60 @@
+# coding=utf-8
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import sys
+from os import path
+
+# Add modules module to the pythonpath.
+sys.path.append(path.dirname(path.dirname(path.dirname(path.dirname(path.abspath(__file__))))))
+
+import unittest
+from mock import *
+import sys
+import plpy_mock as plpy
+
+m4_changequote(`')
+class Vec2ColsTestCase(unittest.TestCase):
--- End diff --

There is another `Vec2ColsTestCase` in `test_vec2cols.py_in`. Is this one redundant?


---


[GitHub] madlib pull request #291: Feature: Vector to Columns

2018-07-11 Thread iyerr3
Github user iyerr3 commented on a diff in the pull request:

https://github.com/apache/madlib/pull/291#discussion_r201877031
  
--- Diff: src/ports/postgres/modules/utilities/test/unit_tests/test_utilities.py_in ---
@@ -243,6 +243,9 @@ class UtilitiesTestCase(unittest.TestCase):
         self.assertFalse(s.is_valid_psql_type('boolean[]', s.INCLUDE_ARRAY | s.ONLY_ARRAY))
         self.assertFalse(s.is_valid_psql_type('boolean', s.ONLY_ARRAY))
         self.assertFalse(s.is_valid_psql_type('boolean[]', s.ONLY_ARRAY))
+        self.assertTrue(s.is_valid_psql_type('boolean[]', s.ANY_ARRAY))
--- End diff --

Can we move this and corresponding code changes to another commit and PR? 


---


[GitHub] madlib pull request #291: Feature: Vector to Columns

2018-07-11 Thread ArvindSridhar
GitHub user ArvindSridhar opened a pull request:

https://github.com/apache/madlib/pull/291

Feature: Vector to Columns

JIRA: MADLIB-1240
The vec2cols function splits a single array-valued column into multiple
columns, one per array element. For example, if a row of the input column
contains ARRAY[1, 2, 3], the output table contains three columns, one for
each element of the array.
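
The transformation described above can be illustrated in plain Python. This is only a sketch of the idea, not the vec2cols implementation (which operates on database tables): rows are modelled as dicts, split_vector_column is a hypothetical helper, and the default output names f1, f2, ... are an assumption made for the example.

```python
def split_vector_column(rows, vector_col, feature_names=None):
    """Expand the list stored under vector_col into one key per element.

    rows is a list of dicts standing in for table rows. The f1, f2, ...
    default names are an assumption for this sketch.
    """
    result = []
    for row in rows:
        vector = row[vector_col]
        names = feature_names or ["f{0}".format(i + 1) for i in range(len(vector))]
        if len(names) != len(vector):
            raise ValueError("expected {0} elements, got {1}".format(len(names), len(vector)))
        # Copy every other column unchanged, then add one column per element.
        new_row = {key: value for key, value in row.items() if key != vector_col}
        new_row.update(zip(names, vector))
        result.append(new_row)
    return result


rows = [{"id": 1, "vec": [1, 2, 3]}]
print(split_vector_column(rows, "vec"))
# [{'id': 1, 'f1': 1, 'f2': 2, 'f3': 3}]
```

Passing feature_names=["x", "y", "z"] would name the three output columns explicitly instead.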

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/madlib/madlib feature/vector-to-columns

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/madlib/pull/291.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #291


commit 3237acb45c553fc4fc2c20b6e7c9a0b6bec2ffe8
Author: Arvind Sridhar 
Date:   2018-07-06T00:18:55Z

Utilities: Create new function to convert vector to columns

JIRA: MADLIB-1240

The vec2cols function enables users to split up a single column
into multiple columns, given that the input column contains array
entries. For example, if the input column contained ARRAY[1, 2, 3]
in one of its rows, the output table will contain 3 different
columns, one for each element of the array.

Co-authored-by: Nikhil Kak 
Co-authored-by: Nandish Jayaram 

commit 0100da24333fda01fe2b0428d80da2ba5ab9
Author: Arvind Sridhar 
Date:   2018-07-09T23:12:28Z

Internal: Add function to check column type for 1D array

Co-authored-by: Nikhil Kak 

commit 1e8bc328824ea57a0d253834f36e7ca4b0eff26a
Author: Arvind Sridhar 
Date:   2018-07-09T23:14:48Z

Utilities: Add check for whether type is of any array variant

Co-authored-by: Nikhil Kak 




---