[GitHub] madlib pull request #291: Feature: Vector to Columns
Github user njayaram2 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/291#discussion_r203890181

    --- Diff: src/ports/postgres/modules/utilities/transform_vec_cols.py_in ---
    @@ -0,0 +1,492 @@
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements. See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership. The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License. You may obtain a copy of the License at
    +#
    +# http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing,
    +# software distributed under the License is distributed on an
    +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +# KIND, either express or implied. See the License for the
    +# specific language governing permissions and limitations
    +# under the License.
    +
    +import plpy
    +from control import MinWarning
    +from internal.db_utils import is_col_1d_array
    +from internal.db_utils import quote_literal
    +from utilities import _assert
    +from utilities import add_postfix
    +from utilities import ANY_ARRAY
    +from utilities import is_valid_psql_type
    +from utilities import py_list_to_sql_string
    +from utilities import split_quoted_delimited_str
    +from validate_args import is_var_valid
    +from validate_args import get_cols
    +from validate_args import get_expr_type
    +from validate_args import input_tbl_valid
    +from validate_args import output_tbl_valid
    +from validate_args import table_exists
    +
    +class vec_cols_helper:
    +    def __init__(self):
    +        self.all_cols = None
    +
    +    def get_cols_as_list(self, cols_to_process, source_table=None, exclude_cols=None):
    +        """
    +        Get a list of columns based on the value of cols_to_process
    +        Args:
    +            @param cols_to_process: str, Either a * or a comma-separated list of col names
    +            @param source_table: str, optional. Source table name
    +            @param exclude_cols: str, optional. Comma-separated list of the col(s) to exclude
    +                                 from the source table, only used if cols_to_process is *
    +        Returns:
    +            A list of column names (or an empty list)
    +        """
    +        # If cols_to_process is empty/None, return empty list
    +        if not cols_to_process:
    +            return []
    +        if cols_to_process.strip() != "*":
    +            # If cols_to_process is a comma separated list of names, return list
    +            # of column names in cols_to_process.
    +            return [col for col in split_quoted_delimited_str(cols_to_process)
    +                    if col not in split_quoted_delimited_str(exclude_cols)]
    +        if source_table:
    +            if not self.all_cols:
    +                self.all_cols = get_cols(source_table)
    +            return [col for col in self.all_cols
    +                    if col not in split_quoted_delimited_str(exclude_cols)]
    +        return []
    +
    +class vec2cols:
    +    def __init__(self):
    +        self.get_cols_helper = vec_cols_helper()
    +        self.module_name = self.__class__.__name__
    +
    +    def validate_args(self, source_table, output_table, vector_col, feature_names,
    +                      cols_to_output):
    +        """
    +        Validate args for vec2cols
    +        """
    +        input_tbl_valid(source_table, self.module_name)
    +        output_tbl_valid(output_table, self.module_name)
    +        # cols_to_validate = self.get_cols_helper.get_cols_as_list(cols_to_output) + [vector_col]
    --- End diff --

    Guess we can remove this commented line.

---
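For readers following the thread, the column-selection logic of `get_cols_as_list` quoted in the diff above can be sketched outside the database roughly as follows. This is an illustrative sketch only, not the MADlib code: `split_names` is a hypothetical stand-in for `split_quoted_delimited_str` that splits on commas and ignores quoting, and `all_cols` is passed in directly instead of being fetched with `get_cols(source_table)`. As in the quoted diff, the exclusion filter is applied in both branches, even though the docstring describes `exclude_cols` as used only when `cols_to_process` is `*`.

```python
def split_names(names):
    """Hypothetical stand-in for split_quoted_delimited_str.

    Splits on commas only; quoted identifiers are NOT handled here.
    """
    if not names:
        return []
    return [name.strip() for name in names.split(",") if name.strip()]


def select_columns(cols_to_process, all_cols=None, exclude_cols=None):
    """Mirror the branching of get_cols_as_list from the quoted diff.

    all_cols is an assumption of this sketch: it replaces the cached
    result of get_cols(source_table).
    """
    # Empty or None input yields an empty list.
    if not cols_to_process:
        return []
    excluded = set(split_names(exclude_cols))
    # An explicit comma-separated list: keep its order, drop exclusions.
    if cols_to_process.strip() != "*":
        return [col for col in split_names(cols_to_process)
                if col not in excluded]
    # "*": start from every known column of the table, drop exclusions.
    return [col for col in (all_cols or []) if col not in excluded]
```

For example, `select_columns("*", all_cols=["id", "x", "y"], exclude_cols="id")` keeps `x` and `y`, while `select_columns("*")` with no known columns falls through to an empty list, matching the final `return []` in the diff.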
[GitHub] madlib pull request #291: Feature: Vector to Columns
Github user ArvindSridhar commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/291#discussion_r202140243

    --- Diff: src/ports/postgres/modules/utilities/test/unit_tests/test_utilities.py_in ---
    @@ -243,6 +243,9 @@ class UtilitiesTestCase(unittest.TestCase):
             self.assertFalse(s.is_valid_psql_type('boolean[]', s.INCLUDE_ARRAY | s.ONLY_ARRAY))
             self.assertFalse(s.is_valid_psql_type('boolean', s.ONLY_ARRAY))
             self.assertFalse(s.is_valid_psql_type('boolean[]', s.ONLY_ARRAY))
    +        self.assertTrue(s.is_valid_psql_type('boolean[]', s.ANY_ARRAY))
    --- End diff --

    Done, should be on PRs 292 and 293

---
[GitHub] madlib pull request #291: Feature: Vector to Columns
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/291#discussion_r201876969

    --- Diff: src/ports/postgres/modules/internal/test/unit_tests/test_db_utils.py_in ---
    @@ -0,0 +1,60 @@
    +# coding=utf-8
    +#
    +# Licensed to the Apache Software Foundation (ASF) under one
    +# or more contributor license agreements. See the NOTICE file
    +# distributed with this work for additional information
    +# regarding copyright ownership. The ASF licenses this file
    +# to you under the Apache License, Version 2.0 (the
    +# "License"); you may not use this file except in compliance
    +# with the License. You may obtain a copy of the License at
    +#
    +# http://www.apache.org/licenses/LICENSE-2.0
    +#
    +# Unless required by applicable law or agreed to in writing,
    +# software distributed under the License is distributed on an
    +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +# KIND, either express or implied. See the License for the
    +# specific language governing permissions and limitations
    +# under the License.
    +
    +import sys
    +from os import path
    +
    +# Add modules module to the pythonpath.
    +sys.path.append(path.dirname(path.dirname(path.dirname(path.dirname(path.abspath(__file__))))))
    +
    +import unittest
    +from mock import *
    +import sys
    +import plpy_mock as plpy
    +
    +m4_changequote(`<!', `!>')
    +class Vec2ColsTestCase(unittest.TestCase):
    --- End diff --

    There is another `Vec2ColsTestCase` in `test_vec2cols.py_in`. Is this one redundant?

---
[GitHub] madlib pull request #291: Feature: Vector to Columns
Github user iyerr3 commented on a diff in the pull request:

    https://github.com/apache/madlib/pull/291#discussion_r201877031

    --- Diff: src/ports/postgres/modules/utilities/test/unit_tests/test_utilities.py_in ---
    @@ -243,6 +243,9 @@ class UtilitiesTestCase(unittest.TestCase):
             self.assertFalse(s.is_valid_psql_type('boolean[]', s.INCLUDE_ARRAY | s.ONLY_ARRAY))
             self.assertFalse(s.is_valid_psql_type('boolean', s.ONLY_ARRAY))
             self.assertFalse(s.is_valid_psql_type('boolean[]', s.ONLY_ARRAY))
    +        self.assertTrue(s.is_valid_psql_type('boolean[]', s.ANY_ARRAY))
    --- End diff --

    Can we move this and corresponding code changes to another commit and PR?

---
[GitHub] madlib pull request #291: Feature: Vector to Columns
GitHub user ArvindSridhar opened a pull request:

    https://github.com/apache/madlib/pull/291

    Feature: Vector to Columns

    JIRA: MADLIB-1240

    The vec2cols function enables users to split up a single column into
    multiple columns, given that the input column contains array entries.
    For example, if the input column contained ARRAY[1, 2, 3] in one of
    its rows, the output table will contain 3 different columns, one for
    each element of the array.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/madlib/madlib feature/vector-to-columns

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/madlib/pull/291.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #291

----
commit 3237acb45c553fc4fc2c20b6e7c9a0b6bec2ffe8
Author: Arvind Sridhar
Date:   2018-07-06T00:18:55Z

    Utilities: Create new function to convert vector to columns

    JIRA: MADLIB-1240

    The vec2cols function enables users to split up a single column into
    multiple columns, given that the input column contains array entries.
    For example, if the input column contained ARRAY[1, 2, 3] in one of
    its rows, the output table will contain 3 different columns, one for
    each element of the array.

    Co-authored-by: Nikhil Kak
    Co-authored-by: Nandish Jayaram

commit 0100da24333fda01fe2b0428d80da2ba5ab9
Author: Arvind Sridhar
Date:   2018-07-09T23:12:28Z

    Internal: Add function to check column type for 1D array

    Co-authored-by: Nikhil Kak

commit 1e8bc328824ea57a0d253834f36e7ca4b0eff26a
Author: Arvind Sridhar
Date:   2018-07-09T23:14:48Z

    Utilities: Add check for whether type is of any array variant

    Co-authored-by: Nikhil Kak

----
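The behaviour described in the pull request (one output column per array element) can be illustrated with a small in-memory sketch. This is not the MADlib implementation, which builds SQL against database tables; it models rows as Python dicts. The function name `vec_to_cols`, the default column names `f1, f2, ...`, and the requirement that every array has the same length are all assumptions made for this illustration.

```python
def vec_to_cols(rows, vector_col, feature_names=None):
    """Expand an array-valued column into one column per element.

    rows: list of dicts, each holding a list under vector_col.
    feature_names: optional output column names; the f1, f2, ...
    default is an assumption of this sketch, not MADlib's naming.
    """
    if not rows:
        return []
    width = len(rows[0][vector_col])
    names = feature_names or ["f{0}".format(i + 1) for i in range(width)]
    if len(names) != width:
        raise ValueError("feature_names must match the array length")
    result = []
    for row in rows:
        vector = row[vector_col]
        # This sketch assumes all arrays share one length.
        if len(vector) != width:
            raise ValueError("all arrays must have the same length")
        # Carry over the other columns, then add one column per element.
        new_row = {k: v for k, v in row.items() if k != vector_col}
        new_row.update(zip(names, vector))
        result.append(new_row)
    return result
```

With an input row of `{"id": 1, "v": [1, 2, 3]}` and `vector_col="v"`, the sketch produces a row with `id` plus three new columns, which corresponds to the `ARRAY[1, 2, 3]` example in the description.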