commit python-sklearn-pandas for openSUSE:Factory

root Mon, 10 Dec 2018 03:30:31 -0800

Hello community,

here is the log from the commit of package python-sklearn-pandas for 
openSUSE:Factory checked in at 2018-12-10 12:30:21
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-sklearn-pandas (Old)
 and      /work/SRC/openSUSE:Factory/.python-sklearn-pandas.new.19453 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Package is "python-sklearn-pandas"

Mon Dec 10 12:30:21 2018 rev:3 rq:656752 version:1.8.0

Changes:
--------
--- /work/SRC/openSUSE:Factory/python-sklearn-pandas/python-sklearn-pandas.changes   2018-09-04 22:56:29.393066440 +0200
+++ /work/SRC/openSUSE:Factory/.python-sklearn-pandas.new.19453/python-sklearn-pandas.changes   2018-12-10 12:30:23.618403685 +0100
@@ -1,0 +2,11 @@
+Sat Dec  8 19:39:33 UTC 2018 - Arun Persaud <[email protected]>
+
+- update to version 1.8.0:
+  * Add FunctionTransformer class (#117).
+  * Fix column names derivation for dataframes with multi-index or
+    non-string columns (#166).
+  * Change behaviour of DataFrameMapper's fit_transform method to
+    invoke each underlying transformers' native fit_transform if
+    implemented. (#150)
+
+-------------------------------------------------------------------
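
To illustrate the column-name fix (#166): the dataframe_mapper.py hunk in this commit replaces '_'.join(columns) with '_'.join(map(str, columns)). The sketch below isolates that expression in a hypothetical helper, derive_name (not part of sklearn-pandas), to show why integer or tuple column labels failed before the change.

```python
def derive_name(columns, alias=None):
    """Hypothetical stand-in for the name-derivation branch shown in the diff."""
    if alias is not None:
        return alias
    if isinstance(columns, list):
        # str.join only accepts strings, so every label is converted first.
        return '_'.join(map(str, columns))
    return columns


# The pre-1.8.0 expression raises for non-string labels.
try:
    '_'.join([0, 1])
except TypeError as exc:
    old_error = exc

print(derive_name(['a', 'b']))            # a_b
print(derive_name([0, 1]))                # 0_1
print(derive_name([('x', 1), ('y', 2)]))  # ('x', 1)_('y', 2)
```

Tuple labels, as produced by a multi-index, are rendered through str(), so the derived name is readable but not necessarily a valid Python identifier.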

Old:
----
  sklearn-pandas-1.7.0.tar.gz

New:
----
  sklearn-pandas-1.8.0.tar.gz

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Other differences:
------------------
++++++ python-sklearn-pandas.spec ++++++
--- /var/tmp/diff_new_pack.CfwBdG/_old  2018-12-10 12:30:24.358402945 +0100
+++ /var/tmp/diff_new_pack.CfwBdG/_new  2018-12-10 12:30:24.362402941 +0100
@@ -12,13 +12,13 @@
 # license that conforms to the Open Source Definition (Version 1.9)
 # published by the Open Source Initiative.
 
-# Please submit bugfixes or comments via http://bugs.opensuse.org/
+# Please submit bugfixes or comments via https://bugs.opensuse.org/
 #
 
 
 %{?!python_module:%define python_module() python-%{**} python3-%{**}}
 Name:           python-sklearn-pandas
-Version:        1.7.0
+Version:        1.8.0
 Release:        0
 Summary:        Pandas integration with sklearn
 License:        Zlib AND BSD-2-Clause

++++++ sklearn-pandas-1.7.0.tar.gz -> sklearn-pandas-1.8.0.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/sklearn-pandas-1.7.0/PKG-INFO new/sklearn-pandas-1.8.0/PKG-INFO
--- old/sklearn-pandas-1.7.0/PKG-INFO   2018-08-15 14:16:05.000000000 +0200
+++ new/sklearn-pandas-1.8.0/PKG-INFO   2018-12-01 20:14:57.000000000 +0100
@@ -1,11 +1,12 @@
 Metadata-Version: 1.0
 Name: sklearn-pandas
-Version: 1.7.0
+Version: 1.8.0
 Summary: Pandas integration with sklearn
 Home-page: https://github.com/paulgb/sklearn-pandas
 Author: Israel Saeta Pérez
 Author-email: [email protected]
 License: UNKNOWN
+Description-Content-Type: UNKNOWN
 Description: UNKNOWN
 Keywords: scikit,sklearn,pandas
 Platform: UNKNOWN
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/sklearn-pandas-1.7.0/README.rst new/sklearn-pandas-1.8.0/README.rst
--- old/sklearn-pandas-1.7.0/README.rst 2018-08-15 14:15:41.000000000 +0200
+++ new/sklearn-pandas-1.8.0/README.rst 2018-12-01 20:13:37.000000000 +0100
@@ -11,7 +11,7 @@
 
 1. A way to map ``DataFrame`` columns to transformations, which are later recombined into features.
 2. A compatibility shim for old ``scikit-learn`` versions to cross-validate a pipeline that takes a pandas ``DataFrame`` as input. This is only needed for ``scikit-learn<0.16.0`` (see `#11 <https://github.com/paulgb/sklearn-pandas/issues/11>`__ for details). It is deprecated and will likely be dropped in ``skearn-pandas==2.0``.
-3. A ``CategoricalImputer`` that replaces null-like values with the mode and works with string columns.
+3. A couple of special transformers that work well with pandas inputs: ``CategoricalImputer`` and ``FunctionTransformer`.`
 
 Installation
 ------------
@@ -65,7 +65,7 @@
 Map the Columns to Transformations
 **********************************
 
-The mapper takes a list of tuples. The first is a column name from the pandas DataFrame, or a list containing one or multiple columns (we will see an example with multiple columns later). The second is an object which will perform the transformation which will be applied to that column. The third is optional and is a dictionary containing the transformation options, if applicable (see "custom column names for transformed features" below).
+The mapper takes a list of tuples. The first element of each tuple is a column name from the pandas DataFrame, or a list containing one or multiple columns (we will see an example with multiple columns later). The second element is an object which will perform the transformation which will be applied to that column. The third one is optional and is a dictionary containing the transformation options, if applicable (see "custom column names for transformed features" below).
 
 Let's see an example::
 
@@ -401,23 +401,44 @@
 
     >>> from sklearn_pandas import CategoricalImputer
     >>> data = np.array(['a', 'b', 'b', np.nan], dtype=object)
-    >>> imputer = CategoricalImputer(strategy='fixed_value', replacement='a')
+    >>> imputer = CategoricalImputer(strategy='constant', fill_value='a')
     >>> imputer.fit_transform(data)
     array(['a', 'b', 'b', 'a'], dtype=object)
 
 
+``FunctionTransformer``
+***********************
+
+Often one wants to apply simple transformations to data such as ``np.log``. ``FunctionTransformer`` is a simple wrapper that takes any function and applies vectorization so that it can be used as a transformer.
+
+Example:
+
+    >>> from sklearn_pandas import FunctionTransformer
+    >>> array = np.array([10, 100])
+    >>> transformer = FunctionTransformer(np.log10)
+
+    >>> transformer.fit_transform(array)
+    array([1., 2.])
+
 Changelog
 ---------
 
+1.8.0 (2018-12-01)
+******************
+* Add ``FunctionTransformer`` class (#117).
+* Fix column names derivation for dataframes with multi-index or non-string
+  columns (#166).
+* Change behaviour of DataFrameMapper's fit_transform method to invoke each underlying transformers'
+  native fit_transform if implemented. (#150)
+
 1.7.0 (2018-08-15)
 ******************
 * Fix issues with unicode names in ``get_names`` (#160).
 * Update to build using ``numpy==1.14`` and ``python==3.6`` (#154).
-* Add ``strategy`` and ``replacement`` parameters to ``CategoricalImputer`` to allow imputing
-  with values other than the mode (#144).
+* Add ``strategy`` and ``fill_value`` parameters to ``CategoricalImputer`` to allow imputing
+  with values other than the mode (#144), (#161).
 * Preserve input data types when no transform is supplied (#138).
 
-
 1.6.0 (2017-10-28)
 ******************
 * Add column name to exception during fit/transform (#110).
@@ -497,16 +518,19 @@
 * Ariel Rossanigo (@arielrossanigo)
 * Arnau Gil Amat (@arnau126)
 * Assaf Ben-David (@AssafBenDavid)
+* Brendan Herger (@bjherger)
 * Cal Paterson (@calpaterson)
 * @defvorfu
 * Gustavo Sena Mafra (@gsmafra)
 * Israel Saeta Pérez (@dukebody)
 * Jeremy Howard (@jph00)
 * Jimmy Wan (@jimmywan)
+* Kristof Van Engeland (@kristofve91)
 * Olivier Grisel (@ogrisel)
 * Paul Butler (@paulgb)
 * Richard Miller (@rwjmiller)
 * Ritesh Agrawal (@ragrawal)
+* @SandroCasagrande
 * Timothy Sweetser (@hacktuarial)
 * Vitaley Zaretskey (@vzaretsk)
 * Zac Stewart (@zacstewart)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/sklearn-pandas-1.7.0/setup.cfg new/sklearn-pandas-1.8.0/setup.cfg
--- old/sklearn-pandas-1.7.0/setup.cfg  2018-08-15 14:16:05.000000000 +0200
+++ new/sklearn-pandas-1.8.0/setup.cfg  2018-12-01 20:14:57.000000000 +0100
@@ -4,5 +4,4 @@
 [egg_info]
 tag_build = 
 tag_date = 0
-tag_svn_revision = 0
 
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/sklearn-pandas-1.7.0/sklearn_pandas/__init__.py new/sklearn-pandas-1.8.0/sklearn_pandas/__init__.py
--- old/sklearn-pandas-1.7.0/sklearn_pandas/__init__.py   2018-08-15 14:15:41.000000000 +0200
+++ new/sklearn-pandas-1.8.0/sklearn_pandas/__init__.py   2018-12-01 20:13:33.000000000 +0100
@@ -1,6 +1,6 @@
-__version__ = '1.7.0'
+__version__ = '1.8.0'
 
 from .dataframe_mapper import DataFrameMapper  # NOQA
 from .cross_validation import cross_val_score, GridSearchCV, RandomizedSearchCV  # NOQA
-from .categorical_imputer import CategoricalImputer  # NOQA
+from .transformers import CategoricalImputer, FunctionTransformer  # NOQA
 from .features_generator import gen_features  # NOQA
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/sklearn-pandas-1.7.0/sklearn_pandas/categorical_imputer.py new/sklearn-pandas-1.8.0/sklearn_pandas/categorical_imputer.py
--- old/sklearn-pandas-1.7.0/sklearn_pandas/categorical_imputer.py   2018-08-05 17:20:13.000000000 +0200
+++ new/sklearn-pandas-1.8.0/sklearn_pandas/categorical_imputer.py   2018-10-21 12:55:27.000000000 +0200
@@ -33,49 +33,46 @@
     copy : boolean, optional (default=True)
         If True, a copy of X will be created.
 
-    strategy : string, optional (default = 'mode')
-        If set to 'mode', replace all instances of `missing_values`
-        with the modal value. Otherwise, replace with
-        the value specified via `replacement`.
+    strategy : string, optional (default = 'most_frequent')
+        The imputation strategy.
 
-    replacement : string, optional (default='?')
+        - If "most_frequent", then replace missing using the most frequent
+          value along each column. Can be used with strings or numeric data.
+        - If "constant", then replace missing values with fill_value. Can be
+          used with strings or numeric data.
+
+    fill_value : string, optional (default='?')
         The value that all instances of `missing_values` are replaced
-        with if `strategy` is not set to 'mode'. This is useful if
+        with if `strategy` is set to `constant`. This is useful if
         you don't want to impute with the mode, or if there are multiple
         modes in your data and you want to choose a particular one. If
-        `strategy` is set to `mode`, this parameter is ignored.
+        `strategy` is not set to `constant`, this parameter is ignored.
 
     Attributes
     ----------
     fill_ : str
-        Most frequent value of the training data.
+        The imputation fill value
 
     """
 
     def __init__(
         self,
         missing_values='NaN',
-        strategy='mode',
-        replacement=None,
+        strategy='most_frequent',
+        fill_value='?',
         copy=True
     ):
         self.missing_values = missing_values
         self.copy = copy
-        self.replacement = replacement
+        self.fill_value = fill_value
         self.strategy = strategy
 
-        strategies = ['fixed_value', 'mode']
+        strategies = ['constant', 'most_frequent']
         if self.strategy not in strategies:
             raise ValueError(
                 'Strategy {0} not in {1}'.format(self.strategy, strategies)
             )
 
-        if self.strategy == 'fixed_value' and self.replacement is None:
-            raise ValueError(
-                'Please specify a value for \'replacement\''
-                'when using the fixed_value strategy.'
-            )
-
     def fit(self, X, y=None):
         """
 
@@ -95,10 +92,10 @@
 
         mask = _get_mask(X, self.missing_values)
         X = X[~mask]
-        if self.strategy == 'mode':
+        if self.strategy == 'most_frequent':
             modes = pd.Series(X).mode()
-        elif self.strategy == 'fixed_value':
-            modes = np.array([self.replacement])
+        elif self.strategy == 'constant':
+            modes = np.array([self.fill_value])
         if modes.shape[0] == 0:
             raise ValueError('Data is empty or all values are null')
         elif modes.shape[0] > 1:
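
The categorical_imputer.py hunks above rename the strategy values ('mode' to 'most_frequent', 'fixed_value' to 'constant') and the 'replacement' keyword to 'fill_value'. As a rough migration aid, the sketch below translates 1.7.0-style keyword arguments into their 1.8.0 spelling. The helper name migrate_imputer_kwargs and the rename table are illustrative assumptions, not part of sklearn-pandas.

```python
# Hypothetical rename table, read off the diff above.
_STRATEGY_RENAMES = {'mode': 'most_frequent', 'fixed_value': 'constant'}


def migrate_imputer_kwargs(old_kwargs):
    """Return a copy of 1.7.0-style CategoricalImputer kwargs in 1.8.0 spelling."""
    new_kwargs = dict(old_kwargs)
    if 'strategy' in new_kwargs:
        strategy = new_kwargs['strategy']
        # Unknown strategies pass through so the constructor can reject them.
        new_kwargs['strategy'] = _STRATEGY_RENAMES.get(strategy, strategy)
    if 'replacement' in new_kwargs:
        replacement = new_kwargs.pop('replacement')
        # 1.7.0 defaulted replacement to None; dropping it lets the new
        # fill_value default ('?') apply instead of passing None through.
        if replacement is not None:
            new_kwargs['fill_value'] = replacement
    return new_kwargs


print(migrate_imputer_kwargs({'strategy': 'fixed_value', 'replacement': 'a'}))
print(migrate_imputer_kwargs({'strategy': 'mode'}))
```

Note one behavioural difference visible in the diff: 1.7.0 raised a ValueError when 'fixed_value' was used without a replacement, whereas 1.8.0 falls back to the default fill_value of '?'.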
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/sklearn-pandas-1.7.0/sklearn_pandas/dataframe_mapper.py new/sklearn-pandas-1.8.0/sklearn_pandas/dataframe_mapper.py
--- old/sklearn-pandas-1.7.0/sklearn_pandas/dataframe_mapper.py   2018-08-05 19:04:13.000000000 +0200
+++ new/sklearn-pandas-1.8.0/sklearn_pandas/dataframe_mapper.py   2018-08-15 14:42:44.000000000 +0200
@@ -114,6 +114,16 @@
         if (df_out and (sparse or default)):
             raise ValueError("Can not use df_out with sparse or default")
 
+    def _build(self):
+        """
+        Build attributes built_features and built_default.
+        """
+        if isinstance(self.features, list):
+            self.built_features = [_build_feature(*f) for f in self.features]
+        else:
+            self.built_features = self.features
+        self.built_default = _build_transformer(self.default)
+
     @property
     def _selected_columns(self):
         """
@@ -198,12 +208,7 @@
         y       the target vector relative to X, optional
 
         """
-        if isinstance(self.features, list):
-            self.built_features = [_build_feature(*f) for f in self.features]
-        else:
-            self.built_features = self.features
-
-        self.built_default = _build_transformer(self.default)
+        self._build()
 
         for columns, transformers, options in self.built_features:
             input_df = options.get('input_df', self.input_df)
@@ -233,7 +238,7 @@
         if alias is not None:
             name = alias
         elif isinstance(columns, list):
-            name = '_'.join(columns)
+            name = '_'.join(map(str, columns))
         else:
             name = columns
         num_cols = x.shape[1] if len(x.shape) > 1 else 1
@@ -273,23 +278,32 @@
         else:
             raise TypeError(type(ex))
 
-    def transform(self, X):
+    def _transform(self, X, y=None, do_fit=False):
         """
-        Transform the given data. Assumes that fit has already been called.
-
-        X       the data to transform
+        Transform the given data with possibility to fit in advance.
+        Avoids code duplication for implementation of transform and
+        fit_transform.
         """
+        if do_fit:
+            self._build()
+
         extracted = []
         self.transformed_names_ = []
         for columns, transformers, options in self.built_features:
             input_df = options.get('input_df', self.input_df)
+
             # columns could be a string or list of
             # strings; we don't care because pandas
             # will handle either.
             Xt = self._get_col_subset(X, columns, input_df)
             if transformers is not None:
                 with add_column_names_to_exception(columns):
-                    Xt = transformers.transform(Xt)
+                    if do_fit and hasattr(transformers, 'fit_transform'):
+                        Xt = _call_fit(transformers.fit_transform, Xt, y)
+                    else:
+                        if do_fit:
+                            _call_fit(transformers.fit, Xt, y)
+                        Xt = transformers.transform(Xt)
             extracted.append(_handle_feature(Xt))
 
             alias = options.get('alias')
@@ -302,7 +316,12 @@
             Xt = self._get_col_subset(X, unsel_cols, self.input_df)
             if self.built_default is not None:
                 with add_column_names_to_exception(unsel_cols):
-                    Xt = self.built_default.transform(Xt)
+                    if do_fit and hasattr(self.built_default, 'fit_transform'):
+                        Xt = _call_fit(self.built_default.fit_transform, Xt, y)
+                    else:
+                        if do_fit:
+                            _call_fit(self.built_default.fit, Xt, y)
+                        Xt = self.built_default.transform(Xt)
                 self.transformed_names_ += self.get_names(
                     unsel_cols, self.built_default, Xt)
             else:
@@ -348,3 +367,22 @@
             return df_out
         else:
             return stacked
+
+    def transform(self, X):
+        """
+        Transform the given data. Assumes that fit has already been called.
+
+        X       the data to transform
+        """
+        return self._transform(X)
+
+    def fit_transform(self, X, y=None):
+        """
+        Fit a transformation from the pipeline and directly apply
+        it to the given data.
+
+        X       the data to fit
+
+        y       the target vector relative to X, optional
+        """
+        return self._transform(X, y, True)
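
The dispatch added to _transform above can be hard to follow inside a diff, so here is a minimal, dependency-free sketch of the same decision: when fitting, prefer a transformer's native fit_transform, otherwise call fit and then transform. The function run_step and the toy classes Doubler and Centerer are hypothetical. The real code also routes the fit calls through _call_fit, whose body is not shown in this commit and is therefore left out here.

```python
def run_step(transformer, X, y=None, do_fit=False):
    """Simplified mirror of the per-transformer branch in _transform."""
    if do_fit and hasattr(transformer, 'fit_transform'):
        return transformer.fit_transform(X, y)
    if do_fit:
        transformer.fit(X, y)
    return transformer.transform(X)


class Doubler:
    """Toy transformer exposing only fit and transform."""

    def __init__(self):
        self.calls = []

    def fit(self, X, y=None):
        self.calls.append('fit')
        return self

    def transform(self, X):
        self.calls.append('transform')
        return [2 * v for v in X]


class Centerer:
    """Toy transformer that also has a native fit_transform."""

    def __init__(self):
        self.calls = []

    def fit(self, X, y=None):
        self.calls.append('fit')
        self.mean_ = sum(X) / len(X)
        return self

    def transform(self, X):
        self.calls.append('transform')
        return [v - self.mean_ for v in X]

    def fit_transform(self, X, y=None):
        self.calls.append('fit_transform')
        self.mean_ = sum(X) / len(X)
        return [v - self.mean_ for v in X]


doubler, centerer = Doubler(), Centerer()
doubled = run_step(doubler, [1, 2, 3], do_fit=True)
centered = run_step(centerer, [1.0, 2.0, 3.0], do_fit=True)
later = run_step(centerer, [4.0])  # plain transform, reuses the fitted mean

print(doubled, doubler.calls)
print(centered, later, centerer.calls)
```

The practical consequence is that a transformer whose fit_transform differs from fit followed by transform now has that native path respected when the mapper itself is fit-transformed.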
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/sklearn-pandas-1.7.0/sklearn_pandas/transformers.py new/sklearn-pandas-1.8.0/sklearn_pandas/transformers.py
--- old/sklearn-pandas-1.7.0/sklearn_pandas/transformers.py   1970-01-01 01:00:00.000000000 +0100
+++ new/sklearn-pandas-1.8.0/sklearn_pandas/transformers.py   2018-12-01 20:13:29.000000000 +0100
@@ -0,0 +1,152 @@
+import numpy as np
+import pandas as pd
+
+from sklearn.base import BaseEstimator, TransformerMixin
+from sklearn.utils.validation import check_is_fitted
+
+
+def _get_mask(X, value):
+    """
+    Compute the boolean mask X == missing_values.
+    """
+    if value == "NaN" or \
+       value is None or \
+       (isinstance(value, float) and np.isnan(value)):
+        return pd.isnull(X)
+    else:
+        return X == value
+
+
+class CategoricalImputer(BaseEstimator, TransformerMixin):
+    """
+    Impute missing values from a categorical/string np.ndarray or pd.Series
+    with the most frequent value on the training data.
+
+    Parameters
+    ----------
+    missing_values : string or "NaN", optional (default="NaN")
+        The placeholder for the missing values. All occurrences of
+        `missing_values` will be imputed. None and np.nan are treated
+        as being the same, use the string value "NaN" for them.
+
+    copy : boolean, optional (default=True)
+        If True, a copy of X will be created.
+
+    strategy : string, optional (default = 'most_frequent')
+        The imputation strategy.
+
+        - If "most_frequent", then replace missing using the most frequent
+          value along each column. Can be used with strings or numeric data.
+        - If "constant", then replace missing values with fill_value. Can be
+          used with strings or numeric data.
+
+    fill_value : string, optional (default='?')
+        The value that all instances of `missing_values` are replaced
+        with if `strategy` is set to `constant`. This is useful if
+        you don't want to impute with the mode, or if there are multiple
+        modes in your data and you want to choose a particular one. If
+        `strategy` is not set to `constant`, this parameter is ignored.
+
+    Attributes
+    ----------
+    fill_ : str
+        The imputation fill value
+
+    """
+
+    def __init__(
+        self,
+        missing_values='NaN',
+        strategy='most_frequent',
+        fill_value='?',
+        copy=True
+    ):
+        self.missing_values = missing_values
+        self.copy = copy
+        self.fill_value = fill_value
+        self.strategy = strategy
+
+        strategies = ['constant', 'most_frequent']
+        if self.strategy not in strategies:
+            raise ValueError(
+                'Strategy {0} not in {1}'.format(self.strategy, strategies)
+            )
+
+    def fit(self, X, y=None):
+        """
+
+        Get the most frequent value.
+
+        Parameters
+        ----------
+            X : np.ndarray or pd.Series
+                Training data.
+
+            y : Passthrough for ``Pipeline`` compatibility.
+
+        Returns
+        -------
+            self: CategoricalImputer
+        """
+
+        mask = _get_mask(X, self.missing_values)
+        X = X[~mask]
+        if self.strategy == 'most_frequent':
+            modes = pd.Series(X).mode()
+        elif self.strategy == 'constant':
+            modes = np.array([self.fill_value])
+        if modes.shape[0] == 0:
+            raise ValueError('Data is empty or all values are null')
+        elif modes.shape[0] > 1:
+            raise ValueError('No value is repeated more than '
+                             'once in the column')
+        else:
+            self.fill_ = modes[0]
+
+        return self
+
+    def transform(self, X):
+        """
+
+        Replaces missing values in the input data with the most frequent value
+        of the training data.
+
+        Parameters
+        ----------
+            X : np.ndarray or pd.Series
+                Data with values to be imputed.
+
+        Returns
+        -------
+            np.ndarray
+                Data with imputed values.
+        """
+
+        check_is_fitted(self, 'fill_')
+
+        if self.copy:
+            X = X.copy()
+
+        mask = _get_mask(X, self.missing_values)
+        X[mask] = self.fill_
+
+        return np.asarray(X)
+
+
+class FunctionTransformer(BaseEstimator, TransformerMixin):
+    """
+    Use this class to convert a random function into a
+    transformer.
+    """
+
+    def __init__(self, func):
+        self.__func = func
+
+    def fit(self, x, y=None):
+        return self
+
+    def transform(self, x):
+        return np.vectorize(self.__func)(x)
+
+    def __call__(self, *args, **kwargs):
+        return self.__func(*args, **kwargs)
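
The new FunctionTransformer.transform above boils down to a single np.vectorize call. The sketch below reproduces just that call outside the class, assuming NumPy is installed; apply_elementwise is a hypothetical helper, and math.log10 is used because it only accepts scalars, which is the case where vectorization actually matters.

```python
import math

import numpy as np


def apply_elementwise(func, x):
    """Hypothetical helper: the same np.vectorize call used in transform."""
    return np.vectorize(func)(x)


values = np.array([10, 100])
result = apply_elementwise(math.log10, values)
print(result)  # [1. 2.]
```

np.vectorize is a convenience wrapper that loops in Python rather than a performance feature, so for functions that already operate on arrays, such as np.log10, calling the function directly yields the same values.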
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/sklearn-pandas-1.7.0/sklearn_pandas.egg-info/PKG-INFO new/sklearn-pandas-1.8.0/sklearn_pandas.egg-info/PKG-INFO
--- old/sklearn-pandas-1.7.0/sklearn_pandas.egg-info/PKG-INFO   2018-08-15 14:15:57.000000000 +0200
+++ new/sklearn-pandas-1.8.0/sklearn_pandas.egg-info/PKG-INFO   2018-12-01 20:14:57.000000000 +0100
@@ -1,11 +1,12 @@
 Metadata-Version: 1.0
 Name: sklearn-pandas
-Version: 1.7.0
+Version: 1.8.0
 Summary: Pandas integration with sklearn
 Home-page: https://github.com/paulgb/sklearn-pandas
 Author: Israel Saeta Pérez
 Author-email: [email protected]
 License: UNKNOWN
+Description-Content-Type: UNKNOWN
 Description: UNKNOWN
 Keywords: scikit,sklearn,pandas
 Platform: UNKNOWN
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/sklearn-pandas-1.7.0/sklearn_pandas.egg-info/SOURCES.txt new/sklearn-pandas-1.8.0/sklearn_pandas.egg-info/SOURCES.txt
--- old/sklearn-pandas-1.7.0/sklearn_pandas.egg-info/SOURCES.txt   2018-08-15 14:16:05.000000000 +0200
+++ new/sklearn-pandas-1.8.0/sklearn_pandas.egg-info/SOURCES.txt   2018-12-01 20:14:57.000000000 +0100
@@ -9,6 +9,7 @@
 sklearn_pandas/dataframe_mapper.py
 sklearn_pandas/features_generator.py
 sklearn_pandas/pipeline.py
+sklearn_pandas/transformers.py
 sklearn_pandas.egg-info/PKG-INFO
 sklearn_pandas.egg-info/SOURCES.txt
 sklearn_pandas.egg-info/dependency_links.txt
