Hello community,

here is the log from the commit of package python-dask for openSUSE:Factory checked in at 2018-09-11 17:17:52

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-dask (Old)
 and      /work/SRC/openSUSE:Factory/.python-dask.new (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "python-dask" Tue Sep 11 17:17:52 2018 rev:7 rq:634440 version:0.19.1 Changes: -------- --- /work/SRC/openSUSE:Factory/python-dask/python-dask.changes 2018-09-04 22:56:24.821050827 +0200 +++ /work/SRC/openSUSE:Factory/.python-dask.new/python-dask.changes 2018-09-11 17:17:59.183346691 +0200 @@ -1,0 +2,31 @@ +Sat Sep 8 04:33:17 UTC 2018 - Arun Persaud <[email protected]> + +- update to version 0.19.1: + * Array + + Don't enforce dtype if result has no dtype (:pr:`3928`) Matthew + Rocklin + + Fix NumPy issubtype deprecation warning (:pr:`3939`) Bruce Merry + + Fix arg reduction tokens to be unique with different arguments + (:pr:`3955`) Tobias de Jong + + Coerce numpy integers to ints in slicing code (:pr:`3944`) Yu + Feng + + Linalg.norm ndim along axis partial fix (:pr:`3933`) Tobias de + Jong + * Dataframe + + Deterministic DataFrame.set_index (:pr:`3867`) George Sakkis + + Fix divisions in read_parquet when dealing with filters #3831 + #3930 (:pr:`3923`) (:pr:`3931`) @andrethrill + + Fixing returning type in categorical.as_known (:pr:`3888`) + Sriharsha Hatwar + + Fix DataFrame.assign for callables (:pr:`3919`) Tom Augspurger + + Include partitions with no width in repartition (:pr:`3941`) + Matthew Rocklin + + Don't constrict stage/k dtype in dataframe shuffle (:pr:`3942`) + Matthew Rocklin + * Documentation + + DOC: Add hint on how to render task graphs horizontally + (:pr:`3922`) Uwe Korn + + Add try-now button to main landing page (:pr:`3924`) Matthew + Rocklin + +------------------------------------------------------------------- Old: ---- dask-0.19.0.tar.gz New: ---- dask-0.19.1.tar.gz ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Other differences: ------------------ ++++++ python-dask.spec ++++++ --- /var/tmp/diff_new_pack.klQLhF/_old 2018-09-11 17:17:59.943345525 +0200 +++ /var/tmp/diff_new_pack.klQLhF/_new 2018-09-11 17:17:59.947345519 +0200 @@ -22,7 +22,7 @@ # python(2/3)-distributed has a dependency loop with python(2/3)-dask %bcond_with test_distributed Name: python-dask -Version: 0.19.0 +Version: 0.19.1 Release: 0 Summary: Minimal task scheduling abstraction License: BSD-3-Clause ++++++ dask-0.19.0.tar.gz -> dask-0.19.1.tar.gz ++++++ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/PKG-INFO new/dask-0.19.1/PKG-INFO --- old/dask-0.19.0/PKG-INFO 2018-08-30 18:41:43.000000000 +0200 +++ new/dask-0.19.1/PKG-INFO 2018-09-06 14:15:04.000000000 +0200 @@ -1,11 +1,12 @@ -Metadata-Version: 2.1 +Metadata-Version: 1.2 Name: dask -Version: 0.19.0 +Version: 0.19.1 Summary: Parallel PyData with Task Scheduling Home-page: http://github.com/dask/dask/ -Maintainer: Matthew Rocklin -Maintainer-email: [email protected] +Author: Matthew Rocklin +Author-email: [email protected] License: BSD +Description-Content-Type: UNKNOWN Description: Dask ==== @@ -44,9 +45,3 @@ Classifier: Programming Language :: Python :: 3.6 Classifier: Programming Language :: Python :: 3.7 Requires-Python: >=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.* -Provides-Extra: complete -Provides-Extra: bag -Provides-Extra: array -Provides-Extra: delayed -Provides-Extra: distributed -Provides-Extra: dataframe diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/_version.py new/dask-0.19.1/dask/_version.py --- old/dask-0.19.0/dask/_version.py 2018-08-30 18:41:43.000000000 +0200 +++ new/dask-0.19.1/dask/_version.py 2018-09-06 14:15:04.000000000 +0200 @@ -11,8 +11,8 @@ { "dirty": 
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/array/creation.py new/dask-0.19.1/dask/array/creation.py
--- old/dask-0.19.0/dask/array/creation.py  2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/array/creation.py  2018-09-06 13:45:35.000000000 +0200
@@ -943,7 +943,7 @@
 
     result = result.map_blocks(
         wrapped_pad_func,
-        token="pad",
+        name="pad",
         dtype=result.dtype,
         pad_func=mode,
         iaxis_pad_width=pad_width[d],
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/array/linalg.py new/dask-0.19.1/dask/array/linalg.py
--- old/dask-0.19.0/dask/array/linalg.py    2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/array/linalg.py    2018-09-06 13:45:35.000000000 +0200
@@ -111,6 +111,7 @@
             " 2. Have only one column of blocks\n\n"
             "Note: This function (tsqr) supports QR decomposition in the case of\n"
             "tall-and-skinny matrices (single column chunk/block; see qr)"
+            "Current shape: {},\nCurrent chunksize: {}".format(data.shape, data.chunksize)
         )
 
     token = '-' + tokenize(data, compute_svd)
@@ -1081,9 +1082,6 @@
 
 @wraps(np.linalg.norm)
 def norm(x, ord=None, axis=None, keepdims=False):
-    if x.ndim > 2:
-        raise ValueError("Improper number of dimensions to norm.")
-
     if axis is None:
         axis = tuple(range(x.ndim))
     elif isinstance(axis, Number):
@@ -1091,6 +1089,9 @@
     else:
         axis = tuple(axis)
 
+    if len(axis) > 2:
+        raise ValueError("Improper number of dimensions to norm.")
+
     if ord == "fro":
         ord = None
         if len(axis) == 1:
@@ -1104,6 +1105,8 @@
     elif ord == "nuc":
         if len(axis) == 1:
             raise ValueError("Invalid norm order for vectors.")
+        if x.ndim > 2:
+            raise NotImplementedError("SVD based norm not implemented for ndim > 2")
 
         r = svd(x)[1][None].sum(keepdims=keepdims)
     elif ord == np.inf:
@@ -1111,29 +1114,41 @@
         if len(axis) == 1:
             r = r.max(axis=axis, keepdims=keepdims)
         else:
-            r = r.sum(axis=axis[1], keepdims=keepdims).max(keepdims=keepdims)
+            r = r.sum(axis=axis[1], keepdims=True).max(axis=axis[0], keepdims=True)
+            if keepdims is False:
+                r = r.squeeze(axis=axis)
     elif ord == -np.inf:
         r = abs(r)
         if len(axis) == 1:
             r = r.min(axis=axis, keepdims=keepdims)
         else:
-            r = r.sum(axis=axis[1], keepdims=keepdims).min(keepdims=keepdims)
+            r = r.sum(axis=axis[1], keepdims=True).min(axis=axis[0], keepdims=True)
+            if keepdims is False:
+                r = r.squeeze(axis=axis)
     elif ord == 0:
         if len(axis) == 2:
             raise ValueError("Invalid norm order for matrices.")
-        r = (r != 0).astype(r.dtype).sum(axis=0, keepdims=keepdims)
+        r = (r != 0).astype(r.dtype).sum(axis=axis, keepdims=keepdims)
     elif ord == 1:
         r = abs(r)
         if len(axis) == 1:
             r = r.sum(axis=axis, keepdims=keepdims)
         else:
-            r = r.sum(axis=axis[0], keepdims=keepdims).max(keepdims=keepdims)
+            r = r.sum(axis=axis[0], keepdims=True).max(axis=axis[1], keepdims=True)
+            if keepdims is False:
+                r = r.squeeze(axis=axis)
     elif len(axis) == 2 and ord == -1:
-        r = abs(r).sum(axis=axis[0], keepdims=keepdims).min(keepdims=keepdims)
+        r = abs(r).sum(axis=axis[0], keepdims=True).min(axis=axis[1], keepdims=True)
+        if keepdims is False:
+            r = r.squeeze(axis=axis)
     elif len(axis) == 2 and ord == 2:
+        if x.ndim > 2:
+            raise NotImplementedError("SVD based norm not implemented for ndim > 2")
         r = svd(x)[1][None].max(keepdims=keepdims)
     elif len(axis) == 2 and ord == -2:
+        if x.ndim > 2:
+            raise NotImplementedError("SVD based norm not implemented for ndim > 2")
         r = svd(x)[1][None].min(keepdims=keepdims)
     else:
         if len(axis) == 2:
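Illustrative usage (not part of the patch), mirroring the new ``test_norm_any_slice`` test below: non-SVD norms over an axis pair now agree with NumPy on arrays with more than two dimensions.

    import numpy as np
    import dask.array as da

    a = np.random.random((4, 5, 3))
    d = da.from_array(a, chunks=(2, 2, 2))

    # Matrix norm over one axis pair of a 3-d array; previously dask
    # raised "Improper number of dimensions to norm." for ndim > 2.
    expected = np.linalg.norm(a, ord=np.inf, axis=(1, 2))
    result = da.linalg.norm(d, ord=np.inf, axis=(1, 2)).compute()
    assert np.allclose(expected, result)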
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/array/reductions.py new/dask-0.19.1/dask/array/reductions.py
--- old/dask-0.19.0/dask/array/reductions.py    2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/array/reductions.py    2018-09-06 13:45:35.000000000 +0200
@@ -614,7 +614,8 @@
                          "got '{0}'".format(axis))
 
     # Map chunk across all blocks
-    name = 'arg-reduce-chunk-{0}'.format(tokenize(chunk, axis))
+    name = 'arg-reduce-{0}'.format(tokenize(axis, x, chunk,
+                                            combine, split_every))
     old = x.name
     keys = list(product(*map(range, x.numblocks)))
     offsets = list(product(*(accumulate(operator.add, bd[:-1], 0)
@@ -714,7 +715,8 @@
 
     m = x.map_blocks(func, axis=axis, dtype=dtype)
 
-    name = '%s-axis=%d-%s' % (func.__name__, axis, tokenize(x, dtype))
+    name = '{0}-{1}'.format(func.__name__, tokenize(func, axis, binop,
+                                                    ident, x, dtype))
     n = x.numblocks[axis]
     full = slice(None, None, None)
     slc = (full,) * axis + (slice(-1, None),) + (full,) * (x.ndim - axis - 1)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/array/slicing.py new/dask-0.19.1/dask/array/slicing.py
--- old/dask-0.19.0/dask/array/slicing.py   2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/array/slicing.py   2018-09-06 13:45:35.000000000 +0200
@@ -69,7 +69,7 @@
         return np.asanyarray(nonzero)
     elif np.issubdtype(index_array.dtype, np.integer):
         return index_array
-    elif np.issubdtype(index_array.dtype, float):
+    elif np.issubdtype(index_array.dtype, np.floating):
         int_index = index_array.astype(np.intp)
         if np.allclose(index_array, int_index):
             return int_index
@@ -391,7 +391,7 @@
             ind = index - chunk_boundaries[i - 1]
         else:
             ind = index
-        return {i: ind}
+        return {int(i): int(ind)}
 
     assert isinstance(index, slice)
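For reference, a small example (not part of the patch) of the deprecation that the ``issubdtype`` change works around — recent NumPy warns when the builtin ``float`` is passed as the second argument, so the abstract type ``np.floating`` is used instead:

    import numpy as np

    idx = np.array([0.0, 1.0, 2.0])

    # np.issubdtype(idx.dtype, float) triggers a FutureWarning on
    # NumPy >= 1.14; np.floating matches all float widths explicitly.
    assert np.issubdtype(idx.dtype, np.floating)
    assert not np.issubdtype(np.array([1, 2]).dtype, np.floating)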
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/array/tests/test_array_core.py new/dask-0.19.1/dask/array/tests/test_array_core.py
--- old/dask-0.19.0/dask/array/tests/test_array_core.py    2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/array/tests/test_array_core.py    2018-09-06 13:45:35.000000000 +0200
@@ -3644,3 +3644,8 @@
         da.argmax(Y, axis=0).compute()
 
     assert not record
+
+
+def test_3925():
+    x = da.from_array(np.array(['a', 'b', 'c'], dtype=object), chunks=-1)
+    assert (x[0] == x[0]).compute(scheduler='sync')
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/array/tests/test_linalg.py new/dask-0.19.1/dask/array/tests/test_linalg.py
--- old/dask-0.19.0/dask/array/tests/test_linalg.py 2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/array/tests/test_linalg.py 2018-09-06 13:45:35.000000000 +0200
@@ -659,10 +659,6 @@
     [(5,), (2,), 0],
     [(5,), (2,), (0,)],
     [(5, 6), (2, 2), None],
-    [(5, 6), (2, 2), 0],
-    [(5, 6), (2, 2), 1],
-    [(5, 6), (2, 2), (0, 1)],
-    [(5, 6), (2, 2), (1, 0)],
 ])
 @pytest.mark.parametrize("norm", [
     None,
@@ -685,6 +681,40 @@
     assert_eq(a_r, d_r)
 
 
[email protected]
[email protected]("shape, chunks", [
+    [(5,), (2,)],
+    [(5, 3), (2, 2)],
+    [(4, 5, 3), (2, 2, 2)],
+    [(4, 5, 2, 3), (2, 2, 2, 2)],
+    [(2, 5, 2, 4, 3), (2, 2, 2, 2, 2)],
+])
[email protected]("norm", [
+    None,
+    1,
+    -1,
+    np.inf,
+    -np.inf,
+])
[email protected]("keepdims", [
+    False,
+    True,
+])
+def test_norm_any_slice(shape, chunks, norm, keepdims):
+    a = np.random.random(shape)
+    d = da.from_array(a, chunks=chunks)
+
+    for firstaxis in range(len(shape)):
+        for secondaxis in range(len(shape)):
+            if firstaxis != secondaxis:
+                axis = (firstaxis, secondaxis)
+            else:
+                axis = firstaxis
+            a_r = np.linalg.norm(a, ord=norm, axis=axis, keepdims=keepdims)
+            d_r = da.linalg.norm(d, ord=norm, axis=axis, keepdims=keepdims)
+            assert_eq(a_r, d_r)
+
+
 @pytest.mark.parametrize("shape, chunks, axis", [
     [(5,), (2,), None],
     [(5,), (2,), 0],
@@ -730,9 +760,30 @@
 
     # Need one chunk on last dimension for svd.
     if norm == "nuc" or norm == 2 or norm == -2:
-        d = d.rechunk((d.chunks[0], d.shape[1]))
+        d = d.rechunk({-1: -1})
 
     a_r = np.linalg.norm(a, ord=norm, axis=axis, keepdims=keepdims)
     d_r = da.linalg.norm(d, ord=norm, axis=axis, keepdims=keepdims)
 
     assert_eq(a_r, d_r)
+
+
[email protected]("shape, chunks, axis", [
+    [(3, 2, 4), (2, 2, 2), (1, 2)],
+    [(2, 3, 4, 5), (2, 2, 2, 2), (-1, -2)],
+])
[email protected]("norm", [
+    "nuc",
+    2,
+    -2
+])
[email protected]("keepdims", [
+    False,
+    True,
+])
+def test_norm_implemented_errors(shape, chunks, axis, norm, keepdims):
+    a = np.random.random(shape)
+    d = da.from_array(a, chunks=chunks)
+    if len(shape) > 2 and len(axis) == 2:
+        with pytest.raises(NotImplementedError):
+            da.linalg.norm(d, ord=norm, axis=axis, keepdims=keepdims)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/array/tests/test_optimization.py new/dask-0.19.1/dask/array/tests/test_optimization.py
--- old/dask-0.19.0/dask/array/tests/test_optimization.py  2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/array/tests/test_optimization.py  2018-09-06 13:45:35.000000000 +0200
@@ -273,3 +273,15 @@
 
     assert dask.get(a, y.__dask_keys__()) == dask.get(b, y.__dask_keys__())
     assert len(a) < len(b)
+
+
+def test_gh3937():
+    # test for github issue #3937
+    x = da.from_array([1, 2, 3.], (2,))
+    x = da.concatenate((x, [x[-1]]))
+    y = x.rechunk((2,))
+    # This will produce Integral type indices that are not ints (np.int64), failing
+    # the optimizer
+    y = da.coarsen(np.sum, y, {0: 2})
+    # How to trigger the optimizer explicitly?
+    y.compute()
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/array/tests/test_reductions.py new/dask-0.19.1/dask/array/tests/test_reductions.py
--- old/dask-0.19.0/dask/array/tests/test_reductions.py    2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/array/tests/test_reductions.py    2018-09-06 13:45:35.000000000 +0200
@@ -6,7 +6,7 @@
 import dask.array as da
 from dask.array.utils import assert_eq as _assert_eq, same_keys
 from dask.core import get_deps
-from dask.context import set_options
+import dask.config as config
 
 
 def assert_eq(a, b):
@@ -139,7 +139,7 @@
     assert_eq(dfunc(a, 0), func(x, 0))
     assert_eq(dfunc(a, 1), func(x, 1))
     assert_eq(dfunc(a, 2), func(x, 2))
-    with set_options(split_every=2):
+    with config.set(split_every=2):
         assert_eq(dfunc(a), func(x))
         assert_eq(dfunc(a, 0), func(x, 0))
         assert_eq(dfunc(a, 1), func(x, 1))
@@ -368,7 +368,7 @@
 
 def test_tree_reduce_set_options():
     x = da.from_array(np.arange(242).reshape((11, 22)), chunks=(3, 4))
-    with set_options(split_every={0: 2, 1: 3}):
+    with config.set(split_every={0: 2, 1: 3}):
         assert_max_deps(x.sum(), 2 * 3)
         assert_max_deps(x.sum(axis=0), 2)
@@ -487,3 +487,14 @@
               da.topk(a, 5, axis=1, split_every=2))
     assert_eq(a.argtopk(5, axis=1, split_every=2),
               da.argtopk(a, 5, axis=1, split_every=2))
+
+
[email protected]('func', [da.cumsum, da.cumprod,
+                                  da.argmin, da.argmax,
+                                  da.min, da.max,
+                                  da.nansum, da.nanmax])
+def test_regres_3940(func):
+    a = da.ones((5,2), chunks=(2,2))
+    assert func(a).name != func(a + 1).name
+    assert func(a, axis=0).name != func(a).name
+    assert func(a, axis=0).name != func(a, axis=1).name
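The arg-reduction token fix is easy to observe directly; a quick check (not part of the patch) in the spirit of ``test_regres_3940`` above:

    import dask.array as da

    a = da.ones((5, 2), chunks=(2, 2))

    # Distinct inputs and distinct axes now yield distinct graph keys,
    # so unrelated reductions can no longer collide in one graph.
    assert da.argmax(a).name != da.argmax(a + 1).name
    assert da.argmax(a, axis=0).name != da.argmax(a, axis=1).name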
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/dataframe/categorical.py new/dask-0.19.1/dask/dataframe/categorical.py
--- old/dask-0.19.0/dask/dataframe/categorical.py   2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/dataframe/categorical.py   2018-09-06 13:45:35.000000000 +0200
@@ -184,7 +184,7 @@
             Keywords to pass on to the call to `compute`.
         """
         if self.known:
-            return self
+            return self._series
         categories = self._property_map('categories').unique().compute(**kwargs)
         return self.set_categories(categories.values)
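Usage sketch (not part of the patch): with this fix, ``as_known()`` on an already-known categorical returns the underlying Series rather than the accessor object, matching ``test_return_type_known_categories`` below.

    import pandas as pd
    import dask.dataframe as dd

    df = pd.DataFrame({"A": ["a", "b", "c"]})
    df["A"] = df["A"].astype("category")
    ddf = dd.from_pandas(df, npartitions=2)

    # Previously the known branch handed back the accessor itself;
    # it now always returns a dask Series.
    known = ddf.A.cat.as_known()
    assert isinstance(known, dd.Series)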
""" if self.known: - return self + return self._series categories = self._property_map('categories').unique().compute(**kwargs) return self.set_categories(categories.values) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/dataframe/core.py new/dask-0.19.1/dask/dataframe/core.py --- old/dask-0.19.0/dask/dataframe/core.py 2018-08-30 18:28:02.000000000 +0200 +++ new/dask-0.19.1/dask/dataframe/core.py 2018-09-06 13:45:35.000000000 +0200 @@ -2527,6 +2527,9 @@ pd.compat.isidentifier(c))) return list(o) + def _ipython_key_completions_(self): + return self.columns.tolist() + @property def ndim(self): """ Return dimensionality """ @@ -2678,6 +2681,9 @@ callable(v) or pd.api.types.is_scalar(v)): raise TypeError("Column assignment doesn't support type " "{0}".format(type(v).__name__)) + if callable(v): + kwargs[k] = v(self) + pairs = list(sum(kwargs.items(), ())) # Figure out columns of the output @@ -4078,8 +4084,9 @@ else: d[(out1, k)] = (methods.boundary_slice, (name, i - 1), low, b[j], False) low = b[j] + if len(a) == i + 1 or a[i] < a[i + 1]: + j += 1 i += 1 - j += 1 c.append(low) k += 1 @@ -4113,7 +4120,7 @@ while c[i] < b[j]: tmp.append((out1, i)) i += 1 - if last_elem and c[i] == b[-1] and (b[-1] != b[-2] or j == len(b) - 1) and i < k: + while last_elem and c[i] == b[-1] and (b[-1] != b[-2] or j == len(b) - 1) and i < k: # append if last split is not included tmp.append((out1, i)) i += 1 diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/dataframe/io/parquet.py new/dask-0.19.1/dask/dataframe/io/parquet.py --- old/dask-0.19.0/dask/dataframe/io/parquet.py 2018-08-30 18:28:02.000000000 +0200 +++ new/dask-0.19.1/dask/dataframe/io/parquet.py 2018-09-06 13:45:35.000000000 +0200 @@ -285,12 +285,15 @@ if index_names and infer_divisions is not False: index_name = meta.index.name - minmax = fastparquet.api.sorted_partitioned_columns(pf) + try: + # is https://github.com/dask/fastparquet/pull/371 available in + # current fastparquet installation? 
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/dataframe/io/parquet.py new/dask-0.19.1/dask/dataframe/io/parquet.py
--- old/dask-0.19.0/dask/dataframe/io/parquet.py    2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/dataframe/io/parquet.py    2018-09-06 13:45:35.000000000 +0200
@@ -285,12 +285,15 @@
 
     if index_names and infer_divisions is not False:
         index_name = meta.index.name
-        minmax = fastparquet.api.sorted_partitioned_columns(pf)
+        try:
+            # is https://github.com/dask/fastparquet/pull/371 available in
+            # current fastparquet installation?
+            minmax = fastparquet.api.sorted_partitioned_columns(pf, filters)
+        except TypeError:
+            minmax = fastparquet.api.sorted_partitioned_columns(pf)
 
         if index_name in minmax:
-            divisions = (list(minmax[index_name]['min']) +
-                         [minmax[index_name]['max'][-1]])
-            divisions = [divisions[i] for i, rg in enumerate(pf.row_groups)
-                         if rg in rgs] + [divisions[-1]]
+            divisions = minmax[index_name]
+            divisions = divisions['min'] + [divisions['max'][-1]]
         else:
             if infer_divisions is True:
                 raise ValueError(
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/dataframe/io/tests/test_parquet.py new/dask-0.19.1/dask/dataframe/io/tests/test_parquet.py
--- old/dask-0.19.0/dask/dataframe/io/tests/test_parquet.py    2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/dataframe/io/tests/test_parquet.py    2018-09-06 13:45:35.000000000 +0200
@@ -819,6 +819,52 @@
     assert len(ddf2) > 0
 
 
+def test_divisions_read_with_filters(tmpdir):
+    check_fastparquet()
+    tmpdir = str(tmpdir)
+    #generate dataframe
+    size = 100
+    categoricals = []
+    for value in ['a', 'b', 'c', 'd']:
+        categoricals += [value] * int(size / 4)
+    df = pd.DataFrame({'a': categoricals,
+                       'b': np.random.random(size=size),
+                       'c': np.random.randint(1, 5, size=size)})
+    d = dd.from_pandas(df, npartitions=4)
+    #save it
+    d.to_parquet(tmpdir, partition_on=['a'], engine='fastparquet')
+    #read it
+    out = dd.read_parquet(tmpdir,
+                          engine='fastparquet',
+                          filters=[('a', '==', 'b')])
+    #test it
+    expected_divisions = (25, 49)
+    assert out.divisions == expected_divisions
+
+
+def test_divisions_are_known_read_with_filters(tmpdir):
+    check_fastparquet()
+    tmpdir = str(tmpdir)
+    #generate dataframe
+    df = pd.DataFrame({'unique': [0, 0, 1, 1, 2, 2, 3, 3],
+                       'id': ['id1', 'id2',
+                              'id1', 'id2',
+                              'id1', 'id2',
+                              'id1', 'id2']},
+                      index=[0, 0, 1, 1, 2, 2, 3, 3])
+    d = dd.from_pandas(df, npartitions=2)
+    #save it
+    d.to_parquet(tmpdir, partition_on=['id'], engine='fastparquet')
+    #read it
+    out = dd.read_parquet(tmpdir,
+                          engine='fastparquet',
+                          filters=[('id', '==', 'id1')])
+    #test it
+    assert out.known_divisions
+    expected_divisions = (0, 2, 3)
+    assert out.divisions == expected_divisions
+
+
 def test_read_from_fastparquet_parquetfile(tmpdir):
     check_fastparquet()
     fn = str(tmpdir)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/dataframe/partitionquantiles.py new/dask-0.19.1/dask/dataframe/partitionquantiles.py
--- old/dask-0.19.0/dask/dataframe/partitionquantiles.py    2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/dataframe/partitionquantiles.py    2018-09-06 13:45:35.000000000 +0200
@@ -436,7 +436,7 @@
     qs = np.linspace(0, 1, npartitions + 1)
     token = tokenize(df, qs, upsample)
     if random_state is None:
-        random_state = hash(token) % np.iinfo(np.int32).max
+        random_state = int(token, 16) % np.iinfo(np.int32).max
     state_data = random_state_data(df.npartitions, random_state)
 
     df_keys = df.__dask_keys__()
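The seed change above is the heart of the deterministic ``set_index`` fix (:pr:`3867`): ``hash()`` of a string is salted per interpreter on Python 3, while parsing the hex token is stable across processes. A minimal illustration (not part of the patch):

    import numpy as np
    from dask.base import tokenize

    token = tokenize([1, 2, 3])                     # deterministic hex digest
    seed = int(token, 16) % np.iinfo(np.int32).max

    # ``seed`` is identical in every worker process, whereas
    # ``hash(token)`` varies with PYTHONHASHSEED, which previously let
    # different processes compute different divisions.
    assert 0 <= seed < np.iinfo(np.int32).max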
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/dataframe/shuffle.py new/dask-0.19.1/dask/dataframe/shuffle.py
--- old/dask-0.19.0/dask/dataframe/shuffle.py   2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/dataframe/shuffle.py   2018-09-06 13:45:35.000000000 +0200
@@ -456,12 +456,9 @@
     c = ind._values
     typ = np.min_scalar_type(npartitions * 2)
 
-    npartitions, k, stage = [np.array(x, dtype=np.min_scalar_type(x))[()]
-                             for x in [npartitions, k, stage]]
-
     c = np.mod(c, npartitions).astype(typ, copy=False)
-    c = np.floor_divide(c, k ** stage, out=c)
-    c = np.mod(c, k, out=c)
+    np.floor_divide(c, k ** stage, out=c)
+    np.mod(c, k, out=c)
 
     indexer, locations = groupsort_indexer(c.astype(np.int64), k)
     df2 = df.take(indexer)
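Roughly what the retained lines compute, as a standalone sketch (values are illustrative, not from the patch): only the hashed codes ``c`` live in a compact dtype, while ``k`` and ``stage`` stay plain Python ints, so ``k ** stage`` is evaluated at full precision rather than in the constricted dtype.

    import numpy as np

    npartitions, k, stage = 300, 32, 1
    c = np.random.randint(0, 2**31, size=8).astype(np.int64)

    typ = np.min_scalar_type(npartitions * 2)       # e.g. uint16
    c = np.mod(c, npartitions).astype(typ, copy=False)
    np.floor_divide(c, k ** stage, out=c)           # k ** stage as a Python int
    np.mod(c, k, out=c)                             # bucket within this stage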
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/dataframe/tests/test_categorical.py new/dask-0.19.1/dask/dataframe/tests/test_categorical.py
--- old/dask-0.19.0/dask/dataframe/tests/test_categorical.py   2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/dataframe/tests/test_categorical.py   2018-09-06 13:45:35.000000000 +0200
@@ -271,6 +271,14 @@
     assert_eq(left, pd.Index(right) if isinstance(right, np.ndarray) else right)
 
 
+def test_return_type_known_categories():
+    df = pd.DataFrame({"A": ['a', 'b', 'c']})
+    df['A'] = df['A'].astype('category')
+    dask_df = dd.from_pandas(df, 2)
+    ret_type = dask_df.A.cat.as_known()
+    assert isinstance(ret_type, dd.core.Series)
+
+
 class TestCategoricalAccessor:
 
     @pytest.mark.parametrize('series', cat_series)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/dataframe/tests/test_dataframe.py new/dask-0.19.1/dask/dataframe/tests/test_dataframe.py
--- old/dask-0.19.0/dask/dataframe/tests/test_dataframe.py 2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/dataframe/tests/test_dataframe.py 2018-09-06 13:45:35.000000000 +0200
@@ -925,6 +925,13 @@
         d.assign(foo=d_unknown.a)
 
 
+def test_assign_callable():
+    df = dd.from_pandas(pd.DataFrame({"A": range(10)}), npartitions=2)
+    a = df.assign(B=df.A.shift())
+    b = df.assign(B=lambda x: x.A.shift())
+    assert_eq(a, b)
+
+
 def test_map():
     assert_eq(d.a.map(lambda x: x + 1), full.a.map(lambda x: x + 1))
     lk = dict((v, v + 1) for v in full.a.values)
@@ -2718,6 +2725,16 @@
         assert_eq(df[cols], ddf[cols])
 
 
+def test_ipython_completion():
+    df = pd.DataFrame({'a': [1], 'b': [2]})
+    ddf = dd.from_pandas(df, npartitions=1)
+
+    completions = ddf._ipython_key_completions_()
+    assert 'a' in completions
+    assert 'b' in completions
+    assert 'c' not in completions
+
+
 def test_diff():
     df = pd.DataFrame(np.random.randn(100, 5), columns=list('abcde'))
     ddf = dd.from_pandas(df, 5)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/dataframe/tests/test_multi.py new/dask-0.19.1/dask/dataframe/tests/test_multi.py
--- old/dask-0.19.0/dask/dataframe/tests/test_multi.py  2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/dataframe/tests/test_multi.py  2018-09-06 13:45:35.000000000 +0200
@@ -1308,3 +1308,29 @@
     joined = ddf2.join(ddf2, rsuffix='r')
     assert joined.divisions == (1, 1)
     joined.compute()
+
+
+def test_repartition_repeated_divisions():
+    df = pd.DataFrame({'x': [0, 0, 0, 0]})
+    ddf = dd.from_pandas(df, npartitions=2).set_index('x')
+
+    ddf2 = ddf.repartition(divisions=(0, 0), force=True)
+    assert_eq(ddf2, df.set_index('x'))
+
+
+def test_multi_duplicate_divisions():
+    df1 = pd.DataFrame({'x': [0, 0, 0, 0]})
+    df2 = pd.DataFrame({'x': [0]})
+
+    ddf1 = dd.from_pandas(df1, npartitions=2).set_index('x')
+    ddf2 = dd.from_pandas(df2, npartitions=1).set_index('x')
+    assert ddf1.npartitions == 2
+    assert len(ddf1) == len(df1)
+
+    r1 = ddf1.merge(ddf2, how='left', left_index=True, right_index=True)
+
+    sf1 = df1.set_index('x')
+    sf2 = df2.set_index('x')
+    r2 = sf1.merge(sf2, how='left', left_index=True, right_index=True)
+
+    assert_eq(r1, r2)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/dataframe/tests/test_shuffle.py new/dask-0.19.1/dask/dataframe/tests/test_shuffle.py
--- old/dask-0.19.0/dask/dataframe/tests/test_shuffle.py    2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/dataframe/tests/test_shuffle.py    2018-09-06 13:45:35.000000000 +0200
@@ -1,9 +1,11 @@
 import os
+import sys
 import pandas as pd
 import pytest
 import pickle
 import numpy as np
 import string
+import multiprocessing as mp
 from copy import copy
 
 import pandas.util.testing as tm
@@ -358,6 +360,28 @@
         ddf.set_index('y', divisions=['a', 'b', 'd', 'c'], sorted=True)
 
 
[email protected]
[email protected](sys.version_info < (3, 4),
+                    reason="multiprocessing spawn only after Py3.4")
+def test_set_index_consistent_divisions():
+    # See https://github.com/dask/dask/issues/3867
+    df = pd.DataFrame({'x': np.random.random(100),
+                       'y': np.random.random(100) // 0.2},
+                      index=np.random.random(100))
+    ddf = dd.from_pandas(df, npartitions=4)
+    ddf = ddf.clear_divisions()
+
+    ctx = mp.get_context('spawn')
+    pool = ctx.Pool(processes=8)
+    results = [pool.apply_async(_set_index, (ddf, 'x')) for _ in range(100)]
+    divisions_set = set(result.get() for result in results)
+    assert len(divisions_set) == 1
+
+
+def _set_index(df, *args, **kwargs):
+    return df.set_index(*args, **kwargs).divisions
+
+
 @pytest.mark.parametrize('shuffle', ['disk', 'tasks'])
 def test_set_index_reduces_partitions_small(shuffle):
     df = pd.DataFrame({'x': np.random.random(100)})
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask.egg-info/PKG-INFO new/dask-0.19.1/dask.egg-info/PKG-INFO
--- old/dask-0.19.0/dask.egg-info/PKG-INFO  2018-08-30 18:41:43.000000000 +0200
+++ new/dask-0.19.1/dask.egg-info/PKG-INFO  2018-09-06 14:15:04.000000000 +0200
@@ -1,11 +1,12 @@
-Metadata-Version: 2.1
+Metadata-Version: 1.2
 Name: dask
-Version: 0.19.0
+Version: 0.19.1
 Summary: Parallel PyData with Task Scheduling
 Home-page: http://github.com/dask/dask/
-Maintainer: Matthew Rocklin
-Maintainer-email: [email protected]
+Author: Matthew Rocklin
+Author-email: [email protected]
 License: BSD
+Description-Content-Type: UNKNOWN
 Description: Dask
         ====
@@ -44,9 +45,3 @@
 Classifier: Programming Language :: Python :: 3.6
 Classifier: Programming Language :: Python :: 3.7
 Requires-Python: >=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*
-Provides-Extra: complete
-Provides-Extra: bag
-Provides-Extra: array
-Provides-Extra: delayed
-Provides-Extra: distributed
-Provides-Extra: dataframe
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/docs/source/_static/main-page.css new/dask-0.19.1/docs/source/_static/main-page.css
--- old/dask-0.19.0/docs/source/_static/main-page.css   2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/docs/source/_static/main-page.css   2018-09-06 13:45:35.000000000 +0200
@@ -22,10 +22,10 @@
     border-radius: 0.3rem;
 }
 .navbar li:hover {
-    background-color: #ECB172;
+    background-color: #FDA061;
 }
 .navbar li .nav-link{
-    color: #ECB172;
+    color: #FDA061;
 }
 .navbar li:hover .nav-link{
     color: #212529;
@@ -36,11 +36,11 @@
 }
 
 .dropdown-item {
-    color: #ECB172;
+    color: #FDA061;
 }
 
 .dropdown-item:hover {
-    background-color: #ECB172D0;
+    background-color: #FDA061D0;
 }
 
 .hero {
@@ -56,15 +56,26 @@
 
 
 .outline-dask {
-    color: #ECB172;
+    color: #FDA061;
     background-color: transparent;
-    border-color: #ECB172;
+    border-color: #FDA061;
 }
+
 .outline-dask:hover {
     color: #212529;
-    background-color: #ECB172;
-    border-color: #ECB172;
+    background-color: #FDA061;
+    border-color: #FDA061;
+}
+
+.solid-dask {
+    color: #212529;
+    background-color: #FDA061;
+}
+
+.solid-dask:hover {
+    color: #212529;
+    background-color: #EC9050;
 }
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/docs/source/changelog.rst new/dask-0.19.1/docs/source/changelog.rst
--- old/dask-0.19.0/docs/source/changelog.rst   2018-08-30 18:39:37.000000000 +0200
+++ new/dask-0.19.1/docs/source/changelog.rst   2018-09-06 14:12:47.000000000 +0200
@@ -1,7 +1,7 @@
 Changelog
 =========
 
-0.19.1 / YYYY-MM-DD
+0.19.2 / YYYY-MM-DD
 -------------------
 
 Array
@@ -25,6 +25,35 @@
 
 -
 
+0.19.1 / 2018-09-06
+-------------------
+
+Array
++++++
+
+- Don't enforce dtype if result has no dtype (:pr:`3928`) `Matthew Rocklin`_
+- Fix NumPy issubtype deprecation warning (:pr:`3939`) `Bruce Merry`_
+- Fix arg reduction tokens to be unique with different arguments (:pr:`3955`) `Tobias de Jong`_
+- Coerce numpy integers to ints in slicing code (:pr:`3944`) `Yu Feng`_
+- Linalg.norm ndim along axis partial fix (:pr:`3933`) `Tobias de Jong`_
+
+Dataframe
++++++++++
+
+- Deterministic DataFrame.set_index (:pr:`3867`) `George Sakkis`_
+- Fix divisions in read_parquet when dealing with filters #3831 #3930 (:pr:`3923`) (:pr:`3931`) `@andrethrill`_
+- Fixing returning type in categorical.as_known (:pr:`3888`) `Sriharsha Hatwar`_
+- Fix DataFrame.assign for callables (:pr:`3919`) `Tom Augspurger`_
+- Include partitions with no width in repartition (:pr:`3941`) `Matthew Rocklin`_
+- Don't constrict stage/k dtype in dataframe shuffle (:pr:`3942`) `Matthew Rocklin`_
+
+Documentation
++++++++++++++
+
+- DOC: Add hint on how to render task graphs horizontally (:pr:`3922`) `Uwe Korn`_
+- Add try-now button to main landing page (:pr:`3924`) `Matthew Rocklin`_
+
+
 0.19.0 / 2018-08-29
 -------------------
 
@@ -32,7 +61,7 @@
 +++++
 
 - Fix argtopk split_every bug (:pr:`3810`) `Guido Imperiale`_
-- Ensure result computing dask.array.isnull(`) always gives a numpy array (:pr:`3825`) `Stephan Hoyer`_
+- Ensure result computing dask.array.isnull() always gives a numpy array (:pr:`3825`) `Stephan Hoyer`_
 - Support concatenate for scipy.sparse in dask array (:pr:`3836`) `Matthew Rocklin`_
 - Fix argtopk on 32-bit systems. (:pr:`3823`) `Elliott Sales de Andrade`_
 - Normalize keys in rechunk (:pr:`3820`) `Matthew Rocklin`_
@@ -1366,3 +1395,8 @@
 .. _`Hans Moritz Günther`: https://github.com/hamogu
 .. _`@rtobar`: https://github.com/rtobar
 .. _`Julia Signell`: https://github.com/jsignell
+.. _`Sriharsha Hatwar`: https://github.com/Sriharsha-hatwar
+.. _`Bruce Merry`: https://github.com/bmerry
+.. _`Joe Hamman`: https://github.com/jhamman
+.. _`Robert Sare`: https://github.com/rmsare
+.. _`Jeremy Chan`: https://github.com/convexset
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/docs/source/graphviz.rst new/dask-0.19.1/docs/source/graphviz.rst
--- old/dask-0.19.0/docs/source/graphviz.rst    2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/docs/source/graphviz.rst    2018-09-06 13:45:35.000000000 +0200
@@ -18,6 +18,10 @@
 except that rather than computing the result, they produce an
 image of the task graph.
 
+By default the task graph is rendered from top to bottom.
+In the case that you prefer to visualize it from left to right, pass
+``rankdir="LR"`` as a keyword argument to ``.visualize``.
+
 .. code-block:: python
 
    import dask.array as da
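The new hint in action — a minimal example (not part of the patch), assuming the optional ``graphviz`` dependency is installed:

    import dask.array as da

    x = da.ones((15, 15), chunks=(5, 5))
    y = (x + x.T).sum(axis=1)

    # Default rendering is top-to-bottom; rankdir="LR" lays the task
    # graph out left-to-right instead.
    y.visualize(filename='graph-lr.svg', rankdir="LR")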
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/docs/source/index.html new/dask-0.19.1/docs/source/index.html
--- old/dask-0.19.0/docs/source/index.html  2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/docs/source/index.html  2018-09-06 13:45:35.000000000 +0200
@@ -67,6 +67,7 @@
             enabling performance at scale for the tools you love
           </p>
           <a class="btn outline-dask btn-lg" href="docs.html">Learn More</a>
+          <a class="btn solid-dask btn-lg" href="https://mybinder.org/v2/gh/dask/dask-examples/master" role="button">Try Now »</a>
         </div>
         <div class="product-device box-shadow d-none d-md-block"></div>
         <div class="product-device product-device-2 box-shadow d-none d-md-block"></div>
