Hello community, here is the log from the commit of package python-dask for openSUSE:Factory checked in at 2019-11-17 19:23:27 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Comparing /work/SRC/openSUSE:Factory/python-dask (Old) and /work/SRC/openSUSE:Factory/.python-dask.new.26869 (New) ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "python-dask" Sun Nov 17 19:23:27 2019 rev:23 rq:749097 version:2.8.0 Changes: -------- --- /work/SRC/openSUSE:Factory/python-dask/python-dask.changes 2019-11-13 13:26:41.603595513 +0100 +++ /work/SRC/openSUSE:Factory/.python-dask.new.26869/python-dask.changes 2019-11-17 19:23:28.898857695 +0100 @@ -1,0 +2,34 @@ +Sat Nov 16 17:53:12 UTC 2019 - Arun Persaud <a...@gmx.de> + +- update to version 2.8.0: + * Array + + Implement complete dask.array.tile function (:pr:`5574`) Bouwe + Andela + + Add median along an axis with automatic rechunking (:pr:`5575`) + Matthew Rocklin + + Allow da.asarray to chunk inputs (:pr:`5586`) Matthew Rocklin + * Bag + + Use key_split in Bag name (:pr:`5571`) Matthew Rocklin + * Core + + Switch Doctests to Py3.7 (:pr:`5573`) Ryan Nazareth + + Relax get_colors test to adapt to new Bokeh release (:pr:`5576`) + Matthew Rocklin + + Add dask.blockwise.fuse_roots optimization (:pr:`5451`) Matthew + Rocklin + + Add sizeof implementation for small dicts (:pr:`5578`) Matthew + Rocklin + + Update fsspec, gcsfs, s3fs (:pr:`5588`) Tom Augspurger + * DataFrame + + Add dropna argument to groupby (:pr:`5579`) Richard J Zamora + + Revert "Remove import of dask_cudf, which is now a part of cudf + (:pr:`5568`)" (:pr:`5590`) Matthew Rocklin + * Documentation + + Add best practice for dask.compute function (:pr:`5583`) Matthew + Rocklin + + Create FUNDING.yml (:pr:`5587`) Gina Helfrich + + Add screencast for coordination primitives (:pr:`5593`) Matthew + Rocklin + + Move funding to .github repo (:pr:`5589`) Tom Augspurger + + Update calendar link (:pr:`5569`) Tom Augspurger + +------------------------------------------------------------------- Old: ---- dask-2.7.0.tar.gz New: ---- dask-2.8.0.tar.gz ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Other differences: ------------------ ++++++ python-dask.spec ++++++ --- /var/tmp/diff_new_pack.Bj0fHb/_old 2019-11-17 19:23:29.442857463 +0100 +++ /var/tmp/diff_new_pack.Bj0fHb/_new 2019-11-17 19:23:29.446857462 +0100 @@ -27,12 +27,11 @@ %endif %define skip_python2 1 Name: python-dask%{psuffix} -Version: 2.7.0 +Version: 2.8.0 Release: 0 Summary: Minimal task scheduling abstraction License: BSD-3-Clause -Group: Development/Languages/Python -URL: http://github.com/ContinuumIO/dask/ +URL: https://github.com/ContinuumIO/dask/ Source: https://files.pythonhosted.org/packages/source/d/dask/dask-%{version}.tar.gz BuildRequires: %{python_module setuptools} BuildRequires: fdupes @@ -104,7 +103,6 @@ # This must have a Requires for dask and all the dask subpackages %package all Summary: All dask components -Group: Development/Languages/Python Requires: %{name} = %{version} Requires: %{name}-array = %{version} Requires: %{name}-bag = %{version} @@ -125,7 +123,6 @@ %package array Summary: Numpy-like array data structure for dask -Group: Development/Languages/Python Requires: %{name} = %{version} Requires: python-numpy >= 1.13.0 Recommends: python-chest @@ -149,7 +146,6 @@ %package bag Summary: Data structure generic python objects in dask -Group: Development/Languages/Python Requires: %{name} = %{version} Requires: %{name}-multiprocessing = %{version} Requires: python-cloudpickle >= 0.2.1 @@ -173,7 +169,6 @@ %package dataframe Summary: Pandas-like DataFrame data structure for dask -Group: Development/Languages/Python Requires: %{name} = %{version} Requires: %{name}-array = %{version} Requires: %{name}-multiprocessing = %{version} @@ -208,7 +203,6 @@ %package distributed Summary: Interface with the distributed task 
scheduler in dask -Group: Development/Languages/Python Requires: %{name} = %{version} Requires: python-distributed >= 2.0 @@ -228,7 +222,6 @@ %package dot Summary: Display dask graphs using graphviz -Group: Development/Languages/Python Requires: %{name} = %{version} Requires: graphviz Requires: graphviz-gd @@ -247,7 +240,6 @@ %package multiprocessing Summary: Display dask graphs using graphviz -Group: Development/Languages/Python Requires: %{name} = %{version} Requires: python-cloudpickle >= 0.2.1 Requires: python-partd >= 0.3.7 @@ -282,7 +274,7 @@ # test_persist # test_local_get_with_distributed_active # test_local_scheduler -%python_expand PYTHONPATH=%{buildroot}%{$python_sitelib} py.test-%{python_bin_suffix} -v dask/tests -k 'not (test_serializable_groupby_agg or test_persist or test_local_get_with_distributed_active or test_await or test_local_scheduler)' +%pytest dask/tests -k 'not (test_serializable_groupby_agg or test_persist or test_local_get_with_distributed_active or test_await or test_local_scheduler)' %endif %if !%{with test} ++++++ dask-2.7.0.tar.gz -> dask-2.8.0.tar.gz ++++++ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/PKG-INFO new/dask-2.8.0/PKG-INFO --- old/dask-2.7.0/PKG-INFO 2019-11-08 22:06:23.000000000 +0100 +++ new/dask-2.8.0/PKG-INFO 2019-11-14 23:57:18.000000000 +0100 @@ -1,6 +1,6 @@ Metadata-Version: 2.1 Name: dask -Version: 2.7.0 +Version: 2.8.0 Summary: Parallel PyData with Task Scheduling Home-page: https://github.com/dask/dask/ Maintainer: Matthew Rocklin @@ -43,10 +43,10 @@ Classifier: Programming Language :: Python :: 3.6 Classifier: Programming Language :: Python :: 3.7 Requires-Python: >=3.6 -Provides-Extra: dataframe -Provides-Extra: delayed +Provides-Extra: complete Provides-Extra: array -Provides-Extra: distributed Provides-Extra: diagnostics +Provides-Extra: dataframe Provides-Extra: bag -Provides-Extra: complete +Provides-Extra: delayed +Provides-Extra: distributed diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/dask/_version.py new/dask-2.8.0/dask/_version.py --- old/dask-2.7.0/dask/_version.py 2019-11-08 22:06:23.000000000 +0100 +++ new/dask-2.8.0/dask/_version.py 2019-11-14 23:57:18.000000000 +0100 @@ -11,8 +11,8 @@ { "dirty": false, "error": null, - "full-revisionid": "98a1e61fcf9230e3a4dcdf8523e435ed83dfb2c0", - "version": "2.7.0" + "full-revisionid": "539d1e27a8ccce01de5f3d49f1748057c27552f2", + "version": "2.8.0" } ''' # END VERSION_JSON diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/dask/array/__init__.py new/dask-2.8.0/dask/array/__init__.py --- old/dask-2.7.0/dask/array/__init__.py 2019-10-11 05:14:07.000000000 +0200 +++ new/dask-2.8.0/dask/array/__init__.py 2019-11-13 18:07:07.000000000 +0100 @@ -190,6 +190,7 @@ all, min, max, + median, moment, trace, argmin, diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/dask/array/core.py new/dask-2.8.0/dask/array/core.py --- old/dask-2.7.0/dask/array/core.py 2019-11-05 22:48:29.000000000 +0100 +++ new/dask-2.8.0/dask/array/core.py 2019-11-13 21:17:45.000000000 +0100 @@ -45,7 +45,6 @@ is_integer, IndexCallable, funcname, - derived_from, SerializableLock, Dispatch, factors, @@ -3672,7 +3671,7 @@ return stack(a) elif not isinstance(getattr(a, "shape", None), Iterable): a = np.asarray(a) - return from_array(a, chunks=a.shape, getitem=getter_inline, **kwargs) + return from_array(a, 
getitem=getter_inline, **kwargs) def asanyarray(a): diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/dask/array/creation.py new/dask-2.8.0/dask/array/creation.py --- old/dask-2.7.0/dask/array/creation.py 2019-10-11 05:14:07.000000000 +0200 +++ new/dask-2.8.0/dask/array/creation.py 2019-11-12 16:54:02.000000000 +0100 @@ -796,17 +796,28 @@ @derived_from(np) def tile(A, reps): - if not isinstance(reps, Integral): - raise NotImplementedError("Only integer valued `reps` supported.") - - if reps < 0: + try: + tup = tuple(reps) + except TypeError: + tup = (reps,) + if any(i < 0 for i in tup): raise ValueError("Negative `reps` are not allowed.") - elif reps == 0: - return A[..., :0] - elif reps == 1: - return A + c = asarray(A) + + if all(tup): + for nrep in tup[::-1]: + c = nrep * [c] + return block(c) - return concatenate(reps * [A], axis=-1) + d = len(tup) + if d < c.ndim: + tup = (1,) * (c.ndim - d) + tup + if c.ndim < d: + shape = (1,) * (d - c.ndim) + c.shape + else: + shape = c.shape + shape_out = tuple(s * t for s, t in zip(shape, tup)) + return empty(shape=shape_out, dtype=c.dtype) def expand_pad_value(array, pad_value): diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/dask/array/optimization.py new/dask-2.8.0/dask/array/optimization.py --- old/dask-2.7.0/dask/array/optimization.py 2019-11-07 03:43:11.000000000 +0100 +++ new/dask-2.8.0/dask/array/optimization.py 2019-11-13 18:07:07.000000000 +0100 @@ -4,7 +4,7 @@ import numpy as np from .core import getter, getter_nofancy, getter_inline -from ..blockwise import optimize_blockwise +from ..blockwise import optimize_blockwise, fuse_roots from ..core import flatten, reverse_dict from ..optimization import cull, fuse, inline_functions from ..utils import ensure_dict @@ -40,6 +40,7 @@ # High level stage optimization if isinstance(dsk, HighLevelGraph): dsk = optimize_blockwise(dsk, keys=keys) + dsk = fuse_roots(dsk, keys=keys) # Low level task optimizations dsk = ensure_dict(dsk) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/dask/array/reductions.py new/dask-2.8.0/dask/array/reductions.py --- old/dask-2.7.0/dask/array/reductions.py 2019-11-08 20:58:43.000000000 +0100 +++ new/dask-2.8.0/dask/array/reductions.py 2019-11-13 18:07:07.000000000 +0100 @@ -1,4 +1,5 @@ import builtins +from collections.abc import Iterable import operator from functools import partial, wraps from itertools import product, repeat @@ -20,7 +21,7 @@ from .numpy_compat import ma_divide, divide as np_divide from ..base import tokenize from ..highlevelgraph import HighLevelGraph -from ..utils import ignoring, funcname, Dispatch, deepmap, getargspec +from ..utils import ignoring, funcname, Dispatch, deepmap, getargspec, derived_from from .. import config # Generic functions to support chunks of different types @@ -1251,3 +1252,36 @@ @wraps(np.trace) def trace(a, offset=0, axis1=0, axis2=1, dtype=None): return diagonal(a, offset=offset, axis1=axis1, axis2=axis2).sum(-1, dtype=dtype) + + +@derived_from(np) +def median(a, axis=None, keepdims=False, out=None): + """ + This works by automatically chunking the reduced axes to a single chunk + and then calling ``numpy.median`` function across the remaining dimensions + """ + if axis is None: + raise NotImplementedError( + "The da.median function only works along an axis. 
" + "The full algorithm is difficult to do in parallel" + ) + + if not isinstance(axis, Iterable): + axis = (axis,) + + axis = [ax + a.ndim if ax < 0 else ax for ax in axis] + + a = a.rechunk({ax: -1 if ax in axis else "auto" for ax in range(a.ndim)}) + + result = a.map_blocks( + np.median, + axis=axis, + keepdims=keepdims, + drop_axis=axis if not keepdims else None, + chunks=[1 if ax in axis else c for ax, c in enumerate(a.chunks)] + if keepdims + else None, + ) + + result = handle_out(out, result) + return result diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/dask/array/tests/test_array_core.py new/dask-2.8.0/dask/array/tests/test_array_core.py --- old/dask-2.7.0/dask/array/tests/test_array_core.py 2019-11-05 22:48:29.000000000 +0100 +++ new/dask-2.8.0/dask/array/tests/test_array_core.py 2019-11-13 21:17:45.000000000 +0100 @@ -2372,6 +2372,13 @@ assert not any(isinstance(v, np.ndarray) for v in x.dask.values()) +def test_asarray_chunks(): + with dask.config.set({"array.chunk-size": "100 B"}): + x = np.ones(1000) + d = da.asarray(x) + assert d.npartitions > 1 + + @pytest.mark.filterwarnings("ignore:the matrix subclass") def test_asanyarray(): x = np.matrix([1, 2, 3]) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/dask/array/tests/test_array_function.py new/dask-2.8.0/dask/array/tests/test_array_function.py --- old/dask-2.7.0/dask/array/tests/test_array_function.py 2019-10-11 05:14:07.000000000 +0200 +++ new/dask-2.8.0/dask/array/tests/test_array_function.py 2019-11-13 18:07:07.000000000 +0100 @@ -61,7 +61,6 @@ lambda x: np.min_scalar_type(x), lambda x: np.linalg.det(x), lambda x: np.linalg.eigvals(x), - lambda x: np.median(x), ], ) def test_array_notimpl_function_dask(func): @@ -226,15 +225,14 @@ assert_eq(xx, yy, check_meta=False) -def test_median_func(): +def test_non_existent_func(): # Regression test for __array_function__ becoming default in numpy 1.17 - # dask has no median function, so ensure that this still calls np.median - image = da.from_array(np.array([[0, 1], [1, 2]]), chunks=(1, 2)) + # dask has no sort function, so ensure that this still calls np.sort + x = da.from_array(np.array([1, 2, 4, 3]), chunks=(2,)) if IS_NEP18_ACTIVE: with pytest.warns( - FutureWarning, - match="The `numpy.median` function is not implemented by Dask", + FutureWarning, match="The `numpy.sort` function is not implemented by Dask" ): - assert int(np.median(image)) == 1 + assert list(np.sort(x)) == [1, 2, 3, 4] else: - assert int(np.median(image)) == 1 + assert list(np.sort(x)) == [1, 2, 3, 4] diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/dask/array/tests/test_creation.py new/dask-2.8.0/dask/array/tests/test_creation.py --- old/dask-2.7.0/dask/array/tests/test_creation.py 2019-10-11 05:14:07.000000000 +0200 +++ new/dask-2.8.0/dask/array/tests/test_creation.py 2019-11-12 16:54:02.000000000 +0100 @@ -571,9 +571,18 @@ assert all(concat(d.repeat(r).chunks)) +@pytest.mark.parametrize("reps", [2, (2, 2), (1, 2), (2, 1), (2, 3, 4, 0)]) +def test_tile_basic(reps): + a = da.asarray([0, 1, 2]) + b = [[1, 2], [3, 4]] + + assert_eq(np.tile(a.compute(), reps), da.tile(a, reps)) + assert_eq(np.tile(b, reps), da.tile(b, reps)) + + @pytest.mark.parametrize("shape, chunks", [((10,), (1,)), ((10, 11, 13), (4, 5, 3))]) -@pytest.mark.parametrize("reps", [0, 1, 2, 3, 5]) -def test_tile(shape, chunks, reps): +@pytest.mark.parametrize("reps", [0, 1, 2, 3, 
5, (1,), (1, 2)]) +def test_tile_chunks(shape, chunks, reps): x = np.random.random(shape) d = da.from_array(x, chunks=chunks) @@ -591,13 +600,32 @@ @pytest.mark.parametrize("shape, chunks", [((10,), (1,)), ((10, 11, 13), (4, 5, 3))]) -@pytest.mark.parametrize("reps", [[1], [1, 2]]) -def test_tile_array_reps(shape, chunks, reps): +@pytest.mark.parametrize("reps", [0, (0,), (2, 0), (0, 3, 0, 4)]) +def test_tile_zero_reps(shape, chunks, reps): x = np.random.random(shape) d = da.from_array(x, chunks=chunks) - with pytest.raises(NotImplementedError): - da.tile(d, reps) + assert_eq(np.tile(x, reps), da.tile(d, reps)) + + +@pytest.mark.parametrize("shape, chunks", [((1, 1, 0), (1, 1, 0)), ((2, 0), (1, 0))]) +@pytest.mark.parametrize("reps", [2, (3, 2, 5)]) +def test_tile_empty_array(shape, chunks, reps): + x = np.empty(shape) + d = da.from_array(x, chunks=chunks) + + assert_eq(np.tile(x, reps), da.tile(d, reps)) + + +@pytest.mark.parametrize( + "shape", [(3,), (2, 3), (3, 4, 3), (3, 2, 3), (4, 3, 2, 4), (2, 2)] +) +@pytest.mark.parametrize("reps", [(2,), (1, 2), (2, 1), (2, 2), (2, 3, 2), (3, 2)]) +def test_tile_np_kroncompare_examples(shape, reps): + x = np.random.random(shape) + d = da.asarray(x) + + assert_eq(np.tile(x, reps), da.tile(d, reps)) skip_stat_length = pytest.mark.xfail(_numpy_117, reason="numpy-14061") diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/dask/array/tests/test_optimization.py new/dask-2.8.0/dask/array/tests/test_optimization.py --- old/dask-2.7.0/dask/array/tests/test_optimization.py 2019-11-07 03:43:11.000000000 +0100 +++ new/dask-2.8.0/dask/array/tests/test_optimization.py 2019-11-13 18:07:07.000000000 +0100 @@ -389,3 +389,13 @@ X = da.dot(X, X.T) assert_eq(X.compute(optimize_graph=False), X) + + +def test_fuse_roots(): + x = da.ones(10, chunks=(2,)) + y = da.zeros(10, chunks=(2,)) + z = (x + 1) + (2 * y ** 2) + (zz,) = dask.optimize(z) + # assert len(zz.dask) == 5 + assert sum(map(dask.istask, zz.dask.values())) == 5 # there are some aliases + assert_eq(zz, z) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/dask/array/tests/test_reductions.py new/dask-2.8.0/dask/array/tests/test_reductions.py --- old/dask-2.7.0/dask/array/tests/test_reductions.py 2019-10-11 05:14:07.000000000 +0200 +++ new/dask-2.8.0/dask/array/tests/test_reductions.py 2019-11-13 18:07:07.000000000 +0100 @@ -663,3 +663,15 @@ _assert(a, b, 0, 1, 2, float) _assert(a, b, offset=1, axis1=0, axis2=2, dtype=int) _assert(a, b, offset=1, axis1=0, axis2=2, dtype=float) + + +@pytest.mark.parametrize("axis", [0, [0, 1], 1, -1]) +@pytest.mark.parametrize("keepdims", [True, False]) +def test_median(axis, keepdims): + x = np.arange(100).reshape((2, 5, 10)) + d = da.from_array(x, chunks=2) + + assert_eq( + da.median(d, axis=axis, keepdims=keepdims), + np.median(x, axis=axis, keepdims=keepdims), + ) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/dask/bag/core.py new/dask-2.8.0/dask/bag/core.py --- old/dask-2.7.0/dask/bag/core.py 2019-11-08 20:58:43.000000000 +0100 +++ new/dask-2.8.0/dask/bag/core.py 2019-11-12 16:54:02.000000000 +0100 @@ -81,6 +81,7 @@ ensure_dict, ensure_bytes, ensure_unicode, + key_split, ) from . import chunk @@ -492,8 +493,7 @@ return type(self), (self.name, self.npartitions) def __str__(self): - name = self.name if len(self.name) < 10 else self.name[:7] + "..." 
- return "dask.bag<%s, npartitions=%d>" % (name, self.npartitions) + return "dask.bag<%s, npartitions=%d>" % (key_split(self.name), self.npartitions) __repr__ = __str__ @@ -1543,10 +1543,10 @@ >>> df = b.to_dataframe() >>> df.compute() - balance name - 0 100 Alice - 1 200 Bob - 0 300 Charlie + name balance + 0 Alice 100 + 1 Bob 200 + 0 Charlie 300 """ import pandas as pd import dask.dataframe as dd diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/dask/bag/tests/test_bag.py new/dask-2.8.0/dask/bag/tests/test_bag.py --- old/dask-2.7.0/dask/bag/tests/test_bag.py 2019-11-08 20:58:43.000000000 +0100 +++ new/dask-2.8.0/dask/bag/tests/test_bag.py 2019-11-12 16:54:02.000000000 +0100 @@ -180,6 +180,8 @@ assert str(b.npartitions) in func(b) assert b.name[:5] in func(b) + assert "from_sequence" in func(db.from_sequence(range(5))) + def test_pluck(): d = {("x", 0): [(1, 10), (2, 20)], ("x", 1): [(3, 30), (4, 40)]} diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/dask/blockwise.py new/dask-2.8.0/dask/blockwise.py --- old/dask-2.7.0/dask/blockwise.py 2019-11-07 03:43:11.000000000 +0100 +++ new/dask-2.8.0/dask/blockwise.py 2019-11-13 18:07:07.000000000 +0100 @@ -12,7 +12,7 @@ from .core import reverse_dict from .delayed import to_task_dask from .highlevelgraph import HighLevelGraph -from .optimization import SubgraphCallable +from .optimization import SubgraphCallable, fuse from .utils import ensure_dict, homogeneous_deepmap, apply @@ -775,3 +775,59 @@ raise ValueError("Shapes do not align %s" % g) return toolz.valmap(toolz.first, g2) + + +def fuse_roots(graph: HighLevelGraph, keys: list): + """ + Fuse nearby layers if they don't have dependencies + + Often Blockwise sections of the graph fill out all of the computation + except for the initial data access or data loading layers:: + + Large Blockwise Layer + | | | + X Y Z + + This can be troublesome because X, Y, and Z tasks may be executed on + different machines, and then require communication to move around. + + This optimization identifies this situation, lowers all of the graphs to + concrete dicts, and then calls ``fuse`` on them, with a width equal to the + number of layers like X, Y, and Z. + + This is currently used within array and dataframe optimizations. 
+ + Parameters + ---------- + graph: HighLevelGraph + The full graph of the computation + keys: list + The output keys of the computation, to be passed on to fuse + + See Also + -------- + Blockwise + fuse + """ + layers = graph.layers.copy() + dependencies = graph.dependencies.copy() + dependents = reverse_dict(dependencies) + + for name, layer in graph.layers.items(): + deps = graph.dependencies[name] + if ( + isinstance(layer, Blockwise) + and len(deps) > 1 + and not any(dependencies[dep] for dep in deps) # no need to fuse if 0 or 1 + and all(len(dependents[dep]) == 1 for dep in deps) + ): + new = toolz.merge(layer, *[layers[dep] for dep in deps]) + new, _ = fuse(new, keys, ave_width=len(deps)) + + for dep in deps: + del layers[dep] + + layers[name] = new + dependencies[name] = set() + + return HighLevelGraph(layers, dependencies) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/dask/dataframe/backends.py new/dask-2.8.0/dask/dataframe/backends.py --- old/dask-2.7.0/dask/dataframe/backends.py 2019-11-08 20:58:43.000000000 +0100 +++ new/dask-2.8.0/dask/dataframe/backends.py 2019-11-14 20:24:20.000000000 +0100 @@ -15,4 +15,4 @@ @meta_nonempty.register_lazy("cudf") @make_meta.register_lazy("cudf") def _register_cudf(): - import cudf # noqa: F401 + import dask_cudf # noqa: F401 diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/dask/dataframe/core.py new/dask-2.8.0/dask/dataframe/core.py --- old/dask-2.7.0/dask/dataframe/core.py 2019-11-05 22:48:30.000000000 +0100 +++ new/dask-2.8.0/dask/dataframe/core.py 2019-11-13 18:07:07.000000000 +0100 @@ -717,7 +717,7 @@ 2017-01-08 13.0 2017-01-09 15.0 2017-01-10 17.0 - dtype: float64 + Freq: D, dtype: float64 """ from .rolling import map_overlap @@ -2079,7 +2079,7 @@ ) layer[(name, i)] = (aggregate, (cumpart._name, i), (cname, i)) graph = HighLevelGraph.from_collections( - cname, layer, dependencies=[cumpart, cumlast] + name, layer, dependencies=[cumpart, cumlast] ) result = new_dd_object(graph, name, chunk(self._meta), self.divisions) return handle_out(out, result) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/dask/dataframe/groupby.py new/dask-2.8.0/dask/dataframe/groupby.py --- old/dask-2.7.0/dask/dataframe/groupby.py 2019-11-05 22:48:30.000000000 +0100 +++ new/dask-2.8.0/dask/dataframe/groupby.py 2019-11-13 18:07:11.000000000 +0100 @@ -162,22 +162,25 @@ return df.groupby(**kwargs) -def _groupby_slice_apply(df, grouper, key, func, *args, **kwargs): +def _groupby_slice_apply( + df, grouper, key, func, *args, group_keys=True, dropna=None, **kwargs +): # No need to use raise if unaligned here - this is only called after # shuffling, which makes everything aligned already - group_keys = kwargs.pop("group_keys", True) - g = df.groupby(grouper, group_keys=group_keys) + dropna = {"dropna": dropna} if dropna is not None else {} + g = df.groupby(grouper, group_keys=group_keys, **dropna) if key: g = g[key] return g.apply(func, *args, **kwargs) -def _groupby_slice_transform(df, grouper, key, func, *args, **kwargs): +def _groupby_slice_transform( + df, grouper, key, func, *args, group_keys=True, dropna=None, **kwargs +): # No need to use raise if unaligned here - this is only called after # shuffling, which makes everything aligned already - group_keys = kwargs.pop("group_keys", True) - - g = df.groupby(grouper, group_keys=group_keys) + dropna = {"dropna": dropna} if dropna is not None else {} + g
= df.groupby(grouper, group_keys=group_keys, **dropna) if key: g = g[key] @@ -271,15 +274,16 @@ self.__name__ = name -def _groupby_aggregate(df, aggfunc=None, levels=None, **kwargs): - return aggfunc(df.groupby(level=levels, sort=False), **kwargs) +def _groupby_aggregate(df, aggfunc=None, levels=None, dropna=None, **kwargs): + dropna = {"dropna": dropna} if dropna is not None else {} + return aggfunc(df.groupby(level=levels, sort=False, **dropna), **kwargs) -def _apply_chunk(df, *index, **kwargs): +def _apply_chunk(df, *index, dropna=None, **kwargs): func = kwargs.pop("chunk") columns = kwargs.pop("columns") - - g = _groupby_raise_unaligned(df, by=index) + dropna = {"dropna": dropna} if dropna is not None else {} + g = _groupby_raise_unaligned(df, by=index, **dropna) if is_series_like(df) or columns is None: return func(g, **kwargs) @@ -980,9 +984,11 @@ The slice keys applied to GroupBy result group_keys: bool Passed to pandas.DataFrame.groupby() + dropna: bool + Whether to drop null values from groupby index """ - def __init__(self, df, by=None, slice=None, group_keys=True): + def __init__(self, df, by=None, slice=None, group_keys=True, dropna=None): assert isinstance(df, (DataFrame, Series)) self.group_keys = group_keys @@ -1020,7 +1026,13 @@ else: index_meta = self.index - self._meta = self.obj._meta.groupby(index_meta, group_keys=group_keys) + self.dropna = {} + if dropna is not None: + self.dropna["dropna"] = dropna + + self._meta = self.obj._meta.groupby( + index_meta, group_keys=group_keys, **self.dropna + ) @property def _meta_nonempty(self): @@ -1041,7 +1053,7 @@ else: index_meta = self.index - grouped = sample.groupby(index_meta, group_keys=self.group_keys) + grouped = sample.groupby(index_meta, group_keys=self.group_keys, **self.dropna) return _maybe_slice(grouped, self._slice) def _aca_agg( @@ -1068,12 +1080,16 @@ if not isinstance(self.index, list) else [self.obj] + self.index, chunk=_apply_chunk, - chunk_kwargs=dict(chunk=func, columns=columns, **chunk_kwargs), + chunk_kwargs=dict( + chunk=func, columns=columns, **chunk_kwargs, **self.dropna + ), aggregate=_groupby_aggregate, meta=meta, token=token, split_every=split_every, - aggregate_kwargs=dict(aggfunc=aggfunc, levels=levels, **aggregate_kwargs), + aggregate_kwargs=dict( + aggfunc=aggfunc, levels=levels, **aggregate_kwargs, **self.dropna + ), split_out=split_out, split_out_setup=split_out_on_index, ) @@ -1097,7 +1113,8 @@ chunk=chunk, columns=columns, token=name_part, - meta=meta + meta=meta, + **self.dropna ) cumpart_raw_frame = ( @@ -1125,7 +1142,8 @@ columns=0 if columns is None else columns, chunk=M.last, meta=meta, - token=name_last + token=name_last, + **self.dropna ) # aggregate cumulated partitions and its previous last element @@ -1585,6 +1603,7 @@ token=funcname(func), *args, group_keys=self.group_keys, + **self.dropna, **kwargs ) @@ -1672,6 +1691,7 @@ token=funcname(func), *args, group_keys=self.group_keys, + **self.dropna, **kwargs ) @@ -1684,9 +1704,9 @@ def __getitem__(self, key): if isinstance(key, list): - g = DataFrameGroupBy(self.obj, by=self.index, slice=key) + g = DataFrameGroupBy(self.obj, by=self.index, slice=key, **self.dropna) else: - g = SeriesGroupBy(self.obj, by=self.index, slice=key) + g = SeriesGroupBy(self.obj, by=self.index, slice=key, **self.dropna) # error is raised from pandas g._meta = g._meta[key] diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/dask/dataframe/optimize.py new/dask-2.8.0/dask/dataframe/optimize.py --- 
old/dask-2.7.0/dask/dataframe/optimize.py 2019-11-07 03:43:11.000000000 +0100 +++ new/dask-2.8.0/dask/dataframe/optimize.py 2019-11-13 18:07:07.000000000 +0100 @@ -5,7 +5,7 @@ from .. import config, core from ..highlevelgraph import HighLevelGraph from ..utils import ensure_dict -from ..blockwise import optimize_blockwise, Blockwise +from ..blockwise import optimize_blockwise, fuse_roots, Blockwise def optimize(dsk, keys, **kwargs): @@ -14,6 +14,7 @@ # Think about an API for this. dsk = optimize_read_parquet_getitem(dsk) dsk = optimize_blockwise(dsk, keys=list(core.flatten(keys))) + dsk = fuse_roots(dsk, keys=list(core.flatten(keys))) dsk = ensure_dict(dsk) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/dask/dataframe/tests/test_groupby.py new/dask-2.8.0/dask/dataframe/tests/test_groupby.py --- old/dask-2.7.0/dask/dataframe/tests/test_groupby.py 2019-11-05 22:48:30.000000000 +0100 +++ new/dask-2.8.0/dask/dataframe/tests/test_groupby.py 2019-11-13 18:07:11.000000000 +0100 @@ -2187,3 +2187,60 @@ lambda series: series - series.mean() ), ) + + +@pytest.mark.xfail(reason="dropna kwarg not supported in pandas groupby.") +@pytest.mark.parametrize("dropna", [False, True]) +def test_groupby_dropna_pandas(dropna): + + # The `dropna` arg is not currently supported by pandas + # (See https://github.com/pandas-dev/pandas/pull/21669) + # Dask supports the argument for the cudf backend, + # but passing it to the pandas backend will fail. + + # TODO: Expand test when `dropna` is supported in pandas. + # (See: `test_groupby_dropna_cudf`) + + df = pd.DataFrame( + {"a": [1, 2, 3, 4, None, None, 7, 8], "e": [4, 5, 6, 3, 2, 1, 0, 0]} + ) + ddf = dd.from_pandas(df, npartitions=3) + + dask_result = ddf.groupby("a", dropna=dropna) + pd_result = df.groupby("a", dropna=dropna) + assert_eq(dask_result, pd_result) + + +@pytest.mark.parametrize("dropna", [False, True, None]) +@pytest.mark.parametrize("by", ["a", "c", "d", ["a", "b"], ["a", "c"], ["a", "d"]]) +def test_groupby_dropna_cudf(dropna, by): + + # NOTE: This test requires cudf/dask_cudf, and will + # be skipped by non-GPU CI + + cudf = pytest.importorskip("cudf") + dask_cudf = pytest.importorskip("dask_cudf") + + df = cudf.DataFrame( + { + "a": [1, 2, 3, 4, None, None, 7, 8], + "b": [1, 0] * 4, + "c": ["a", "b", None, None, "e", "f", "g", "h"], + "e": [4, 5, 6, 3, 2, 1, 0, 0], + } + ) + df["d"] = df["c"].astype("category") + ddf = dask_cudf.from_cudf(df, npartitions=3) + + if dropna is None: + dask_result = ddf.groupby(by).e.sum() + cudf_result = df.groupby(by).e.sum() + else: + dask_result = ddf.groupby(by, dropna=dropna).e.sum() + cudf_result = df.groupby(by, dropna=dropna).e.sum() + if by in ["c", "d"]: + # Lose string/category index name in cudf... + dask_result = dask_result.compute() + dask_result.index.name = cudf_result.index.name + + assert_eq(dask_result, cudf_result) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/dask/diagnostics/profile_visualize.py new/dask-2.8.0/dask/diagnostics/profile_visualize.py --- old/dask-2.7.0/dask/diagnostics/profile_visualize.py 2019-10-11 05:14:07.000000000 +0200 +++ new/dask-2.8.0/dask/diagnostics/profile_visualize.py 2019-11-12 16:54:02.000000000 +0100 @@ -341,6 +341,7 @@ The completed bokeh plot object.
""" bp = import_required("bokeh.plotting", _BOKEH_MISSING_MSG) + import bokeh from bokeh import palettes from bokeh.models import LinearAxis, Range1d @@ -365,7 +366,17 @@ t = mem = cpu = [] p = bp.figure(y_range=(0, 100), x_range=(0, 1), **defaults) colors = palettes.all_palettes[palette][6] - p.line(t, cpu, color=colors[0], line_width=4, legend="% CPU") + p.line( + t, + cpu, + color=colors[0], + line_width=4, + **{ + "legend_label" + if LooseVersion(bokeh.__version__) >= "1.4" + else "legend": "% CPU" + } + ) p.yaxis.axis_label = "% CPU" p.extra_y_ranges = { "memory": Range1d( @@ -373,7 +384,16 @@ ) } p.line( - t, mem, color=colors[2], y_range_name="memory", line_width=4, legend="Memory" + t, + mem, + color=colors[2], + y_range_name="memory", + line_width=4, + **{ + "legend_label" + if LooseVersion(bokeh.__version__) >= "1.4" + else "legend": "Memory" + } ) p.add_layout(LinearAxis(y_range_name="memory", axis_label="Memory (MB)"), "right") p.xaxis.axis_label = "Time (s)" diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/dask/diagnostics/tests/test_profiler.py new/dask-2.8.0/dask/diagnostics/tests/test_profiler.py --- old/dask-2.7.0/dask/diagnostics/tests/test_profiler.py 2019-10-11 05:14:07.000000000 +0200 +++ new/dask-2.8.0/dask/diagnostics/tests/test_profiler.py 2019-11-12 16:54:02.000000000 +0100 @@ -362,13 +362,12 @@ @ignore_abc_warning def test_get_colors(): from dask.diagnostics.profile_visualize import get_colors - from bokeh.palettes import Blues9, Blues5, Viridis - from itertools import cycle + from bokeh.palettes import Blues256, Blues5, Viridis funcs = list(range(11)) cmap = get_colors("Blues", funcs) - lk = dict(zip(funcs, cycle(Blues9))) - assert cmap == [lk[i] for i in funcs] + assert set(cmap) < set(Blues256) + assert len(set(cmap)) == 11 funcs = list(range(5)) cmap = get_colors("Blues", funcs) diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/dask/optimization.py new/dask-2.8.0/dask/optimization.py --- old/dask-2.7.0/dask/optimization.py 2019-11-07 03:43:11.000000000 +0100 +++ new/dask-2.8.0/dask/optimization.py 2019-11-13 18:07:07.000000000 +0100 @@ -1,4 +1,5 @@ import math +import numbers import re from . 
import config, core @@ -516,6 +517,11 @@ reducible = {k for k, vals in rdeps.items() if len(vals) == 1} if keys: reducible -= keys + + for k, v in dsk.items(): + if type(v) is not tuple and not isinstance(v, (numbers.Number, str)): + reducible.discard(k) + if not reducible and ( not fuse_subgraphs or all(len(set(v)) != 1 for v in rdeps.values()) ): diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/dask/sizeof.py new/dask-2.8.0/dask/sizeof.py --- old/dask-2.7.0/dask/sizeof.py 2019-11-05 22:48:30.000000000 +0100 +++ new/dask-2.8.0/dask/sizeof.py 2019-11-13 18:07:11.000000000 +0100 @@ -28,6 +28,14 @@ return getsizeof(seq) + sum(map(sizeof, seq)) +@sizeof.register(dict) +def sizeof_python_dict(d): + if len(d) > 10: + return getsizeof(d) + 1000 * len(d) + else: + return getsizeof(d) + sum(map(sizeof, d.keys())) + sum(map(sizeof, d.values())) + + @sizeof.register_lazy("cupy") def register_cupy(): import cupy diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/dask/tests/test_base.py new/dask-2.8.0/dask/tests/test_base.py --- old/dask-2.7.0/dask/tests/test_base.py 2019-11-08 20:58:43.000000000 +0100 +++ new/dask-2.8.0/dask/tests/test_base.py 2019-11-13 18:07:07.000000000 +0100 @@ -582,9 +582,7 @@ # Otherwise, the lengths below would be 4 and 0. assert len([k for k in keys if "mul" in k[0]]) == 8 assert len([k for k in keys if "add" in k[0]]) == 4 - assert ( - len([k for k in keys if "add-from_sequence-mul" in k[0]]) == 4 - ) # See? Renamed + assert len([k for k in keys if "add-mul" in k[0]]) == 4 # See? Renamed @pytest.mark.skipif("not da") diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/dask/tests/test_optimization.py new/dask-2.8.0/dask/tests/test_optimization.py --- old/dask-2.7.0/dask/tests/test_optimization.py 2019-11-07 03:43:11.000000000 +0100 +++ new/dask-2.8.0/dask/tests/test_optimization.py 2019-11-13 18:07:07.000000000 +0100 @@ -1285,3 +1285,15 @@ } ) assert res == sol + + +def test_dont_fuse_numpy_arrays(): + """ + Some types should stay in the graph bare + + This helps with things like serialization + """ + np = pytest.importorskip("numpy") + dsk = {"x": np.arange(5), "y": (inc, "x")} + + assert fuse(dsk, "y")[0] == dsk diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/dask/tests/test_sizeof.py new/dask-2.8.0/dask/tests/test_sizeof.py --- old/dask-2.7.0/dask/tests/test_sizeof.py 2019-11-05 22:48:30.000000000 +0100 +++ new/dask-2.8.0/dask/tests/test_sizeof.py 2019-11-13 18:07:11.000000000 +0100 @@ -121,3 +121,11 @@ assert sizeof(empty.columns[0]) > 0 assert sizeof(empty.columns[1]) > 0 assert sizeof(empty.columns[2]) > 0 + + +def test_dict(): + np = pytest.importorskip("numpy") + x = np.ones(10000) + assert sizeof({"x": x}) > x.nbytes + assert sizeof({"x": [x]}) > x.nbytes + assert sizeof({"x": [{"y": x}]}) > x.nbytes diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/dask.egg-info/PKG-INFO new/dask-2.8.0/dask.egg-info/PKG-INFO --- old/dask-2.7.0/dask.egg-info/PKG-INFO 2019-11-08 22:06:23.000000000 +0100 +++ new/dask-2.8.0/dask.egg-info/PKG-INFO 2019-11-14 23:57:18.000000000 +0100 @@ -1,6 +1,6 @@ Metadata-Version: 2.1 Name: dask -Version: 2.7.0 +Version: 2.8.0 Summary: Parallel PyData with Task Scheduling Home-page: https://github.com/dask/dask/ Maintainer: Matthew Rocklin @@ -43,10 +43,10 @@ Classifier: Programming 
Language :: Python :: 3.6 Classifier: Programming Language :: Python :: 3.7 Requires-Python: >=3.6 -Provides-Extra: dataframe -Provides-Extra: delayed +Provides-Extra: complete Provides-Extra: array -Provides-Extra: distributed Provides-Extra: diagnostics +Provides-Extra: dataframe Provides-Extra: bag -Provides-Extra: complete +Provides-Extra: delayed +Provides-Extra: distributed diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/dask.egg-info/requires.txt new/dask-2.8.0/dask.egg-info/requires.txt --- old/dask-2.7.0/dask.egg-info/requires.txt 2019-11-08 22:06:23.000000000 +0100 +++ new/dask-2.8.0/dask.egg-info/requires.txt 2019-11-14 23:57:18.000000000 +0100 @@ -5,7 +5,7 @@ [bag] cloudpickle>=0.2.1 -fsspec>=0.5.1 +fsspec>=0.6.0 toolz>=0.7.3 partd>=0.3.10 @@ -14,7 +14,7 @@ bokeh>=1.0.0 cloudpickle>=0.2.1 distributed>=2.0 -fsspec>=0.5.1 +fsspec>=0.6.0 numpy>=1.13.0 pandas>=0.21.0 partd>=0.3.10 @@ -25,7 +25,7 @@ pandas>=0.21.0 toolz>=0.7.3 partd>=0.3.10 -fsspec>=0.5.1 +fsspec>=0.6.0 [delayed] cloudpickle>=0.2.1 diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/docs/source/best-practices.rst new/dask-2.8.0/docs/source/best-practices.rst --- old/dask-2.7.0/docs/source/best-practices.rst 2019-10-11 05:14:07.000000000 +0200 +++ new/dask-2.8.0/docs/source/best-practices.rst 2019-11-13 18:07:11.000000000 +0100 @@ -334,3 +334,28 @@ for item in L: result = process(item, df) # include pointer to df in every delayed call results.append(result) + + +Avoid calling compute repeatedly +-------------------------------- + +Compute related results with shared computations in a single :func:`dask.compute` call + +.. code-block:: python + + # Don't repeatedly call compute + + df = dd.read_csv("...") + xmin = df.x.min().compute() + xmax = df.x.max().compute() + +.. code-block:: python + + # Do compute multiple results at the same time + + df = dd.read_csv("...") + + xmin, xmax = dask.compute(df.x.min(), df.x.max()) + +This allows Dask to compute the shared parts of the computation (like the +``dd.read_csv`` call above) only once, rather than once per ``compute`` call. 
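The best-practices addition above is easiest to see with a runnable sketch. The following is a minimal illustration of the same point, not code from the package: the delayed load() function and its one-second sleep are stand-ins for an expensive shared step such as the dd.read_csv call in the documentation.

    import time
    import dask
    from dask import delayed

    @delayed
    def load():
        # stand-in for an expensive shared step, e.g. reading a large CSV
        time.sleep(1)
        return list(range(100))

    @delayed
    def minimum(xs):
        return min(xs)

    @delayed
    def maximum(xs):
        return max(xs)

    data = load()  # a single shared node in the task graph

    # One call: load() runs once, so this takes about one second.
    lo, hi = dask.compute(minimum(data), maximum(data))

    # Two calls: load() runs once per compute(), about two seconds total.
    lo = minimum(data).compute()
    hi = maximum(data).compute()

Both reductions hang off the same load() node, so batching them into a single dask.compute call is what lets the scheduler execute that node only once.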
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/docs/source/changelog.rst new/dask-2.8.0/docs/source/changelog.rst --- old/dask-2.7.0/docs/source/changelog.rst 2019-11-08 21:16:03.000000000 +0100 +++ new/dask-2.8.0/docs/source/changelog.rst 2019-11-14 23:55:24.000000000 +0100 @@ -1,6 +1,43 @@ Changelog ========= +2.8.0 / 2019-11-14 +------------------ + +Array ++++++ +- Implement complete dask.array.tile function (:pr:`5574`) `Bouwe Andela`_ +- Add median along an axis with automatic rechunking (:pr:`5575`) `Matthew Rocklin`_ +- Allow da.asarray to chunk inputs (:pr:`5586`) `Matthew Rocklin`_ + +Bag ++++ + +- Use key_split in Bag name (:pr:`5571`) `Matthew Rocklin`_ + +Core +++++ +- Switch Doctests to Py3.7 (:pr:`5573`) `Ryan Nazareth`_ +- Relax get_colors test to adapt to new Bokeh release (:pr:`5576`) `Matthew Rocklin`_ +- Add dask.blockwise.fuse_roots optimization (:pr:`5451`) `Matthew Rocklin`_ +- Add sizeof implementation for small dicts (:pr:`5578`) `Matthew Rocklin`_ +- Update fsspec, gcsfs, s3fs (:pr:`5588`) `Tom Augspurger`_ + +DataFrame ++++++++++ +- Add dropna argument to groupby (:pr:`5579`) `Richard J Zamora`_ +- Revert "Remove import of dask_cudf, which is now a part of cudf (:pr:`5568`)" (:pr:`5590`) `Matthew Rocklin`_ + +Documentation ++++++++++++++ + +- Add best practice for dask.compute function (:pr:`5583`) `Matthew Rocklin`_ +- Create FUNDING.yml (:pr:`5587`) `Gina Helfrich`_ +- Add screencast for coordination primitives (:pr:`5593`) `Matthew Rocklin`_ +- Move funding to .github repo (:pr:`5589`) `Tom Augspurger`_ +- Update calendar link (:pr:`5569`) `Tom Augspurger`_ + + 2.7.0 / 2019-11-08 ------------------ @@ -37,8 +74,8 @@ - Explicitly use iloc for row indexing (:pr:`5500`) `Krishan Bhasin`_ - Accept dask arrays on columns assignment (:pr:`5224`) `Henrique Ribeiro`_ - Implement unique and value_counts for SeriesGroupBy (:pr:`5358`) `Scott Sievert`_ -- Add sizeof definition for pyarrow tables and columns (:pr:`5522`) `Richard J Zamora`_ -- Enable row-group task partitioning in pyarrow-based read_parquet (:pr:`5508`) `Richard J Zamora`_ +- Add sizeof definition for pyarrow tables and columns (:pr:`5522`) `Richard J Zamora`_ +- Enable row-group task partitioning in pyarrow-based read_parquet (:pr:`5508`) `Richard J Zamora`_ - Removes npartitions='auto' from dd.merge docstring (:pr:`5531`) `James Bourbeau`_ - Apply enforce error message shows non-overlapping columns. (:pr:`5530`) `Tom Augspurger`_ - Optimize meta_nonempty for repetitive dtypes (:pr:`5553`) `Petio Petrov`_ @@ -2662,3 +2699,4 @@ .. _`Mads R. B. Kristensen`: https://github.com/madsbk .. _`Prithvi MK`: https://github.com/pmk21 .. _`Eric Dill`: https://github.com/ericdill +.. _`Gina Helfrich`: https://github.com/Dr-G diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/docs/source/futures.rst new/dask-2.8.0/docs/source/futures.rst --- old/dask-2.7.0/docs/source/futures.rst 2019-10-11 05:14:07.000000000 +0200 +++ new/dask-2.8.0/docs/source/futures.rst 2019-11-14 20:24:20.000000000 +0100 @@ -446,6 +446,16 @@ resources, track progress of ongoing computations, or share data in side-channels between many workers, clients, and tasks sensibly. +..
raw:: html + + <iframe width="560" + height="315" + src="https://www.youtube.com/embed/Q-Y3BR1u7c0" + style="margin: 0 auto 20px auto; display: block;" + frameborder="0" + allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" + allowfullscreen></iframe> + These features are rarely necessary for common use of Dask. We recommend that beginning users stick with using the simpler futures found above (like ``Client.submit`` and ``Client.gather``) rather than embracing needlessly diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/docs/source/install.rst new/dask-2.8.0/docs/source/install.rst --- old/dask-2.7.0/docs/source/install.rst 2019-11-05 22:48:30.000000000 +0100 +++ new/dask-2.8.0/docs/source/install.rst 2019-11-13 21:17:45.000000000 +0100 @@ -96,9 +96,9 @@ +-------------+----------+--------------------------------------------------------------+ | fastparquet | | Storing and reading data from parquet files | +-------------+----------+--------------------------------------------------------------+ -| fsspec | >=0.5.1 | Used for local, cluster and remote data IO | +| fsspec | >=0.6.0 | Used for local, cluster and remote data IO | +-------------+----------+--------------------------------------------------------------+ -| gcsfs | | File-system interface to Google Cloud Storage | +| gcsfs | >=0.4.0 | File-system interface to Google Cloud Storage | +-------------+----------+--------------------------------------------------------------+ | murmurhash | | Faster hashing of arrays | +-------------+----------+--------------------------------------------------------------+ @@ -112,7 +112,7 @@ +-------------+----------+--------------------------------------------------------------+ | pyarrow | >=0.14.0 | Python library for Apache Arrow | +-------------+----------+--------------------------------------------------------------+ -| s3fs | | Reading from Amazon S3 | +| s3fs | >=0.4.0 | Reading from Amazon S3 | +-------------+----------+--------------------------------------------------------------+ | sqlalchemy | | Writing and reading from SQL databases | +-------------+----------+--------------------------------------------------------------+ diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/docs/source/support.rst new/dask-2.8.0/docs/source/support.rst --- old/dask-2.7.0/docs/source/support.rst 2019-11-05 22:48:30.000000000 +0100 +++ new/dask-2.8.0/docs/source/support.rst 2019-11-14 20:24:20.000000000 +0100 @@ -23,13 +23,16 @@ Overflow or GitHub. 4. **Monthly developer meeting** happens the first Thursday of the month at 11:00 US Central Time in `this video meeting <https://zoom.us/j/802251830>`_. - Subscribe to `this Google Calendar invite`_ to be notified of changes to - the meeting schedule. Meeting notes are available at + Meeting notes are available at https://docs.google.com/document/d/1UqNAP87a56ERH_xkQsS5Q_0PKYybd5Lj2WANy_hRzI0/edit + You can subscribe to this calendar to be notified of changes: + + * `Google Calendar <https://calendar.google.com/calendar/embed?src=4l0vts0c1cgdbq5jhcogj55sfs%40group.calendar.google.com&ctz=America%2FChicago>`__ + * `iCal <https://calendar.google.com/calendar/ical/4l0vts0c1cgdbq5jhcogj55sfs%40group.calendar.google.com/public/basic.ics>`__ + .. _`Stack Overflow with the #dask tag`: https://stackoverflow.com/questions/tagged/dask .. _`GitHub issue tracker`: https://github.com/dask/dask/issues/ -.. 
_`this Google Calendar invite`: https://calendar.google.com/event?action=TEMPLATE&tmeid=NmxnamVvcGtjY3E2NGI5bTZzcW1hYjlrYzhybTZiYjFjY29qOGI5ZzY0cWoyYzFrNjFpMzhwaGlja18yMDE5MDYwNlQxNjAwMDBaIDRsMHZ0czBjMWNnZGJxNWpoY29najU1c2ZzQGc&tmsrc=4l0vts0c1cgdbq5jhcogj55sfs%40group.calendar.google.com&scp=ALL Asking for help diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.7.0/setup.py new/dask-2.8.0/setup.py --- old/dask-2.7.0/setup.py 2019-11-08 20:58:43.000000000 +0100 +++ new/dask-2.8.0/setup.py 2019-11-14 23:56:25.000000000 +0100 @@ -11,7 +11,7 @@ "array": ["numpy >= 1.13.0", "toolz >= 0.7.3"], "bag": [ "cloudpickle >= 0.2.1", - "fsspec >= 0.5.1", + "fsspec >= 0.6.0", "toolz >= 0.7.3", "partd >= 0.3.10" ], @@ -20,7 +20,7 @@ "pandas >= 0.21.0", "toolz >= 0.7.3", "partd >= 0.3.10", - "fsspec >= 0.5.1", + "fsspec >= 0.6.0", ], "distributed": ["distributed >= 2.0"], "diagnostics": ["bokeh >= 1.0.0"],
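For packagers who want a quick smoke test of the user-visible 2.8.0 array features beyond the test suite run in %check, the changes can be exercised in a few lines. This is an illustrative sketch written against the behavior shown in the test diffs above (test_tile_chunks, test_median, and test_asarray_chunks), not a script shipped with the package:

    import numpy as np
    import dask
    import dask.array as da

    x = np.arange(100).reshape(2, 5, 10)
    d = da.from_array(x, chunks=2)

    # dask.array.tile now accepts tuple-valued reps, not just scalar integers
    assert da.tile(d, (2, 1, 3)).shape == np.tile(x, (2, 1, 3)).shape

    # da.median works along an axis, rechunking that axis to one chunk internally
    assert np.allclose(da.median(d, axis=1).compute(), np.median(x, axis=1))

    # da.asarray now chunks its input according to the array.chunk-size config
    with dask.config.set({"array.chunk-size": "100 B"}):
        assert da.asarray(np.ones(1000)).npartitions > 1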