On Wed, Jul 4, 2012 at 9:31 PM, Terry Reedy <tjre...@udel.edu> wrote:
> On 7/4/2012 5:57 AM, anatoly techtonik wrote:
>>
>> On Fri, Jun 29, 2012 at 11:32 PM, Georg Brandl <g.bra...@gmx.net> wrote:
>
>>> Anatoly, so far there were no negative votes -- would you care to go
>>> another step and propose a patch?
>>
>> Was about to say "no problem",
>
> Did you read that there *are* strong negative votes? And that this idea has
> been rejected before? I summarized the objections in my two responses and
> pointed to the tracker issues. One of the objections is that there are 4
> different things one might want if the sequence length is not an even
> multiple of the chunk size. Your original 'idea' did not specify.

I actually meant that there is a problem with proposing a patch, in the
sense of getting a checkout, working on a diff, and sending it by
attaching it to the bug tracker as the developer guide says.

>> For now the best thing I can do (I don't risk even to mention anything
>> with 3.3) is to copy/paste code from the docs here:
>>
>> from itertools import izip_longest
>> def chunks(iterable, size, fill=None):
>>     """Split an iterable into blocks of fixed-length"""
>>     # chunks('ABCDEFG', 3, 'x') --> ABC DEF Gxx
>>     args = [iter(iterable)] * size
>>     return izip_longest(fillvalue=fill, *args)
>
> Python ideas is about Python 3 ideas. Please post Python 3 code.
>
> This is actually a one liner
>
>     return zip_longest(*[iter(iterable)]*size, fillvalue=fill)
>
> We don't generally add such to the stdlib.

Can you figure out from the code what this stuff does?
It doesn't give chunks of strings.

>> BTW, this doesn't work as expected (at least for strings). Expected is:
>>   chunks('ABCDEFG', 3, 'x') --> 'ABC' 'DEF' 'Gxx'
>> got:
>>   chunks('ABCDEFG', 3, 'x') --> ('A' 'B' 'C') ('D' 'E' 'F') ('G' 'x' 'x')
>
> One of the problems with idea of 'add a chunker' is that there are at least
> a dozen variants that different people want.

That's not the problem. People always want something extra. The
problem is that we don't have a real wish distribution. If 1000 people
want chunks and 1 wants groups with an exception, we still count these
as equal variants. Therefore my idea is deliberately limited to the
"string to chunks" user story and the SO implementation proposal.

> I discussed the problem of
> return types issue in my responses. I showed how to get the 'expected'
> response above using grouper, but also suggested that it is the wrong basis
> for splitting strings. Repeated slicing makes more sense for concrete
> sequence types.
>
> def seqchunk_odd(s, size):
>     # include odd size left over
>     for i in range(0, len(s), size):
>         yield s[i:i+size]
>
> print(list(seqchunk_odd('ABCDEFG', 3)))
> #
> ['ABC', 'DEF', 'G']

Right. That's the top answer on SO that people think should be in the
stdlib. Great, we are actually talking about the same thing.

> def seqchunk_even(s, size):
>     # only include even chunks
>     for i in range(0, size*(len(s)//size), size):
>         yield s[i:i+size]
>
> print(list(seqchunk_even('ABCDEFG', 3)))
> #
> ['ABC', 'DEF']

This is deducible from seqchunk_odd(s, size).

> def strchunk_fill(s, size, fill):
>     # fill odd chunks
>     q, r = divmod(len(s), size)
>     even = size * q
>     for i in range(0, even, size):
>         yield s[i:i+size]
>     if r:  # pad only when there is a leftover piece
>         yield s[even:] + fill * (size - r)
>
> print(list(strchunk_fill('ABCDEFG', 3, 'x')))
> #
> ['ABC', 'DEF', 'Gxx']

Also deducible from seqchunk_odd(s, size).

> Because the 'fill' value is necessarily a sequence for strings,
> strchunk_fill would only work for lists and tuples if the fill value were
> either required to be given as a tuple or list of length 1 or if it were
> internally converted inside the function. Skipping that for now.
>
> Having written the fill version based on the even version, it is easy to
> select among the three behaviors by modifying the fill version.
>
> def strchunk(s, size, fill=NotImplemented):
>     # fill odd chunks
>     q, r = divmod(len(s), size)
>     even = size * q
>     for i in range(0, even, size):
>         yield s[i:i+size]
>     if r and fill is not NotImplemented:
>         yield s[even:] + fill * (size - r)
>
> print(*strchunk('ABCDEFG', 3))
> print(*strchunk('ABCDEFG', 3, ''))
> print(*strchunk('ABCDEFG', 3, 'x'))
> #
> ABC DEF
> ABC DEF G
> ABC DEF Gxx

I now don't even think that the fill value is needed as an argument.

    if len(chunk) < size:
        chunk.extend([fill] * (size - len(chunk)))

> I already described how something similar could be done by checking each
> grouper output tuple for a fill value, but that requires that the fill value
> be a sentinel that could not otherwise appear in the tuple. One could modify
> grouper to fill with a private object() and check the last item of each
> group for that sentinel and act accordingly (delete, truncate, or replace).
> A generic api needs some thought, though.

I just need to chunk strings and sequences. A generic API is too complex
without counting all use cases and iterating over them.

> An issue I did not previously mention is that people sometimes want
> overlapping chunks rather than contiguous disjoint chunks. The slice
> approach trivially adapts to that.
>
> def seqlap(s, size):
>     for i in range(len(s)-size+1):
>         yield s[i:i+size]
>
> print(*seqlap('ABCDEFG', 3))
> #
> ABC BCD CDE DEF EFG
>
> A sliding window for a generic iterable requires a deque or ring buffer
> approach that is quite different from the zip-longest -- grouper approach.

That's why I'd like to drastically reduce the scope of the proposal.
itertools doesn't seem to be the best place anymore. How about a
sequence method?

  string.chunks(size) -> ABC DEF G
  list.chunks(size) -> [A,B,C], [D,E,F], [G]

If somebody needs a keyword argument, this can come later without
breaking compatibility.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
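
As a rough illustration of the reduced-scope idea in the message above, here is a minimal Python 3 sketch. The name `chunks` and its signature are assumptions made for this example: no such str or list method exists, and this is just the slicing approach of seqchunk_odd written as a plain function. The "even only" and "padded" variants are derived from its output, which is one way to read the claim that they are deducible from the basic version.

```python
def chunks(seq, size):
    """Yield consecutive slices of seq; the last one may be shorter.

    Hypothetical helper for illustration only, not a stdlib API.
    """
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


text_parts = list(chunks('ABCDEFG', 3))
list_parts = list(chunks(list('ABCDEFG'), 3))

# Variants derived from the basic result:
even_only = [c for c in text_parts if len(c) == 3]  # drop the short tail
padded = [c.ljust(3, 'x') for c in text_parts]      # strings only: pad the tail

print(text_parts)  # ['ABC', 'DEF', 'G']
print(list_parts)  # [['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]
print(even_only)   # ['ABC', 'DEF']
print(padded)      # ['ABC', 'DEF', 'Gxx']
```

Because the function only uses len() and slicing, it works for any concrete sequence type but not for arbitrary iterators, which matches the narrowed scope discussed in the thread.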