[EMAIL PROTECTED] wrote:
> actually for the example I have used only one sentry condition, but they
> are more numerous and complex; also I need to work on a huge amount of
> data (each word is a line with many features read from a file)

An open (text) file is a line-based iterator that can be fed directly to
'chunker'. As for different sentry conditions, I imagine they can be coded
in either model. How much is a 'huge amount' of data?
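For instance (a minimal sketch, not from the original thread: 'words.txt' and
'process' are hypothetical names, and it assumes one token per line with a
bare "." line acting as the sentry):

def tokens_from_file(path):
    # yield one stripped token per line so a "." line compares equal to the sentry
    with open(path) as f:
        for line in f:
            yield line.strip()

for chunk in chunker(tokens_from_file("words.txt"), chunk_size=3, sentry="."):
    process(chunk)   # 'process' stands in for whatever per-chunk work is needed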
> oops
>
>> to have:
>>
>> this .
>> this . is a .
>> this . is a . test to .
>> is a . test to . check if it .
>> test to . check if it . works .
>> check if it . works . well .
>> works . well . it looks like .
> well . it looks like .
> it looks like .

Here's a small update to the generator that allows optional handling of the
head and the tail:

def chunker(s, chunk_size=3, sentry=".", keep_first=False, keep_last=False):
    buffer = []
    sentry_count = 0
    for item in s:
        buffer.append(item)
        if item == sentry:
            sentry_count += 1
            if sentry_count < chunk_size:
                # head: emit the partial chunks seen before the first full one
                if keep_first:
                    yield buffer
            else:
                # a full chunk is ready: emit it, then drop its oldest sentence
                yield buffer
                del buffer[:buffer.index(sentry) + 1]
    if keep_last:
        # tail: drain the remaining partial chunks
        while buffer:
            yield buffer
            del buffer[:buffer.index(sentry) + 1]

>>> for p in chunker(s.split(), keep_first=True, keep_last=True): print " ".join(p)
...
this .
this . is a .
this . is a . test to .
is a . test to . check if it .
test to . check if it . works .
check if it . works . well .
works . well . it looks like .
well . it looks like .
it looks like .
>>>
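One caveat worth adding (my note, not something from the original post): the
generator yields the same list object every time and keeps mutating it, so
the chunks have to be consumed or copied as they arrive; a consumer that only
stores the yielded references ends up with aliases of the by-then-emptied
buffer. A small sketch of the copying pattern:

s = "this . is a . test to . check if it . works . well . it looks like ."
chunks = [list(p) for p in chunker(s.split(), keep_first=True, keep_last=True)]
# list(p) snapshots each chunk; without the copy every stored entry would be
# the same buffer object, which is empty once the generator is exhausted.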