On 12/4/2012 8:57 AM, Nick Mellor wrote:

I have a file full of things like this:

"CAPSICUM RED fresh from Queensland"

Product names (all caps, at start of string) and descriptions (mixed
case, to end of string) all muddled up in the same field. And I need
to split them into two fields. Note that if the text had said:

"CAPSICUM RED fresh from QLD"

I would want QLD in the description, not shunted forwards and put in
the product name. So (uncontrived) list comprehensions and regexes
are out.

I want to split the above into:

("CAPSICUM RED", "fresh from QLD")
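To see why a simple filter does not meet this requirement, here is a minimal sketch of the kind of approach being ruled out. `naive_split` is a hypothetical name, and the predicate is the same `word == word.upper()` test used in the code that follows. Because it filters every word rather than taking a leading run, QLD ends up in the product name:

```python
def naive_split(s):
    words = s.split()
    # Filters ALL upper-case words, wherever they occur in the string,
    # instead of only the leading run of them.
    product = [w for w in words if w == w.upper()]
    description = [w for w in words if w != w.upper()]
    return " ".join(product), " ".join(description)

print(naive_split("CAPSICUM RED fresh from QLD"))
# ('CAPSICUM RED QLD', 'fresh from')
```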

Enter dropwhile and takewhile. 6 lines later:

from itertools import takewhile, dropwhile
def split_product_itertools(s):
    words = s.split()
    allcaps = lambda word: word == word.upper()
    product, description =\
        takewhile(allcaps, words), dropwhile(allcaps, words)
    return " ".join(product), " ".join(description)
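One point worth noting about this version: it works because `words` is a list, so `takewhile` and `dropwhile` each make their own pass over it. If the words came from a one-shot iterator instead, `takewhile` would consume (and discard) the first non-caps word before `dropwhile` ever saw it. A small sketch, using a hypothetical helper `split_words` that takes an iterable of words:

```python
from itertools import takewhile, dropwhile

def allcaps(word):
    return word == word.upper()

def split_words(words):
    # Hypothetical helper: the same takewhile/dropwhile split,
    # but taking an iterable of words instead of a string.
    product = " ".join(takewhile(allcaps, words))
    description = " ".join(dropwhile(allcaps, words))
    return product, description

words = "CAPSICUM RED fresh from QLD".split()

# A list can be iterated twice, so each function sees every word.
print(split_words(words))        # ('CAPSICUM RED', 'fresh from QLD')

# A one-shot iterator is shared: takewhile reads "fresh" to decide to stop,
# and that word is gone by the time dropwhile starts.
print(split_words(iter(words)))  # ('CAPSICUM RED', 'from QLD')
```

If the input really is an iterator, materialising it with `list(...)` first keeps the two-pass approach correct.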

If the original string has no excess whitespace, the description is simply what remains of s after the product prefix and the space following it are removed. (Py 3 code)

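from itertools import takewhile

def allcaps(word):
    return word == word.upper()

def split_product_itertools(s):
    product = ' '.join(takewhile(allcaps, s.split()))
    # Drop the product prefix, then the space after it. lstrip() also
    # copes with an empty product, where s[len(product)+1:] would cut
    # off the first character of the description.
    return product, s[len(product):].lstrip()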

print(split_product_itertools("CAPSICUM RED fresh from QLD"))
>>>
('CAPSICUM RED', 'fresh from QLD')

Without that assumption, apply the same idea to the list of words rather than to the string itself.

def split_product_itertools(s):
    words = s.split()
    product = list(takewhile(allcaps, words))
    return ' '.join(product), ' '.join(words[len(product):])
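A caveat about the predicate itself: `word == word.upper()` is also true for tokens that contain no letters, such as numbers or a lone dash, so those get pulled into the product name. Python's `str.isupper()` additionally requires at least one cased character. Which behaviour is right depends on the data; the sketch below (with hypothetical helper names `split_with` and `has_caps_only`, and a made-up sample string) only shows where the two predicates disagree:

```python
from itertools import takewhile

def split_with(pred, s):
    # Same list-based takewhile split as above, with the predicate passed in.
    words = s.split()
    product = list(takewhile(pred, words))
    return " ".join(product), " ".join(words[len(product):])

def allcaps(word):
    # True for "QLD", but also for "500" or "-", which have no letters.
    return word == word.upper()

def has_caps_only(word):
    # Hypothetical stricter predicate: str.isupper() also requires at least
    # one cased character, so "500" and "-" do not count as all caps.
    return word.isupper()

s = "BEANS 500 g fresh from QLD"
print(split_with(allcaps, s))        # ('BEANS 500', 'g fresh from QLD')
print(split_with(has_caps_only, s))  # ('BEANS', '500 g fresh from QLD')
```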

--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list
