Hi all,

My own solution works but I'm sure it could be simpler or read better. How 
would you do it?

Say you've got a list of companies:

Aerosonde Ltd
Amcor
ANCA
Austal Ships
Australia Post
Australian Air Express
Australian Defence Industries
Australian Railroad Group
Australian Submarine Corporation

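and for each company you need to extract the phrases from its name that
uniquely identify it, i.e. that occur exactly once across the whole list.
Matching is by substring, so 'Australia' does not qualify because it also
occurs inside 'Australian'. The results for the above list should be: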

Company: 'Aerosonde Ltd'
 Aliases: Aerosonde,Ltd,Aerosonde Ltd

Company: 'Amcor'
 Aliases: Amcor

Company: 'ANCA'
 Aliases: ANCA

Company: 'Austal Ships'
 Aliases: Austal,Ships,Austal Ships

Company: 'Australia Post'
 Aliases: Post,Australia Post

Company: 'Australian Air Express'
 Aliases: Air,Express,Australian Air,Air Express,Australian Air Express

Company: 'Australian Defence Industries'
 Aliases: Defence,Industries,Australian Defence,Defence Industries,Australian Defence Industries

Company: 'Australian Railroad Group'
 Aliases: Railroad,Group,Australian Railroad,Railroad Group,Australian Railroad Group

Company: 'Australian Submarine Corporation'
 Aliases: Submarine,Corporation,Australian Submarine,Submarine Corporation,Australian Submarine Corporation

Here's my solution:

from itertools import combinations, chain

companies = [
    "Aerosonde Ltd",
    "Amcor",
    "ANCA",
    "Austal Ships",
    "Australia Post",
    "Australian Air Express",
    "Australian Defence Industries",
    "Australian Railroad Group",
    "Australian Submarine Corporation",
]

def flatten(i):
    return list(chain.from_iterable(i))

companies_as_text_stream = ' '.join(companies)
for company in companies:
    words = company.split()
    # Every combination of 1..len(words) words, kept in their original order.
    word_combinations = [combinations(words, r)
                         for r in range(1, len(words) + 1)]
    phrases = [' '.join(phrase) for phrase in flatten(word_combinations)]
    # Keep phrases that occur exactly once (as a substring) across all names.
    unique_phrases = [phrase for phrase in phrases
                      if companies_as_text_stream.count(phrase) == 1]
    aliases = ','.join(unique_phrases)
    print("Company: '{0}'\n Aliases: {1}\n".format(company, aliases))
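
One possible simplification, offered only as a sketch: combinations() also
produces non-adjacent word pairs such as 'Australian Express', which the
count() filter then throws away, so the phrases could be generated as runs of
consecutive words in the first place. The helper names contiguous_phrases and
unique_aliases below are made up for illustration; the uniqueness test is the
same substring count as above, so 'Australia' is still rejected.

```python
companies = [
    "Aerosonde Ltd",
    "Amcor",
    "ANCA",
    "Austal Ships",
    "Australia Post",
    "Australian Air Express",
    "Australian Defence Industries",
    "Australian Railroad Group",
    "Australian Submarine Corporation",
]


def contiguous_phrases(name):
    """Yield every run of consecutive words in name, shortest runs first."""
    words = name.split()
    for size in range(1, len(words) + 1):
        for start in range(len(words) - size + 1):
            yield ' '.join(words[start:start + size])


def unique_aliases(names):
    """Map each name to its phrases that occur exactly once in the joined text."""
    text = ' '.join(names)
    return {
        name: [p for p in contiguous_phrases(name) if text.count(p) == 1]
        for name in names
    }


for company, aliases in unique_aliases(companies).items():
    print("Company: '{0}'\n Aliases: {1}\n".format(company, ','.join(aliases)))
```

For the list above this should give the same aliases in the same order as the
combinations() version. Whether it actually reads better is a matter of taste.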