ronrsr wrote:
> I have a single long string - I'd like to split it into a list of
> unique keywords. Sadly, the database wasn't designed to do this, so I
> must do this in Python - I'm having some trouble using the .split()
> function, it doesn't seem to do what I want it to - any ideas?
>
> thanks very much for your help.
>
> r-sr-
>
>
> longstring = '<SNIP rather long string>'

What do you want it to do? Split on each semicolon?

    a = longstring.split(";")
    for element in a:
        print(element)

That gives output like this:

Agricultural subsidies Foreign aidAgriculture Sustainable Agriculture - Support Organic Agriculture Pesticides, US,Childhood Development, Birth Defects Toxic ChemicalsAntibiotics,AnimalsAgricultural Subsidies, Global TradeAgriculturalSubsidiesBiodiversityCitizen ActivismCommunityGardensCooperativesDietingAgriculture, CottonAgriculture, GlobalTradePesticides, MonsantoAgriculture, SeedCoffee, HungerPollution,Water, FeedlotsFood PricesAgriculture, WorkersAnimal Feed, Corn,PesticidesAquacultureChemicalWarfareCompostDebtConsumerismFearPesticides, US, Childhood Development,Birth DefectsCorporate Reform, Personhood (Dem. Book)Corporate Reform, Personhood, Farming (Dem. Book)Crime Rates, Legislation,EducationDebt, Credit CardsDemocracyPopulation, WorldIncomeDemocracy,Corporate Personhood, Porter Township (Dem. Book)DisasterReliefDwellings, SlumsEconomics, MexicoEconomy, LocalEducation,ProtestsEndangered Habitat, RainforestEndangered SpeciesEndangeredSpecies, Extinctionantibiotics, livestockAgricultural subsidies Foreign aid Agriculture Sustainable Agriculture - Support OrganicAgriculture Pesticides, US, Childhood Development, Birth Defects Toxic Chemicals
<etc.>

I think the trouble is with the string itself rather than with .split():

1.) Inconsistent spaces between words (some are non-existent)
2.) Inconsistent separators between elements (sometimes semi-colons, sometimes commas, but commas appear to belong to elements, sometimes no clear separator at all)

Basically, this problem is not solvable by computer with currently available resources. There is no way Python or anything else can know which words are meant to be together and which are not, when there are no separators between elements and no separators between words within those elements.

You need to find a new way of generating the string, or do it by hand. How did you get the string?

Cameron.
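
For the parts of the string that do have semicolons, a minimal sketch of getting from the split to a list of unique keywords might look like the following. The function name unique_keywords and the sample string are made up for illustration, and treating keywords that differ only in case as duplicates is an assumption about what "unique" should mean here. It does nothing for elements that were run together without a separator.

```python
def unique_keywords(text, sep=";"):
    """Split text on sep, tidy whitespace, and drop empties and duplicates.

    Duplicates are compared case-insensitively (an assumption); the first
    spelling seen is the one that is kept, and input order is preserved.
    """
    seen = set()
    result = []
    for piece in text.split(sep):
        # Collapse runs of whitespace and trim both ends.
        keyword = " ".join(piece.split())
        key = keyword.lower()
        if keyword and key not in seen:
            seen.add(key)
            result.append(keyword)
    return result


# Hypothetical sample, not the original poster's data.
sample = "Agricultural subsidies; Foreign aid;Agriculture; agricultural subsidies ;; Pesticides, US"
print(unique_keywords(sample))
# ['Agricultural subsidies', 'Foreign aid', 'Agriculture', 'Pesticides, US']
```

Commas are left alone on purpose, since in this data they seem to belong inside an element ("Pesticides, US") rather than between elements.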
-- http://mail.python.org/mailman/listinfo/python-list