On 03/31/2013 02:41 PM, Roy Smith wrote:
In article <mailman.4023.1364751102.2939.python-l...@python.org>,
  Dave Angel <da...@davea.name> wrote:

On 03/31/2013 12:52 PM, C.T. wrote:
On Sunday, March 31, 2013 12:20:25 PM UTC-4, zipher wrote:
  <SNIP>


Thank you, Mark! My problem is the data isn't consistently ordered. I can
use slicing and indexing to put the year into a tuple, but because a car
manufacturer could have two names (e.g., Aston Martin) or a car model could
have two names (e.g., Iron Duke), it's harder to use slicing and indexing for
those two.  I've added the following, but the output is still not what I
need it to be.

So the correct answer is "it cannot be done," and an explanation.

Many times I've been given impossible conditions for a problem.  And
invariably the correct solution is to push back on the supplier of the
constraints.

In real life, you often have to deal with crappy input data (and bogus
project requirements).  Sometimes you just need to be creative.

There's only a small set of car manufacturers.  A good start would be
mining Wikipedia's [[List of automobile manufacturers]].  Once you've
got that list, you could try matching portions of the input against the
list.
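
A rough sketch of that idea, not a definitive solution: pull the year out
with a regex, look for a known manufacturer anywhere in what's left, and
treat the remainder as the model.  KNOWN_MAKES and split_car() are names
made up for this example, and the short list is hand-typed; a real one
would come from somewhere like the Wikipedia page above.

```python
import re

# Hypothetical, hand-typed sample list (an assumption for illustration).
KNOWN_MAKES = ["Aston Martin", "Ford", "Pontiac", "DeLorean"]

def split_car(line, makes=KNOWN_MAKES):
    """Return (year, make, model) from a loosely ordered description."""
    # Assume the year is a four-digit number starting with 19 or 20.
    year_match = re.search(r"\b(19|20)\d{2}\b", line)
    if year_match:
        year = int(year_match.group())
        rest = line[:year_match.start()] + " " + line[year_match.end():]
    else:
        year = None
        rest = line
    # Try longer names first so a multi-word make wins over a shorter one.
    for make in sorted(makes, key=len, reverse=True):
        m = re.search(r"\b" + re.escape(make) + r"\b", rest, re.IGNORECASE)
        if m:
            # Whatever is left after removing the make is taken as the model.
            model = " ".join((rest[:m.start()] + " " + rest[m.end():]).split())
            return year, make, model
    # No known make found: report None and hand back the leftover text.
    return year, None, " ".join(rest.split())
```

With this, split_car("1961 Aston Martin DB4") and
split_car("Pontiac Iron Duke 1980") both come apart into year, make and
model regardless of where the year sits, which is the part slicing and
indexing can't handle.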

Depending on how much effort you wanted to put into this, you could
explore all sorts of fuzzy matching (e.g., "delorean" vs "delorean motor
company"), but even a simple search is better than giving up.
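
For the fuzzy case, one option that needs no third-party package is
difflib from the standard library.  This is only a sketch: closest_make()
and KNOWN_MAKES are hypothetical names for this example, and the 0.5
cutoff is an assumption picked so that a short form like "delorean" still
matches the longer official name; it would need tuning against real data.

```python
import difflib

# Hypothetical, hand-typed sample list (an assumption for illustration).
KNOWN_MAKES = ["Aston Martin", "DeLorean Motor Company", "Ford", "Pontiac"]

def closest_make(text, makes=KNOWN_MAKES, cutoff=0.5):
    """Return the known make most similar to text, or None if none is close."""
    # Compare case-insensitively but return the original spelling.
    by_lower = {make.lower(): make for make in makes}
    hits = difflib.get_close_matches(
        text.lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[hits[0]] if hits else None
```

closest_make("delorean") finds "DeLorean Motor Company", and a typo such
as "Aston Martn" still lands on "Aston Martin".  A lower cutoff catches
more variants but also raises the odds of a wrong match.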

And, this is a good excuse to explore some of the interesting
third-party modules.  For example, mwclient ("pip install mwclient")
gives you a neat Python interface to Wikipedia.  And there's a whole
landscape of string matching packages to explore.

We deal with this every day at Songza.  Are Kesha and Ke$ha the same
artist?  Pushing back on the record labels to clean up their catalogs
isn't going to get us very far.


I agree with everything you've said, although in your case, presumably the record labels are not your client/boss, so that's not who you push back against. The client should know when the data is being fudged, and have a say in how it's to be done.

But this is a homework assignment. I think the OP is learning Python, not how to second-guess a client.


--
DaveA
