On 2014-12-25 17:59, Vincent Davis wrote: > These are vintage motorcycles so the "VIN's" are not like modern > VIN's these are frame numbers and engine number. > I don't want to parse the page, I what a function that given a VIN > (frame or engine number) returns the year the bike was made.
While I've done automobile VIN processing, I'm unfamiliar with motorcycle VIN processing. Traditional VINs consist of 17 alphanumeric (minus look-alike letters), and then certain offsets designate certain pieces of information (manufacturer, model, year, country of origin, etc). If you can describe what an actual VIN looks like and how you want to extract information from it, I'm sure folks here can come up with something. >From a rough reading of that URL you provided, it sounds like the format changed based on the year, so you would have to test against a a whole bunch of patterns to determine the year, and since some of the patterns overlap, you'd have to do subsequent checking. Something like import re def vin_to_year(vin): for pattern, min_val, max_val, year in [ (r'^(\d+)N$', 100, None, 1950), (r'^(\d+)NA$', 101, 15808, 1951), (r'^(\d+)$', 15809, 25000, 1951), (r'^(\d+)$', 25000, 32302, 1952), (r'^(\d+)$', 32303, 44134, 1953), (r'^(\d+)$', 44135, 56699, 1954), (r'^(\d+)$', 56700, 70929, 1955), # a whole bunch more like this ]: r = re.compile(pattern, re.I) m = r.match(vin) if m: if m.groups(): i = int(m.group(1)) if min_val is not None and i < min_val: continue if max_val is not None and i > max_val: continue return year # return DEFAULT_YEAR raise ValueError("Unable to determine year") Based on that link, it also looks like you might have to deal with partial-year ranges, so instead of just returning a year, you might need/want to return (start_year, start_month, end_year, end_month) instead of just the raw year. Thus, you'd have to update the table above to something like (r'^(\d+)$', 15809, 25000, 1951, 1, 1951, 12), (r'^KDA(\d+)$, None, None, 1980, 9, 1981, 4), (r'^K(\d+)$, None, None, 1974, 8, 1975, 7), # note that "K" has to follow "KDA" because of specificity and then iterate over for pattern, min_val, max_val, min_year, min_month, max_year, max_month in [ ... ]: # ... return (min_year, min_month, max_year, max_month) -tkc -- https://mail.python.org/mailman/listinfo/python-list