"Shashwat Anand" <[email protected]> wrote
as to match all of them. The task is time-consuming, but with every new
test set the exceptions are becoming fewer. (There are 0.2 million such
pages.)
One final thing to try is to identify records where you *failed* to find
a match and write them to an error file. The error file can then
be processed manually if need be.
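
Here is a rough sketch of what I mean, assuming your records are lines
of text and that a single regular expression stands in for whatever
matching you already do. PATTERN, split_records, write_errors and the
errors.txt filename are all made up for illustration, not taken from
your code.

```python
import re

# Hypothetical pattern: replace with your real matching logic.
PATTERN = re.compile(r"^(?P<name>\w+)\s*:\s*(?P<value>\d+)$")


def split_records(lines):
    """Return (matched, failed): parsed dicts and the raw lines that did not match."""
    matched, failed = [], []
    for line in lines:
        m = PATTERN.match(line.strip())
        if m:
            matched.append(m.groupdict())
        else:
            # Keep the original text so it can be inspected by hand later.
            failed.append(line)
    return matched, failed


def write_errors(failed, path="errors.txt"):
    """Write the unmatched records, one per line, for manual processing."""
    with open(path, "w", encoding="utf-8") as f:
        for line in failed:
            f.write(line.rstrip("\n") + "\n")
```

You would call split_records() on each page's lines and pass the second
result to write_errors(). Keeping the failures as raw text, rather than
a half-parsed form, makes them easier to eyeball.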
You might also be able to clean up the error file by not writing lines
that you know to be non-useful. The resultant error file might then
show up some further patterns that you can exploit.
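
One possible way to do that clean-up, again only a sketch: the NOISE
patterns below are invented examples of "known non-useful" lines, and
shape() is just one crude way of grouping the leftovers so that
recurring layouts stand out. Adjust both to suit your data.

```python
import re
from collections import Counter

# Hypothetical examples of lines known to be useless: blank lines and
# HTML comments. Replace with whatever you recognise in your own pages.
NOISE = [re.compile(r"^\s*$"), re.compile(r"^<!--.*-->$")]


def is_noise(line):
    """True if the line matches any of the known non-useful patterns."""
    return any(p.search(line.strip()) for p in NOISE)


def shape(line):
    """Reduce a line to a rough shape: digit runs become 9, letter runs become a."""
    s = re.sub(r"\d+", "9", line.strip())
    return re.sub(r"[A-Za-z]+", "a", s)


def summarise(failed):
    """Drop the noise, then count how often each shape occurs in what is left."""
    useful = [line for line in failed if not is_noise(line)]
    return useful, Counter(shape(line) for line in useful)
```

Calling summarise(failed)[1].most_common(10) lists the ten most frequent
shapes among the failures, which are the obvious candidates for a new
matching rule.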
It's all about eliminating as much manual effort as possible and
making the manual work that is left over as easy as possible.
i.e. accept that you won't ever get 100% success and aim to
minimise the pain as much as possible.
HTH,
--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/