On 2/26/15 7:53 PM, memilanuk wrote:
So... okay. I've got a bunch of PDFs of tournament reports that I want to sift thru for information. Ended up using 'pdftotext -layout file.pdf file.txt' to extract the text from the PDF. Still have a few little glitches to iron out there, but I'm getting decent enough results for the moment to move on.
...
So back to the lines of text I have stored as strings in a list. I think I want to convert that to a list of lists, i.e. split each line up, store that info in another list, and ditch the whitespace. Or would I be better off using dicts? Originally I was thinking of how to process each line and split it up based on what information was where - some sort of nested for/if mess. Now I'm starting to think that the lines of text are pretty uniform in structure, i.e. the same field is always in the same location, and that list slicing might be the way to go, if a bit tedious to set up initially...? Any thoughts or suggestions from people who've gone down this particular path would be greatly appreciated. I think I have a general idea/direction, but I'm open to other ideas if the path I'm on is just blatantly wrong.
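If the fields really do sit at fixed columns, slicing into dicts is straightforward. A minimal sketch - the column positions and field names here are invented for illustration, since the actual report layout isn't shown:

```python
# Hypothetical fixed-width lines; the real column boundaries would come
# from inspecting the pdftotext -layout output.
lines = [
    "Smith, John      Open     587-23x",
    "Doe, Jane        Master   592-31x",
]

def parse_line(line):
    # Slice by fixed column positions, then strip the padding whitespace.
    return {
        "name":  line[0:17].strip(),
        "class": line[17:26].strip(),
        "score": line[26:].strip(),
    }

records = [parse_line(line) for line in lines]
# records[0] -> {'name': 'Smith, John', 'class': 'Open', 'score': '587-23x'}
```

A list of dicts keeps the field names attached to the values, which tends to be easier to work with later than a bare list of lists.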
It sounds to me as if the best way to handle all this is to keep the information in a database, preferably one available over the network and centrally managed, so whoever enters the information in the first place enters it there. But I admit that setting such a thing up requires some overhead.
Simpler alternatives include SQLite, a simple file-based database system, or NumPy structured arrays (arrays with named fields). Python includes a standard library module for SQLite (sqlite3), and NumPy is easy to install.
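For instance, with the standard-library sqlite3 module the parsed records could go straight into a queryable table. The table and column names below are made up for illustration:

```python
import sqlite3

# ":memory:" keeps everything in RAM; pass a filename instead for a
# persistent on-disk database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (name TEXT, class TEXT, score TEXT)")
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [("Smith, John", "Open", "587-23x"),
     ("Doe, Jane", "Master", "592-31x")],
)
conn.commit()

# Querying is then plain SQL rather than hand-rolled loops.
rows = conn.execute(
    "SELECT name, score FROM results WHERE class = ?", ("Open",)
).fetchall()
```

Once the data is in SQLite, sorting and filtering across many tournament reports becomes a one-line query instead of more list processing.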
-- Russell -- https://mail.python.org/mailman/listinfo/python-list