On 2/26/15 7:53 PM, memilanuk wrote:
So... okay.  I've got a bunch of PDFs of tournament reports that I want
to sift thru for information.  Ended up using 'pdftotext -layout
file.pdf file.txt' to extract the text from the PDF.  Still have a few
little glitches to iron out there, but I'm getting decent enough results
for the moment to move on.

...
So back to the lines of text I have stored as strings in a list.  I
think I want to convert that to a list of lists, i.e. split each line
up, store that info in another list and ditch the whitespace.  Or would
I be better off using dicts?  Originally I was thinking of how to
process each line and split it up based on what information was
where - some sort of nested for/if mess.  Now I'm starting to think that
the lines of text are pretty uniform in structure i.e. the same field is
always in the same location, and that list slicing might be the way to
go, if a bit tedious to set up initially...?

Any thoughts or suggestions from people who've gone down this particular
path would be greatly appreciated.  I think I have a general
idea/direction, but I'm open to other ideas if the path I'm on is just
blatantly wrong.
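The slicing idea you describe can be sketched roughly like this. The
field names and column positions below are invented for illustration;
you'd adjust them to match the actual pdftotext layout:

```python
# Hypothetical fixed-width layout -- replace the names and column
# ranges with whatever the real tournament report lines use.
FIELDS = {
    "name":  slice(0, 20),
    "class": slice(20, 30),
    "score": slice(30, 38),
}

def parse_line(line):
    """Split one fixed-width line into a dict of stripped fields."""
    return {name: line[sl].strip() for name, sl in FIELDS.items()}

line = "Smith, John         Master    187-7"
row = parse_line(line)
# row -> {'name': 'Smith, John', 'class': 'Master', 'score': '187-7'}
```

Keeping the positions in one dict of slices means the tedious part is
written down exactly once, and you get dicts (rather than bare lists)
for free, so downstream code can say row["score"] instead of row[2].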

It sounds to me as if the best way to handle all this is to keep the information in a database, preferably one available over the network and centrally managed, so that whoever enters the information in the first place enters it there. But I admit that setting such a thing up requires some overhead.

Simpler alternatives include SQLite, a simple file-based database system, or numpy structured arrays (arrays with named fields). Python includes a standard library module for SQLite (sqlite3), and numpy is easy to install.
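For example, a minimal sqlite3 sketch (the table and column names here
are made up; yours would mirror the fields in the reports):

```python
import sqlite3

# An in-memory database for the example; pass a filename to persist.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE results (
                    name  TEXT,
                    class TEXT,
                    score TEXT)""")

# Parsed rows from the text files would go in via executemany.
rows = [("Smith, John", "Master", "187-7"),
        ("Doe, Jane",   "Expert", "185-6")]
conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)

# Then the sifting becomes SQL queries instead of nested for/if code:
masters = conn.execute(
    "SELECT name, score FROM results WHERE class = ?",
    ("Master",)).fetchall()
```

Once the data is in a table, questions like "all Masters, sorted by
score" are one-line queries rather than hand-written loops.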

-- Russell
