I use Postgres with a GUI frontend (Aquafold) as a very large spreadsheet on 
steroids to analyze rare or defective spellings in a corpus of 65,000 texts 
and 1.5 billion words.  I typically extract data from the corpus with Python 
scripts, turn it into tables, and load them into the database.

On my Mac with 32 GB of memory, performance is OK: queries typically take 
seconds to extract rows from tables with up to ten million rows.  If the 
result set is large, I suspect that most of the machine's time is spent 
displaying it. I have used indexing sparingly. While it helps, the time 
savings often don't matter much.

I am thinking about scaling up to a table with about 60 million rows.  Are 
there things to do or watch out for? Or should I proceed on the assumption 
that 60 million records are within scope and that the added time cost is 
roughly linear?

Martin Mueller
Professor emeritus of English and Classics
Northwestern University


