I'm designing a system and wanted to get some feedback on a potential performance problem down the road, while it is still cheap to fix.
The system is similar to an accounting system: it tracks "Things" which move between different "Buckets". The system answers these questions:

- How many Things are in this Bucket today?
- Move Things from Bucket1 to Bucket2...
- Now how many Things are in each Bucket?

So each time a Thing is moved between Buckets, I imagine a DB row like this:

| id | thingId | toBucket | fromBucket | qty |

Then to find how many Things are in a certain Bucket:

1. Start with the initial qty in the bucket at the beginning of time
2. *Add* all qty moved *to* the bucket since the beginning of time
3. *Subtract* all qty moved *from* the bucket since the beginning of time

A simple system to "account" for Things.

My problem is this: the design will inherently get slower as time goes on. As the number of rows recording transfers between Buckets grows, the query to see how many Things are in a Bucket gets slower.

I experience this when I use gnucash (which I love). I don't do "closing entries" at the end of the year, so each account carries every transaction I have ever made, and I can see it getting slower. It is still fast enough for me, so I'm not going to do anything about it, but I have to wonder how big companies with thousands of transactions a day handle this.

One solution would be to make a "closing entry" at certain points in time, so old info would be archived. Each Bucket would start the new period with a balance of Things equal to what it was at the point in time we "closed".

How else can I keep a record of every transaction, yet not have the speed of the question "How many Things are in Bucket x?" depend on looking at every transaction record ever made? (A sketch of both approaches follows below.)
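To make the above concrete, here is a minimal sketch of both the naive query and the closing-entry idea, using sqlite3 from the standard library. The table and column names (moves, closings) and the choice to key each closing to the id of the last move it includes are my own assumptions for illustration, not a prescription; a real system would probably close per period with timestamps.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE moves (
    id          INTEGER PRIMARY KEY,
    thing_id    INTEGER NOT NULL,
    from_bucket INTEGER NOT NULL,
    to_bucket   INTEGER NOT NULL,
    qty         INTEGER NOT NULL
);
-- One row per bucket per "closing entry": the balance as of the
-- last move included in the close.
CREATE TABLE closings (
    bucket    INTEGER NOT NULL,
    closed_at INTEGER NOT NULL,   -- id of the last move included
    balance   INTEGER NOT NULL,
    PRIMARY KEY (bucket, closed_at)
);
""")

def naive_balance(bucket, initial=0):
    """Steps 1-3 above: scan every move ever made (slows with history)."""
    (added,) = conn.execute(
        "SELECT COALESCE(SUM(qty), 0) FROM moves WHERE to_bucket = ?",
        (bucket,)).fetchone()
    (removed,) = conn.execute(
        "SELECT COALESCE(SUM(qty), 0) FROM moves WHERE from_bucket = ?",
        (bucket,)).fetchone()
    return initial + added - removed

def close_period(bucket):
    """Record a closing entry so later queries can start from it."""
    (last_id,) = conn.execute(
        "SELECT COALESCE(MAX(id), 0) FROM moves").fetchone()
    conn.execute("INSERT INTO closings VALUES (?, ?, ?)",
                 (bucket, last_id, naive_balance(bucket)))

def fast_balance(bucket):
    """Start from the latest closing and only scan moves made since."""
    row = conn.execute(
        "SELECT closed_at, balance FROM closings "
        "WHERE bucket = ? ORDER BY closed_at DESC LIMIT 1",
        (bucket,)).fetchone()
    closed_at, balance = row if row else (0, 0)
    (added,) = conn.execute(
        "SELECT COALESCE(SUM(qty), 0) FROM moves "
        "WHERE to_bucket = ? AND id > ?", (bucket, closed_at)).fetchone()
    (removed,) = conn.execute(
        "SELECT COALESCE(SUM(qty), 0) FROM moves "
        "WHERE from_bucket = ? AND id > ?", (bucket, closed_at)).fetchone()
    return balance + added - removed

if __name__ == "__main__":
    conn.execute("INSERT INTO moves VALUES (NULL, 1, 1, 2, 5)")
    close_period(2)
    conn.execute("INSERT INTO moves VALUES (NULL, 1, 2, 3, 2)")
    print(naive_balance(2), fast_balance(2))   # both print 3

The point is that fast_balance only scans the moves made since the last closing entry, so its cost depends on the length of the current period rather than on the total history. You still keep every transaction ever made; the balance query just no longer has to touch them all.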