On Wednesday 09 October 2002 11:56 am, Federico Sevilla III wrote:
> I do not yet manage very large databases, but have noticed that
Our Progress 7.2 database can balloon to about 2 gigabytes every three months. I'll try to do the math to approximate the number of records the Progress database holds; maybe it can give you an idea whether PostgreSQL can stand the same stress.

Assumption: one item transaction every second (one scanner swipe per second). In one minute a cashier can process 60 items; in one hour, 3,600 items. If the counter is open for 12 hours each day, that's 43,200 items a day per counter. A counter is not very busy all of the time, though; there are times when there are no customers. I think it's safe to assume a counter is idle for about 2 hours a day (roughly 16.7% idle time), so call it about 36,000 items processed per counter in a single day.

Multiply that by 70 grocery counters and you get about 2,520,000 sales records per day. The remaining 80 counters are dry-goods counters (shoes, dresses, textiles, school supplies, etc.). The item take per counter there is only about 1/3 of a grocery counter's, so that's 36,000 x 1/3 x 80 = 960,000 items. All counters together therefore commit about 2,520,000 + 960,000 = 3,480,000 individual sales records to the database per day. Add item master files for about 15,000 unique grocery and dry-goods items (or SKUs - stock-keeping units), plus records for price changes (about 1,000 per day), and you'd have a grand total of roughly 3,496,000 records per day. In a month's time these records will reach about 104,880,000, or 314,640,000 records in 3 months (assuming 30-day months).

We use a single database for all the counters' sales records. The backend database is separate (it includes tables for delivery reports, supplier info, accounts payables, etc.).

1. Can a PostgreSQL database grow to store this many records without taking a performance hit? During peak hours, and especially during a "SALE" event, all 150 client POS counters are busy accessing only one database.
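For the curious, the back-of-the-envelope arithmetic above reproduces in a few lines; the counter counts, hours, and idle-time figures are the assumptions stated in this post, nothing more:

```python
# Record-volume estimate from the assumptions in this post.
ITEMS_PER_HOUR = 60 * 60          # one scanner swipe per second
ACTIVE_HOURS = 12 - 2             # open 12 h/day, roughly 2 h idle
GROCERY_COUNTERS = 70
DRY_GOODS_COUNTERS = 80           # each does ~1/3 of a grocery counter's volume
MASTERFILE, PRICE_CHANGES = 15_000, 1_000

per_counter = ITEMS_PER_HOUR * ACTIVE_HOURS                # 36,000 items/day
grocery = per_counter * GROCERY_COUNTERS                   # 2,520,000
dry_goods = per_counter // 3 * DRY_GOODS_COUNTERS          # 960,000
daily = grocery + dry_goods + MASTERFILE + PRICE_CHANGES   # 3,496,000

print(f"{daily:,} records/day, {daily * 30:,}/month, {daily * 90:,}/quarter")
```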
The Progress database on NT 4.0 on a Proliant PII-450 with 128 MB RAM stands up to the stress well, with no visible performance hit at all. Our pricing office staff can even query the item masterfile and print long reports from it at the same time. She doesn't batch-update the item masterfile with price changes after work or early in the morning when the database load is light; she does it in real time.

2. Can PostgreSQL on Linux on the same hardware handle a load like this efficiently?

> PostgreSQL on Debian does daily runs of a maintenance script that
> analyzes all indices and optimizes them as necessary to keep indexed
> queries running at optimum.

This is excellent! Reindexing in Progress is done manually, but reindexing a 2 GB database takes about 30 minutes or less.

> From the PostgreSQL documentation:
>
> Unlike most other database systems which use locks for concurrency
> control, PostgreSQL maintains data consistency by using a
> multiversion model. This means that while querying a database each
> transaction sees a snapshot of data (a database version) as it was
> some time ago, regardless of the current state of the underlying
> data. This protects the transaction from viewing inconsistent data
> that could be caused by (other) concurrent transaction updates on
> the same data rows, providing transaction isolation for each
> database session.

Wow, really excellent. I don't know which databases they refer to when they say "unlike most other database systems which use locks for concurrency control", but I guess it does exclude Progress. This is the same method used in Progress with NO-LOCK and EXCLUSIVE-LOCK. If a record is EXCLUSIVE-LOCKed in Progress, other transactions see a snapshot of the data as it was some time ago; the effect is that others can still read the locked record via the snapshot. The rest of the documentation details exactly the way it is also done in Progress. Hmmm, interesting.
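You don't need Progress or PostgreSQL installed to see the behaviour that documentation describes. Here is a small sketch, purely as a stand-in, using SQLite in WAL mode from the Python standard library: while one connection holds an uncommitted update (think EXCLUSIVE-LOCK), a second connection is never blocked and still reads the last committed snapshot.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "pos.db")

writer = sqlite3.connect(path, isolation_level=None)
writer.execute("PRAGMA journal_mode=WAL")   # readers never block writers
writer.execute("CREATE TABLE item (sku TEXT PRIMARY KEY, price REAL)")
writer.execute("INSERT INTO item VALUES ('0001', 10.0)")

reader = sqlite3.connect(path, isolation_level=None)

writer.execute("BEGIN")                     # hold an uncommitted update
writer.execute("UPDATE item SET price = 12.0 WHERE sku = '0001'")

# The reader is not blocked and sees the pre-update snapshot.
snapshot = reader.execute("SELECT price FROM item").fetchone()[0]
print(snapshot)   # 10.0

writer.execute("COMMIT")

# A fresh read after the commit sees the new value.
latest = reader.execute("SELECT price FROM item").fetchone()[0]
print(latest)     # 12.0
```

The table name and values are invented for the example; the point is only that the read during the open write transaction returns the old value without waiting.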
> The main difference between multiversion and lock models is that in
> MVCC locks acquired for querying (reading) data don't conflict with
> locks acquired for writing data and so reading never blocks writing
> and writing never blocks reading.

Clever guys, these PostgreSQL hackers. Progress has successfully implemented this multiversioning inside a locking paradigm, and the effect is the same: reading and writing never interfere with each other. What's even more interesting is that these functions are expertly encapsulated in two "cute" 4GL keywords, NO-LOCK and EXCLUSIVE-LOCK. Appending either to a FIND statement lets you choose whether to just read, or to read and then lock the record for updating, while still allowing others to read the locked record in the same manner as multiversion snapshots in PostgreSQL. The technical feat of encapsulating this functionality in an easy-to-read (and easy-to-use) 4GL statement makes Progress an excellent RAD tool. How I wish I could read the source code for SQL-Ledger the way I read Progress 4GL statements.

I'm beginning to suspect the original hackers who developed the Progress database are now coders for PostgreSQL. It's also interesting to note the apparent wordplay between Progress and Postgres (post-Progress? The name is actually short for "post-Ingres", after the Ingres database PostgreSQL descends from). Another urban legend? Probably :-).

> PostgreSQL is supposed to do this, but have not personally tested
> killing our server while in the middle of a transaction. From the
> documentation:
>
> Write Ahead Logging (WAL) is a standard approach to transaction
> logging. Its detailed description may be found in most (if not all)
> books about transaction processing. Briefly, WAL's central concept
> is that changes to data files (where tables and indexes reside)
> must be written only after those changes have been logged - that
> is, when log records have been flushed to permanent storage.
> When we follow this procedure, we do not need to flush data pages to
> disk on every transaction commit, because we know that in the event
> of a crash we will be able to recover the database using the log:
> any changes that have not been applied to the data pages will first
> be redone from the log records (this is roll-forward recovery, also
> known as REDO) and then changes made by uncommitted transactions
> will be removed from the data pages (roll-backward recovery -
> UNDO).

Bingo! Another PostgreSQL technology that behaves exactly the same way in Progress. I don't have the technical skills to fully understand how Progress does this, but our CIO explained it to me in a "Progress Transactions for Dummies" way: during runtime, Progress creates a temporary file called <database-name>.log. Flushing also occurs: Progress creates a <database-name>.bi (before-image) file where records are temporarily logged, or flushed as binary data (I don't know much about this, but this is what he said), before the main database in use is updated. This flushing to <database-name>.bi is done fast. In the event of a database crash, the .bi file is used as a reference to recover the database, using both the .log and .bi files. User record locks are recorded in another file, <database-name>.lk.

Another interesting quirk: whenever we need temporary variables in Progress 4GL, we have the option to append the keyword NO-UNDO to the variable declaration.

DEFINE VARIABLE vwprice LIKE database-table.field-name NO-UNDO.

Defining variables with NO-UNDO tells Progress not to log the variable's values to the .bi file. Disabling roll-backward recovery (UNDO) for temporary working variables can speed up an application; we enable UNDO only for fields or variables that need to recover their data from the .bi file after a crash.

Thanks for the very valuable inputs! Going back to that "urban legend" -- PostgreSQL is a Progress-fork thing...
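The roll-backward (UNDO) half of that description is easy to see without crashing a server. A minimal sketch, again with SQLite as a stand-in, whose rollback journal plays roughly the role of Progress's .bi before-image file (the analogy is mine, not the documentation's): an uncommitted change is journalled first, so rolling back restores the before-image.

```python
import sqlite3

db = sqlite3.connect(":memory:", isolation_level=None)
db.execute("CREATE TABLE item (sku TEXT PRIMARY KEY, price REAL)")
db.execute("INSERT INTO item VALUES ('0001', 10.0)")

# The before-image is journalled before the data page is changed...
db.execute("BEGIN")
db.execute("UPDATE item SET price = 99.0 WHERE sku = '0001'")
db.execute("ROLLBACK")   # ...so UNDO can restore it

restored = db.execute("SELECT price FROM item").fetchone()[0]
print(restored)   # 10.0 - the before-image wins
```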
:-)

--- mikol

"There is no concept more closer to intellectual emancipation than free software. Freedom to responsibly code and share in its most free and pure form." -- Floyd Robinson, September 24, 2002

_
Philippine Linux Users Group. Web site and archives at http://plug.linux.org.ph
To leave: send "unsubscribe" in the body to [EMAIL PROTECTED]
Fully Searchable Archives With Friendly Web Interface at http://marc.free.net.ph
To subscribe to the Linux Newbies' List: send "subscribe" in the body to [EMAIL PROTECTED]
