Re: [plug] Open-source database evaluation

Michael Peligro Wed, 09 Oct 2002 02:31:11 -0700

On Wednesday 09 October 2002 11:56 am, Federico Sevilla III wrote:

> I do not yet manage very large databases, but have noticed that


Our Progress 7.2 database can balloon to about 2 gigabytes every three months.

I'll try to do the math for approximation of the number of records the 
Progress database can hold. Maybe it can give you an idea whether PostgreSQL 
can stand the same stress.

Assumption: 

One item transaction every second (one scanner swipe per second)

In one minute, a cashier can process 60 items, in one hour, 3,600 items.
If the counter is open for 12 hours each day, that's about 43,200 items a day 
per counter.

A counter is not very busy most of the time. There are times when there are no 
customers. I think it's safe to assume that a counter is idle for 2 hours a 
day, so it has about 16.7% idle time. So it's safe to assume about 35,999 
items processed in a single day per counter.

Multiply these by 70 grocery counters and you get about 2,519,899 sales 
records per day.

The remaining 80 counters are dry-goods (shoes, dresses, textile, school 
supplies, etc..) counters. The total item take per counter is only 1/3 of the 
grocery counters so its 35,999 x 0.333 x 80 = 959,962 items.

All counters therefore committed about 959,962 + 2,519,899 individual sales 
records to the database for a total of 3,479,861 records per day.

Add item master files for about 15,000 unique grocery and dry-goods items (or 
SKUs - self-keeping units), plus records for price changes (about 1,000 per 
day) and you'd have a grand total of 3,495,861 records per day.

In a month's time, these records will reach about 104,875,824, or 314,627,472 
records in 3 months (assuming 30 day months).

We use a single database for all the counters' sales records. The backend 
database is separate (which include tables for delivery reports, supplier 
info, accounts payables, etc...)

1. Can a PostgreSQL database grow to store large records without taking a 
performance hit?

During peak-hours and especially during a "SALE" event, all the 150 client POS 
counters are busy accessing only one database. The Progress database on NT4.0 
on Proliant PII-450-128MB RAM stands up to the stress well with no visible 
performance hit at all. Our pricing office staff can even query the item 
masterfile and print long reports at the same time from it. She doesn't 
batch-update the item masterfile with price changes after work or early in 
the morning when the database load is light. She does it in realtime.

2. Can PostgreSQL on Linux on the same hardware handle the load efficiently 
such as this? 

> PostgreSQL on Debian does daily runs of a maintenance script that
> analyzes all indices and optimizes them as necessary to keep indexed
> queries running at optimum.

This is excellent! Reindexing in Progress is done manually, but reindexing a 
2gig database takes about 30 minutes or less.

> From the PostgreSQL documentation:
>
>      Unlike most other database systems which use locks for concurrency
>      control, PostgreSQL  maintains data consistency by using a
>      multiversion model. This means that while querying a database each
>      transaction sees a snapshot of data (a database version) as it was
>      some time ago, regardless of the current state of the underlying
>      data. This protects the transaction from viewing inconsistent data
>      that could be caused by (other) concurrent transaction updates on
>      the same data rows, providing transaction isolation  for each
>      database session.

Wow, really excellent. I don't know what databases they refer to when they 
said "Unlike most other database systems which use locks for concurrency 
control", but I guess it does exclude Progress.

This is the same method used in Progress with NO-LOCKs and EXCLUSIVE-LOCKs. If 
a record is EXCLUSIVE-LOCKed in Progress, the transaction sees a snapshot of 
data as it was some time ago. The effect is that others can still read the 
locked record via the snapshot. The rest of the documentation exactly details 
the way it is also done in Progress. Hmmm, interesting.


>      The main difference between multiversion and lock models is that in
>      MVCC locks acquired for querying (reading) data don't conflict with
>      locks acquired for writing data and so reading never blocks writing
>      and writing never blocks reading.

Clever guys these PostgreSQL hackers are. 

Progress has successfully implemented this multiversion inside a locking 
paradigm, and the effect is the same, reading and writing never interferes 
with each other. What's even interesting is that these functions are expertly 
encapsulated in two "cute" 4GL statements, NO-LOCK and EXCLUSIVE-LOCK. 
Appending either statement to FIND statements permits you to choose whether 
to read, or read then lock the record for updating, yet allow others to read 
the locked record in the same manner as multiversioning snapshots in 
PostgreSQL. 

The technical feat of encapsulating this functionality in an easy-to-read (and 
use) 4GL statement makes Progress application development an excellent RAD 
tool. How I wish I can read the source code for SQL-Ledger like I was reading 
Progress 4GL statements. 

I'm beginning to suspect the original hackers who developed the Progress 
database are now coders for PostgreSQL. It's also interesting to note the 
wordplay between words Progress and Postgre (post-Progress?). Another urban 
legend? Probably :-).


> PostgreSQL is supposed to do this, but have not personally tested
> killing our server while in the middle of a transaction. From the
> documentation:
>
>      Write Ahead Logging (WAL) is a standard approach to transaction
>      logging. Its detailed description may be found in most (if not all)
>      books about transaction processing. Briefly, WAL's central concept
>      is that changes to data files (where tables and indexes reside)
>      must be written only after those changes have been logged - that
>      is, when log records have been flushed to permanent storage. When
>      we follow this procedure, we do not need to flush data pages to
>      disk on every transaction commit, because we know that in the event
>      of a crash we will be able to recover the database using the log:
>      any changes that have not been applied to the data pages will first
>      be redone from the log records (this is roll-forward recovery, also
>      known as REDO) and then changes made by uncommitted transactions
>      will be removed from the data pages (roll-backward recovery -
>      UNDO).

Bingo! Another PostgreSQL technology that behaves exactly the same way in 
Progress.

I don't have the technical skills to fully understand how Progress does this, 
but our CIO explained this to me in a "Progress Transactions for Dummies" 
way:

"During runtime, Progress creates a temporary file called <database-name>.log. 
Flushing also occurs. Progress creates a <database-name>.bi file where the 
records are temporarily logged, or flushed as binary-data (don't know much 
about this, but this is what he said) before updating the main database in 
use. This flushing to <database-name>.bi (before-imaging) is done fast. In 
the event of a database crash, the .bi file is used as a reference to recover 
the database using both .log and .bi files. User record-locking logs are 
recording in another file <database-name>.lk.

Another interesting quirk.

Whenever we need to use temporary variables in Progress 4GL, we have the 
option to append the word NO-UNDO during variable declaration.

DEFINE VARIABLE vwprice LIKE database-table.field-name NO-UNDO.

Defining variables with NO-UNDO tells Progress not to log the variable's 
values to the .bi file. Disabling roll-backward recovery or UNDO for 
temporary working variables can speed up an applications. We enable the UNDO 
function only in fields or variables that needs to recover it's data from the 
.bi after a crash.

Thanks for the very valuable inputs!

Going back to that "urban legend" -- PostgreSQL is a Progress-fork thing...

:-)

---

mikol

"There is no concept more closer to intellectual emancipation than free 
software. Freedom to responsibly code and share in its most free and pure 
form."                          -- Floyd Robinson,  September 24, 2002
_
Philippine Linux Users Group. Web site and archives at http://plug.linux.org.ph
To leave: send "unsubscribe" in the body to [EMAIL PROTECTED]

Fully Searchable Archives With Friendly Web Interface at http://marc.free.net.ph

To subscribe to the Linux Newbies' List: send "subscribe" in the body to 
[EMAIL PROTECTED]

Re: [plug] Open-source database evaluation

Reply via email to