[HACKERS] Parallel Query

postgres-001 Thu, 01 Mar 2007 07:56:11 -0800

Hello,

I have a single Athlon 2.3MHz.  Then I bought a Netfinity 7600 with quad
Xeon 700MHz (for EUR 151 on Ebay, 2gb RAM included).  I worked over my
application to run queries in parallel.  I got a speedup from 13:37 min
(on the AMD) to 4:01 min (on the IBM).  With this email I am suggesting
the following change:


You may wish to try breaking up "union" operators into separate
processes that are executed in parallel.

In my application, I have a union of 12 queries.  I run the first and
simplest one as "create table ... as ..." and then run the remaining
ones as "insert into ... select ..." in parallel.  I realise that this
harder for you than it is for me. But perhaps you may wish to give it
some thought.  I have written a summary of my application for the
advocacy group, but will insert it here for your convenience.

Bye,

Oliver

Thank you!

I wish to thank the PostreSQL developers for their contribution.
PostgreSQL is an incredibly powerful system.

Please accept the below usage report as a modest contribution by
a thankful user.  I hope you can use it in your advocacy efforts
as a poster to PostgreSQL's prowess.

I am currently using PostgreSQL as the basis of a financial expert
system.  I built this system over the last 17 years.  In 1989 I
started a stock market program in AmigaBASIC.  In 1990 it was
transferred to WordPerfect Corporation's "PlanPerfect" spreadsheet.
In 1992 I built an implementation of Ted Codd's "a model for large,
shared databanks" without the ACID properties.  Basically, it was
an in-memory, single user, single threaded, C++ library for set
operations.  In 1996 I ported the whole thing to PostgreSQL and MAKE.

The system, called mny-cnv and not publicly released, uses "gnumake"
to allow for the building of several distinct targets:

verify.txt -- check integrity of financial data
cae.txt -- check accounts in EUR currency
fns.txt -- compute net-worth by virtually liquidating all assetts
irt.txt -- compute investment rentability

In the Makefile, each of the above filenames constitutes a target.
The file "verify.txt" is a list of "psql" output tables, while "cae",
"fns" and "irt" are generated through Perl with a reports library.

Each of the above target has a set of dependencies.  The dependencies
are files with the exact names of tables (and the filename-extension
".tab"), or files with the exact names of views (and the filename-
extension ".vie").

I generate 63 views and tables with a total of 318982 records that
way.  Clearly, the calls form a hierarchy, the bottom of which
consists of base tables.  These base tables are generated and populated
by a 'C' program, the core of which is formed by a flex/bison parser
that reads textfiles with many brackets and a bespoke record structure.

Over the last three years, the time for the generation of "cae.txt"
has decreased from 21 to 13 minutes wall-clock time, simply through
improvements in the optimiser.  There is hardly any disc activity,
since the working set of data is quite small.  The C++ program, by
comparison, took 34 minutes, was not as sophisticated in its portfolio
cost analysis, and computed only a single point in time.

Fundamentally, I use PostgreSQL as declarative programming language
and execution system for set-operations.  With "fns.txt", I compute,
for a set of dates (1st of each month, over a 20-year period), the
net-worth of our assetts after liquidation, with estimates for current
pricing, fees and depreciation for low-liquidity positions like real
estate.

I could have reached the further level of abstraction for computing
the figure for different dates by running the original C++ program
repeatedly (12*20=240 times, 34 minutes each, for a total of 5.6 days).

The fact, that I did write a library to implement Codd's operators,
should tell you something about the complexity of the issue and the
view that I took in how to best reduce the complexity by using higher-
level operators.  This is not strictly a PostgreSQL benefit, but the
optimiser definitely is.

The other results are:

With "verify.txt", I generate a list of money transactions that don't
add up.  Consider the system a generalised double-entry book-keeping
system.  If I use the credit card on a business trip to pay for the
limousine service, a negative entry is made in the credit card account,
a positive entry is made in the business-expenses account and a positive
entry is made in the credit-organisation account for the foreign
currency
fee.  These three items are filed together and must sum up to zero. 
Also,
I have accounts (about 150 in total) for different purposes: there's
DAX index certificates, or other types of derivatives, the Portfolio at
the bank, Cash-In and Cash-Out accounts, life insurance, etc.  Most of
these will add up to zero after liquidation.  Some of them must add up
to zero at all times, even without any automatically generated
transfers.
Think of this file as a report from unit testing.  Only are we verifying
conditions on a system of financial transfers.

"cae.txt" is a set of account statements in one long textfile (a 5mb
ASCII report).  My complete bank statements are part of this, but so
are all manner of other accounts, as hinted above.

"irt.txt" computes the rentability percentage on all accounts, taking
into consideration varying timescales and amounts.  The "expert system"
component implements methods of liquidation for each account type
separately.  There are liquidation methods for shares, funds, banking
accounts, outstanding bills on credit cards, life insurance and about
3 other local specialty forms of retirement planning.  On the assumption
that any investment performing above or below average will return to
the center, bail-out (for massive under-performers) or profit-taking
(for over-performers) can be done.  To be fair, the actual percentage
is worked out with a bisection algorithm written in about 40 lines of
python, but everything else is done in SQL.

Let me stress again, that this endeavour would not be feasible without
the abstraction of SQL, nor would it finish in any sensible time without
the absolutely excellent optimiser of PostgreSQL.

Let me just make the final remark, that the EURO-conversion used strange
rounding rules and a very specific accuracy, which would have been a
pain to try and do in C++.  In PostgreSQL, I use the numeric datatype
which effortlessly recreates a perfect match of half a dozen banks'
conversion efforts.  I did not have to perform a single corrective
booking
to "make things work".  In Perl, I must use the "Currency.pm" package,
which itself is based on "BigFloat" to perform reporting and formatting
in a satisfactory way.

Thank you very much to the PostgreSQL developers!  Keep up the good
work!

Oliver




---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

[HACKERS] Parallel Query

Reply via email to