Re: [SQL] [pgsql-sql] Daily digest v1.2492 (19 messages)

Steve Midgley Tue, 03 Apr 2007 11:39:06 -0700

Hi John,

It sounds like a disk-bound operation, so cpu is not maxed out. I'm notclear on all the details of your operation but it sounds like you'reusing Java to do row-by-row based inserts, selects and updates within atransaction, from a file. This can be a very slow process if you havemany rows. The OS stats you describe fits that theory (but notconclusively).


If you are using (psuedo-)code such as:

Open file {
  Read line {

select from Pg: "select from [other_table] where val =[line[colN]]"exec to Pg: "insert into [table] (col1, col2, ...) values(line[col1], line[col2]..."

}
}

You can radically speed up such a system by using the "copy"(http://www.postgresql.org/docs/8.2/interactive/sql-copy.html) commandto load all the data at once from the file into Pg and then dopost-processing with Java/SQL to get all the fields looking right.Doing a bulk update with a join across several tables is so much fasterthan looping through them with a wrapper in Java (or other lang) youwon't believe it.


I hope this helps and is on-topic for you.

Steve

At 09:38 AM 4/3/2007, [EMAIL PROTECTED] wrote:

Date: Tue, 03 Apr 2007 22:16:13 +0800
From: John Summerfield <[EMAIL PROTECTED]>
To: [email protected]
Subject: A long-running transaction
Message-ID: <[EMAIL PROTECTED]>

I have a Java (java 1.1) program that I wrote some years ago, to read
records from a text file and insert it into a ostgresql database.

One of the assumptions I made was that one file contained one day's
data, maybe as many as 1500 records, and I coded it to do the wholelotas one transaction so either a single file was loaded in its entirity,
or none of its data was.
I lost the Java code, but revived the idea and I've collected abouttwoyears' data using (Linux) shell scripts, and loading the data usingpsql.
Then, I found the Java code on a disused hard disk:-)
I made the necessary changes for it to build in java 1.5, and usedpsqlto extract data from my new database in the correct format for the oldprogram. This time, I have a little more data than I ever loaded atonce
before:
[EMAIL PROTECTED]:~$ wc -l testdata
6242217 testdata
[EMAIL PROTECTED]:~$ \ls -hl testdata
-rw-r--r-- 1 summer summer 285M 2007-03-28 22:32 testdata
[EMAIL PROTECTED]:~$
Now, it wouldn't surprise me if postgresql used lots of memory - buthow
much could it possibly need? My laptop, where I first tried this, has
1.25 Gbytes, so I could allow it some.
It wouldn't surprise me a lot if it used lots of memory and caused allsorts of problems, but no, it's chugging away, still using no more RAM
than it could have had on my old Pentium 133 all those years ago.
In the process of checking it out, I've set it running on a machinewith
a AMD Sempron(tm)   2400+ running Kubuntu 6.10 (kernel is
2.6.17-6-server-xen0) and 512 Mbytes of RAM.

This is the java program:-)
summer   pts/6    :0.0             Thu20    5days  1:07   1:07
/usr/bin/gij-4.1 -cp /usr/s
It's been running five days so far, and I can see where it's up to by
attaching strace. It's reading 2k of the input file every few seconds.
Okay, clearly something's wrong, and I don't think it's all my crdducode.
No probs swapping:
[EMAIL PROTECTED]:~$ free
total used free shared bufferscachedMem: 460800 456472 4328 0 860262164
-/+ buffers/cache:     193448     267352
Swap:      1461872        284    1461588
[EMAIL PROTECTED]:~$
It is hitting the disk pretty hard now on this machine, but thelaptop'sstill going too, and the disk seems to run about half the time, partof
a second running, part idle (but the intervals are getting shorter).

It struck me as fairly curious that neither postgresql nor the
application was hogging the CPU.
Perhaps the laptop is more interesting: look at the size of the bufferpool:
[EMAIL PROTECTED]:~> free
total used free shared bufferscachedMem: 1295528 1268548 26980 0 3976392388
-/+ buffers/cache:     872184     423344
Swap:      1941496      32656    1908840
[EMAIL PROTECTED]:~>
Again, no problem with over-use of RAM, and I'm logged on using KDEtoo
and that's running fine.

It's been running a little longer here:
summer   pts/23    28Mar07  5days 25:12  25:11  java -cp
/home/summer/Classes/:/usr/share/p

This is Sun's Java 1.5 on OpenSUSE 10.2.


This is what suggested I should write:
[EMAIL PROTECTED]:~> procinfo
Linux 2.6.18.8-0.1-default ([EMAIL PROTECTED]) (gcc 4.1.2 20061115) #1
1CPU [Echidna.]

Memory:      Total        Used        Free      Shared     Buffers
Mem:       1295528     1271720       23808           0        3716
Swap:      1941496       32656     1908840
Bootup: Tue Mar 27 18:50:19 2007 Load average: 2.21 2.65 2.69 2/24319305
user  :   1d  3:17:04.03  16.0%  page in :  131097310  disk 1:
3079516r20087664w
nice  :       0:05:39.64   0.1%  page out:  197016649
system:   2d 20:38:37.13  40.1%  page act:   87906251
IOwait:   2d  0:46:37.33  28.5%  page dea:   16218135
hw irq:       0:44:46.71   0.4%  page flt:  306255213
sw irq:       0:50:04.69   0.5%  swap in :       4026
idle  :   1d  0:36:29.73  14.4%  swap out:       9552
uptime:   7d  2:59:20.97         context :  702502442

irq  0: 153880209 timer                 irq  7:         0 parport0
irq  1:     69402 i8042                 irq  8:         2 rtc
irq  2:         0 cascade [4]           irq  9:   1696942 acpi
irq  3:         4                       irq 10:         1
irq 4: 4 irq 11: 71842329ehci_hcd:usb1,
uhci_
irq  5:  28545863 Intel 82801DB-ICH4    irq 12:    467432 i8042
irq  6:         1                       irq 14:  25021586 ide0

[EMAIL PROTECTED]:~>


Look at that line beginning "System:" two days 20 hours in the Linux
kernel. It's my guess that the Linux kernel is spending a great dealof
time manipulating that buffer pool.

This shows postgresql taking 60% CPU:
[EMAIL PROTECTED]:~> ps xaru
USER PID %CPU %MEM VSZ RSS TTY STAT START TIMECOMMAND
summer   20237  0.2  1.2 263716 15988 pts/23   Rl+  Mar28  25:11 java
-cp /home/summer/Class
postgres 19321  0.1  0.3  19844  3984 ?        D    21:50   0:00
postgres: summer stocksshar
summer 19344 0.0 0.0 2484 852 pts/25 R+ 21:52 0:00 psxaru
[EMAIL PROTECTED]:~> ps xaru
USER PID %CPU %MEM VSZ RSS TTY STAT START TIMECOMMAND
postgres 20248 60.0  0.8  20760 11536 ?        D    Mar28 5167:27
postgres: summer sharetrad
postgres 19321  0.1  0.3  19844  4008 ?        D    21:50   0:00
postgres: summer stocksshar
summer 19348 0.0 0.0 2480 848 pts/25 R+ 21:52 0:00 psxaru
[EMAIL PROTECTED]:~>

Note that pid 20248 is the one, the other has nothing to do with the
problem.

This is the SQL I used to create the table:
       String createTableStatement = "create table TJS0102_trades ("
                   + "   trade date"
                   + ",  ttime int2"
                   + ",  ASX varchar(7)"
                   + ",  thigh int4"
                   + ",  tlow int4"
                   + ",  tclose int4"
                   + ",  topen int4"
                   + ",  tvolume int4"
                   + ",  tvalue int4"
//                 + ")"
                   + ",  unique(ASX,trade,ttime)"
                   + ");";
For each record, I update a non-key field in another table; the source
data for that other table is less than a megabyte.


I have changed the program so as to load each day's data as a single
transaction; it ran on a Pentium IV 2.8 Ghz with HT, 512 Mbytes inabout
three and an half hours. This is more-or-less what I expected.

If you agree with me that postgresql should do better, now is a good
time to ask for this as a test case.
Note, you could also get suitable test data from float.com.au - it'snot
where mine came from, but it should be substantially the same.



---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

              http://www.postgresql.org/docs/faq

Re: [SQL] [pgsql-sql] Daily digest v1.2492 (19 messages)

Reply via email to