In the previous mails I made a mistake, writing "MTuples/s" instead of "MDatums/s", sorry about that. It is the number of rows x columns. The title was wrong, but the data was right.

I've been doing some tests on COPY FROM ... BINARY.

- inlines in various pg_get* etc
- a faster buffer handling for copy
- that's about it...

In the below tables, you have "p17" (ie test patch 17, the last one) and straight postgres compared.

COPY annonces_2 FROM 'annonces.bin' BINARY  :
  Time | Speedup |  Table |  KRows | MDatums | Name
   (s) |         |   MB/s |     /s |      /s |
-------|---------|--------|--------|---------|----------------------------------------------------
 8.417 |  1.40 x |  38.70 |  49.13 |    1.92 | 8.4.0 / p17
11.821 |     --- |  27.56 |  34.98 |    1.36 | 8.4.0 / compiled from source


COPY archive_data_2 FROM 'archive_data.bin' BINARY  :
  Time | Speedup | Table |  KRows | MDatums | Name
   (s) |         |  MB/s |     /s |      /s |
-------|---------|-------|--------|---------|----------------------------------------------------
15.314 | 1.93 x | 25.94 | 172.88 | 4.84 | 8.4.0 / p17 COPY FROM BINARY all
29.520 |     --- | 13.46 |  89.69 |    2.51 | 8.4.0 / compiled from source


COPY test_one_int_2 FROM 'test_one_int.bin' BINARY  :
  Time | Speedup |  Table |   KRows | MDatums | Name
   (s) |         |   MB/s |      /s |      /s |
-------|---------|--------|---------|---------|----------------------------------------------------
10.003 | 1.39 x | 30.63 | 999.73 | 1.00 | 8.4.0 / p17 COPY FROM BINARY all 13.879 | --- | 22.08 | 720.53 | 0.72 | 8.4.0 / compiled from source


COPY test_many_ints_2 FROM 'test_many_ints.bin' BINARY  :
  Time | Speedup | Table |  KRows | MDatums | Name
   (s) |         |  MB/s |     /s |      /s |
-------|---------|-------|--------|---------|----------------------------------------------------
6.009 | 2.08 x | 21.31 | 166.42 | 4.33 | 8.4.0 / p17 COPY FROM BINARY all
12.516 |     --- | 10.23 |  79.90 |    2.08 | 8.4.0 / compiled from source


I thought it might be interesting to get split timings of the various steps in COPY FROM, so I simply commented out bits of code and ran tests.

The "delta" columns are differences between two lines, that is the time taken in the step mentioned on the right.


reading data only = reading all the data and parsing it into chunks, doing everything until the RecvFunc is called.
RecvFuncs = same, + RecvFunc is called
heap_form_tuple = same + heap_form_tuple is called
triggers = same + triggers are applied
insert = actual tuple insertion
p17 = total time (post insert triggers, constraint check, etc)

  Time | Delta | Row delta | Datum delta | Name
   (s) |   (s) |      (us) |        (us) |
-------|-------|-----------|-------------|----------------------
 1.311 |   --- |       --- |         --- | reading data only
 4.516 | 3.205 |     7.750 |       0.199 | RecvFuncs
 4.534 | 0.018 |     0.043 |       0.001 | heap_form_tuple
 5.323 | 0.789 |     1.908 |       0.049 | triggers
 8.182 | 2.858 |     6.912 |       0.177 | insert
 8.417 | 0.236 |     0.570 |       0.015 | p17


COPY archive_data_2 FROM 'archive_data.bin' BINARY  :
  Time |  Delta | Row delta | Datum delta | Name
   (s) |    (s) |      (us) |        (us) |
-------|--------|-----------|-------------|---------------------
 4.729 |    --- |       --- |         --- | reading data only
 8.508 |  3.778 |     1.427 |       0.051 | RecvFuncs
 8.567 |  0.059 |     0.022 |       0.001 | heap_form_tuple
10.804 |  2.237 |     0.845 |       0.030 | triggers
14.475 |  3.671 |     1.386 |       0.050 | insert
15.314 |  0.839 |     0.317 |       0.011 | p17


COPY test_one_int_2 FROM 'test_one_int.bin' BINARY  :
  Time | Delta | Row delta | Datum delta | Name
   (s) |   (s) |      (us) |        (us) |
-------|-------|-----------|-------------|----------------------
 1.247 |   --- |       --- |         --- | reading data only
 1.745 | 0.498 |     0.050 |       0.050 | RecvFuncs
 1.750 | 0.004 |     0.000 |       0.000 | heap_form_tuple
 3.114 | 1.364 |     0.136 |       0.136 | triggers
 9.984 | 6.870 |     0.687 |       0.687 | insert
10.003 | 0.019 |     0.002 |       0.002 | p17


COPY test_many_ints_2 FROM 'test_many_ints.bin' BINARY  :
  Time | Delta | Row delta | Datum delta | Name
   (s) |   (s) |      (us) |        (us) |
-------|-------|-----------|-------------|----------------------
 1.701 |   --- |       --- |         --- | reading data only
 3.122 | 1.421 |     1.421 |       0.055 | RecvFuncs
 3.129 | 0.008 |     0.008 |       0.000 | heap_form_tuple
 3.754 | 0.624 |     0.624 |       0.024 | triggers
 5.639 | 1.885 |     1.885 |       0.073 | insert
 6.009 | 0.370 |     0.370 |       0.014 | p17

We can see that :

- reading and parsing the data is still slow (actually, everything is copied something like 3-4 times)
- RecvFuncs take quite long, too
- triggers use some time, although the table has no triggers ....? This is suspicious... - the actual insertion (which is really what we are interested in when loading a table) is actually very fast

Ideally in COPY the insertion time in the table should take most of the time used in the operation...

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to