Re: [PERFORM] [SQL] two queryes in a single tablescan

2007-10-20 Thread Markus Schaber
Hi, Stefano,

Stefano Dal Pra [EMAIL PROTECTED] wrote:

 suppose you have a large table tab and two (or more) queries like this:
 
 SELECT count(*),A FROM tab WHERE C GROUP BY A;
 SELECT count(*),B FROM tab WHERE C GROUP BY B;
 
 is there any way to get both results in a single query,
 possibly through a stored procedure?
 The retrieved [count(*),A] ; [count(*),B] data couldn't fit
 in a single table, of course.
 
 The main goal would be to get multiple results while scanning the
 table[s] only once,
 thus getting the results faster.

PostgreSQL 8.3 contains great improvements in this area: you can simply
start the SELECTs from concurrent connections, and the backend will
synchronize the scans.
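
For older versions, a hedged workaround sketch (not from this thread, and
assuming the number of distinct (A, B) combinations is much smaller than tab
itself): pre-aggregate over both columns in one pass, then derive the
per-column counts from the small intermediate result.

-- Sketch only: tab, A, B and the condition C are the poster's placeholders.
-- tab is scanned exactly once, in the CREATE TEMP TABLE step.
CREATE TEMP TABLE tab_ab AS
    SELECT A, B, count(*) AS cnt
    FROM tab
    WHERE C
    GROUP BY A, B;

SELECT A, sum(cnt) FROM tab_ab GROUP BY A;  -- same result as count(*) ... GROUP BY A
SELECT B, sum(cnt) FROM tab_ab GROUP BY B;  -- same result as count(*) ... GROUP BY B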



Regards,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PERFORM] PostgreSQL underestimates sorting

2006-11-22 Thread Markus Schaber
Hi,

PostgreSQL 8.1 (and, back then, 7.4) tends to underestimate the costs of
sort operations, compared to index scans.

The backend allocates gigs of memory (we've set sort_mem to 1 gig), and
then starts spilling further gigabytes of temporary data to disk. So the
execution ends up much slower than an index scan, and wastes lots of
disk space.

We did not manage to tune the config values appropriately, at least not
without causing other query plans to suffer badly.

Are there any good ideas on how to shift the planner's preferences slightly
towards index scans, without affecting other queries?

There's one thing that most of those queries have in common: They
include TOAST data (large strings, PostGIS geometries etc.), and I
remember that there are known problems with estimating the TOAST costs.
This may be part of the problem, or may be irrelevant.


Thanks,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] PostgreSQL underestimates sorting

2006-11-22 Thread Markus Schaber
Hi, Steinar,

Steinar H. Gunderson wrote:
 On Wed, Nov 22, 2006 at 11:17:23AM +0100, Markus Schaber wrote:
 The Backend allocates gigs of memory (we've set sort_mem to 1 gig), and
 then starts spilling out more Gigs of temporary data to the disk.
 
 How much RAM is in the server? Remember that sort_mem is _per sort_, so if
 you have multiple sorts, it might allocate several multiples of the amount
 you set up.

That one machine has 16 Gigs of RAM, and about 10 Gigs tend to be free
/ part of the Linux block layer cache.

The temporary data is not swapping, it's the Postgres on-disk sort
algorithm.

HTH,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] PostgreSQL underestimates sorting

2006-11-22 Thread Markus Schaber
Hi, Frank,

Frank Wiles wrote:


 The temporary data is not swapping, it's the Postgres on-disk sort
 algorithm.
 
 Are you actually running a query where you have a GB of data
 you need to sort?  If not I fear you may be causing the system
 to swap by setting it this high.

Yes, the table itself is about 40 Gigs in size, thus much larger than
the memory. The machine has 16 Gigs of ram, and 10-12 Gigs are available
for PostgreSQL + Disk Cache.

There's no swapping, only 23 MB of swap are used (40 Gigs are available).

That's one example configuration; there are others on different machines
where it turns out that forcing index usage leads to faster queries and
less overall resource consumption (or, at least, faster delivery of
the first part of the result, so the application can begin to process it
asynchronously).

Thanks,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Slow SELECT on three or more clients

2006-11-15 Thread Markus Schaber
Hi, Amir,

AMIR FRANCO D. JOVEN wrote:

 My current project uses PostgreSQL 7.3.4.

By all means, please upgrade.

The newest 7.3 series version is 7.3.16, which fixes lots of critical
bugs, and can be used as a drop-in replacement for 7.3.4 (see Release
Notes at http://www.postgresql.org/docs/7.3/interactive/release.html )

The newest stable release is 8.1.5, and 8.2 is just around the corner...

 I have a table with 94 fields and a select with only one resultset in
 only one client consumes about 0.86 seconds.

With only one resultset?

You mean with only one returned row, I presume.

Each SELECT has exactly one resultset, which can contain zero to many rows.

Please check the following:

- Did you create the appropriate indices?

- Version 7.3.X may suffer from index bloat, so REINDEX might help.

- Did you VACUUM and ANALYZE the table properly?

- Are your free space map settings, statistics targets, and other
config options tuned to fit your environment?

- Maybe a VACUUM FULL or a CLUSTER command may help you.
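
For the VACUUM/ANALYZE and REINDEX points above, a minimal sketch of the
commands (the table name is a placeholder; on 7.3, REINDEX and VACUUM FULL
take strong locks, so run them during a quiet period):

VACUUM ANALYZE my_table;
REINDEX TABLE my_table;
-- or, more aggressively, to also compact the table:
VACUUM FULL ANALYZE my_table;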

 for a single select with one field in one resultset, is 0.86 seconds normal?

That depends on the circumstances.

Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Easy read-heavy benchmark kicking around?

2006-11-08 Thread Markus Schaber
Hi, Brian,

Brian Hurt wrote:

 So the question is: is there an easy to install and run, read-heavy
 benchmark out there that I can wave at them to get them to fix the
 problem?

For sequential read performance, use dd. Most variants of dd I've seen
output some timing information, and if not, do a "time dd
if=/your/device of=/dev/null bs=1M" on the partition.

Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Best COPY Performance

2006-10-25 Thread Markus Schaber
Hi, Worky,

Worky Workerson wrote:

 $ psql -c "COPY my_table TO STDOUT" > my_data
 $ ls my_data
 2018792 edgescape_pg_load
 $ time cat my_data | psql -c "COPY mytable FROM STDIN"
 real    5m43.194s
 user    0m35.412s
 sys     0m9.567s

That's via psql, and you get about 5 MB/sec.

 On a table with no indices, triggers and constraints, we managed to
 COPY about 7-8 megabytes/second with psql over our 100 MBit network, so
 here the network was the bottleneck.
 
 hmm, this makes me think that either my PG config is really lacking,
 or that the SAN is badly misconfigured, as I would expect it to
 outperform a 100Mb network.  As it is, with a straight pipe to psql
 COPY, I'm only working with a little over 5.5 MB/s.  Could this be due
 to the primary key index updates?

Yes, index updates cause both CPU load, and random disk access (which is
slow by nature).


HTH,
Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Best COPY Performance

2006-10-25 Thread Markus Schaber
Hi, Craig,

Craig A. James wrote:

 Would dblink() help in any way?
 
 It might if perl wasn't so damned good at this. ;-)

You know that you can use Perl inside PostgreSQL via plperl?
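
A minimal sketch of what that looks like (function name and body are just an
example; the plperl language must already be installed in the database,
e.g. via createlang):

CREATE FUNCTION perl_max(integer, integer) RETURNS integer AS '
    my ($x, $y) = @_;
    return $x > $y ? $x : $y;
' LANGUAGE plperl;

SELECT perl_max(17, 42);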

HTH,
Markus

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Index on two columns not used

2006-10-24 Thread Markus Schaber
Hi, Tom,

Tom Lane wrote:

 You're wrong.  An UPDATE always writes a new version of the row (if it
 overwrote the row in-place, it wouldn't be rollback-able).  The new
 version has a different TID and therefore the index entry must change.
 To support MVCC, our approach is to always insert a new index entry
 pointing at the new TID --- the old one remains in place so that the old
 version can still be found by transactions that need it.

OK, good you corrected me.

I had the weird impression that both row versions have the same tuple ID
(as they are different versions of the same tuple), and so an index
change is not necessary when both versions fit on the same page.

Thanks,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Best COPY Performance

2006-10-23 Thread Markus Schaber
Hi, Worky,

Worky Workerson wrote:
 I am currently getting
 between 10K and 15K inserts/second. 

 I ran trivial little insert into a table with a single integer row and
 came close to 250K inserts/second using psql's \copy, so I'm thinking
 that my code could be optimized a bit more, but wanted to check around
 to see if that was the case.

Could you COPY one of your tables out to disk via psql, and then COPY it
back into the database, to reproduce this measurement with your real data?

Also, how much is the disk load, and CPU usage?

As long as psql is a factor of 20 better than your Perl script, I think
that the Perl interface is what should be optimized.

On a table with no indices, triggers and constraints, we managed to
COPY about 7-8 megabytes/second with psql over our 100 MBit network, so
here the network was the bottleneck.

You should think about making your Perl program write the COPY
statement as text, and piping it into psql.

HTH,
Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Index on two columns not used

2006-10-23 Thread Markus Schaber
Hi, Peter,

Péter Kovács wrote:
 Sorry for the amateurish question, but what are heap tuples?
 
 Also, my understanding is that the following statement applies only for
 composite indexes: PostgreSQL can't use the values stored in the index
 to check the join condition. I assume that PostgreSQL will be able to
 use single-column-indexes for join conditions. Is this correct?

Both questions are tightly related:

First, the heap is the part of the table where the actual tuples are
stored.

PostgreSQL uses an MVCC system, which means that multiple versions of a
single row (with their transaction information) can coexist in the heap.
This allows for higher concurrency in the backend.

Now, the index basically stores pointers like "pages 23 and 42 contain
rows with value 'foo'", but version information is not replicated to the
index pages; this keeps the index size requirements low.

Additionally, in most UPDATE cases, the new row version will fit into
the same page as the old version. In this case, the index does not have
to be changed, which is an additional speed improvement.

But when accessing the data via the index, it can only give a
preselection of pages that contain interesting data, and PostgreSQL has
to look into the actual heap pages to check whether there really are row
versions that are visible in the current transaction.


A further problem is that some GiST index types are lossy, which means
the index does not retain the full information, but only an
approximation, for efficiency reasons.

A prominent example is the PostGIS geometry index: it only stores
the bounding box (4 float values) instead of the whole geometry (possibly
millions of double precision coordinates). So it may be necessary to
re-check the condition against the real data, using the lossy index only
for preselection.

HTH,
Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Index on two columns not used

2006-10-23 Thread Markus Schaber
Hi, Alvaro,

Alvaro Herrera wrote:

 Additionally, in most UPDATE cases, the new row version will fit into
 the same page as the old version. In this case, the index does not have
 to be changed, which is an additional speed improvement.
 Actually, when the UPDATE puts a new row version in the same heap page,
 the index must be updated anyway.

AFAICS only when the index covers (directly or via a function) a column
that's actually changed.

Changing columns the index does not depend on should not need any write
access to that index.

Correct me if I'm wrong.

Thanks,
Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] VACUUM FULL ANALYZE on 8.1.4 is slower then on 8.0

2006-10-20 Thread Markus Schaber
Hi, Andrzej,

Andrzej Zawadzki wrote:
 After upgrade PostgreSQL from 8.0 to 8.1.4 a VACUUM FULL ANALYZE
 process is much slower, from logs:
 
 8.0
 [13666][postgres][2006-10-06 01:13:38 CEST][1340121452] LOG:  statement: 
 VACUUM FULL ANALYZE;
 [13666][postgres][2006-10-06 01:39:15 CEST][0] LOG:  duration: 1536862.425 ms
 
 
 8.1
 [4535][postgres][2006-10-10 01:08:51 CEST][6144112] LOG:  statement: VACUUM 
 FULL ANALYZE;
 [4535][postgres][2006-10-10 02:04:23 CEST][0] LOG:  duration: 3332128.332 ms
 
 Databases are equal.

Are they on equal disks? And in the same areas of those disks? Some
current disks drop their speed drastically towards the end of the LBA
address space.

HTH,
Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Scrub one large table against another (vmstat output)

2006-10-12 Thread Markus Schaber
Hi, Brendan,

Brendan Curran wrote:
 What prevents you from using an aggregate function?
 
 I guess I could actually obtain the results in an aggregate function and
 use those to maintain a summary table. There is a web view that requires
 'as accurate as possible' numbers to be queried per group (all 40 groups
 are displayed on the same page) and so constant aggregates over the
 entire table would be a nightmare.

That sounds just like a case for GROUP BY and a materialized view.
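
A rough sketch of what that could look like (table and column names are
placeholders; PostgreSQL of this era has no built-in materialized views, so
the summary table is refreshed by hand, e.g. from cron):

CREATE TABLE group_counts AS
    SELECT group_id, count(*) AS n
    FROM big_table
    GROUP BY group_id;

-- periodic refresh: one aggregate scan instead of one per page view
BEGIN;
TRUNCATE group_counts;
INSERT INTO group_counts
    SELECT group_id, count(*) FROM big_table GROUP BY group_id;
COMMIT;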

Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Performance Optimization for Dummies 2 - the SQL

2006-10-04 Thread Markus Schaber
Hi, Alex,

Alex Stapleton wrote:

 explain analyze is more helpful because it prints the times.
 
 You can always use the \timing flag in psql ;)

Have you ever tried EXPLAIN ANALYZE?

\timing gives you one total timing, but EXPLAIN ANALYZE gives you
timings for sub-plans, including real row counts etc.
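
For illustration, a made-up example (query and table names are placeholders):

-- \timing only reports the total round-trip time; EXPLAIN ANALYZE breaks
-- the time down per plan node, with actual row counts and loop counts.
EXPLAIN ANALYZE
SELECT o.customer_id, count(*)
FROM orders o
JOIN customers c ON c.id = o.customer_id
GROUP BY o.customer_id;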

HTH,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Poor performance on very simple query ?

2006-10-04 Thread Markus Schaber
Hi, Tobias,

Tobias Brox wrote:

 How can you have a default value on a primary key?

Just declare the column with both a default value and a primary key
constraint.

It makes sense when the default value is not a constant but is calculated
by calling a function that generates the key.

In fact, the SERIAL type does nothing but define a sequence and then
use nextval('sequencename') as the default.
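
A minimal sketch (names are examples) of such a declaration, which is
essentially what SERIAL expands to:

CREATE SEQUENCE item_id_seq;
CREATE TABLE item (
    id   integer PRIMARY KEY DEFAULT nextval('item_id_seq'),
    name text
);

INSERT INTO item (name) VALUES ('first');  -- id is filled in by the default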

HTH,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Performace Optimization for Dummies

2006-10-03 Thread Markus Schaber
Hi, Carlo,

Carlo Stonebanks wrote:
 Did you think about putting the whole data into PostgreSQL using COPY in
 a nearly unprocessed manner, index it properly, and then use SQL and
 stored functions to transform the data inside the database to the
 desired result?
 
 This is actually what we are doing. The slowness is in the row-by-row
 transformation. Every row requires that all the inserts and updates of the
 previous row be committed - that's why we have problems figuring out how to
 do this using SQL set logic.

Maybe GROUP BY, ORDER BY, DISTINCT ON and hand-written functions
and aggregates (like first() or best()) may help.

You could combine all relevant columns into a user-defined compound
type, then group by entity, and have a self-defined aggregate generate
the accumulated tuple for each entity.
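
As a hedged sketch of the simpler DISTINCT ON variant (table and column
names are placeholders): keep, per entity, the row from the best-ranked
source, letting the database do the de-duplication set-wise instead of
row by row in the client.

SELECT DISTINCT ON (entity_id)
       entity_id, source, name, address
FROM   staging
ORDER  BY entity_id, source_priority DESC, loaded_at DESC;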

Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Performace Optimization for Dummies

2006-10-03 Thread Markus Schaber
Hi, Carlo,

Carlo Stonebanks wrote:

 Trying to achieve a high level of data quality in one large project is
 not often possible. Focus on the most critical areas of checking and get
 that working first with acceptable performance, then layer on additional
 checks while tuning. The complexity of the load programs you have also
 means they are susceptible to introducing data quality problems rather
 than removing them, so an incremental approach will also aid debugging
 of the load suite.
 
 I couldn't agree more.

I still think that using a PL in the backend might be more performant
than an external client, if only because the SPI interface is more
efficient than the network serialization used by external applications.

HTH,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Unsubscribe

2006-10-02 Thread Markus Schaber
Hi, Uwcssa,

uwcssa wrote:
 
  Please unsubscribe me!  Thank you!   

Sorry, but we (the list members) are unable to do that; we have no
administrative power over the list. :-(

 Also, it would be better to have a message footer saying how to unsubscribe.

List unsubscribe information is contained in the headers of every mail
that's sent over the list:

List-Archive: http://archives.postgresql.org/pgsql-performance
List-Help: mailto:[EMAIL PROTECTED]
List-ID: pgsql-performance.postgresql.org
List-Owner: mailto:[EMAIL PROTECTED]
List-Post: mailto:pgsql-performance@postgresql.org
List-Subscribe:
mailto:[EMAIL PROTECTED]
List-Unsubscribe:
mailto:[EMAIL PROTECTED]

Additionally, there is a link to the unsubscribe web form at the list
archive page: http://archives.postgresql.org/pgsql-performance/

HTH,
Markus

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] how to optimize postgres 8.1

2006-09-30 Thread Markus Schaber
Hi, Gurkan,

[EMAIL PROTECTED] wrote:

 If I run the query below with Informix, it gives cost=107.
 With Postgres with the additional indexes it gives cost=407; before the additional
 indexes it was even much slower

What are your real timing measurements, in a production-like setup under a
production-like load? That's the only kind of benchmarking that will
give you a useful comparison.

You cannot compare anything else.

In particular, you cannot compare those artificial cost estimator values,
as they are likely to be defined differently for PostgreSQL and Informix.

For PostgreSQL, they are values relative to the cost of reading a page
as part of a sequential scan. And those values are tunable: fiddle with
the random_page_cost and cpu_*_cost values in postgresql.conf, and
you will see very different values compared to the 407 you see now, even
if the query plan stays the same.

Did you look up the definition of cost for Informix? Have you made sure
that the two are comparable?

HTH,
Markus


-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Performace Optimization for Dummies

2006-09-29 Thread Markus Schaber
Hi, Carlo,

Carlo Stonebanks wrote:

 We urgently need a major performance improvement.

Did you think about putting the whole data into PostgreSQL using COPY in
a nearly unprocessed manner, indexing it properly, and then using SQL and
stored functions to transform the data inside the database into the
desired result?

We're using this approach for some 3rd-party databases we have to process
in-house.
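
A rough sketch of that approach (all names and the path are placeholders):
load the raw data nearly unprocessed, index it, then transform it set-wise.

CREATE TABLE staging_raw (
    src_id  text,
    payload text
);

-- server-side COPY reads a file on the database server and needs superuser
-- rights; from a client, psql's \copy does the same over the connection.
COPY staging_raw FROM '/tmp/rawdata.txt';

CREATE INDEX staging_raw_src_idx ON staging_raw (src_id);
ANALYZE staging_raw;

INSERT INTO target_table (src_id, n_rows)
SELECT src_id, count(*) FROM staging_raw GROUP BY src_id;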

HTH,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Performace Optimization for Dummies

2006-09-29 Thread Markus Schaber
Hi, Carlo,

Carlo Stonebanks wrote:

 From what I can see, autovacuum is hitting the db's in question about once
 every five minutes. Does this imply an ANALYZE is being done automatically
 that would meet the requirements we are talking about here? Is there any
 benefit to explicitly performing an ANALYZE?

Autovacuum looks at the modification statistics (they count how many
modifications happened on the table), and decides whether it's time to
VACUUM (reclaim empty space) and/or ANALYZE (update column value
distributions) the table.

The exact thresholds for Autovacuum to kick in are configurable, see the
docs.

HTH,
Markus


-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Decreasing BLKSZ

2006-09-26 Thread Markus Schaber
Hi, Marc,

Marc Morin wrote:

 I wonder whether there is a way to use table partitioning to 
 make the insert pattern more localized?  We'd need to know a 
 lot more about your insertion patterns to guess how, though.
 
 We're doing partitioning as well.

And is constraint exclusion set up properly, and have you verified that
it works?
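
A quick way to check (table, column and dates are placeholders): with
constraint exclusion enabled, EXPLAIN should show only the partitions whose
CHECK constraints match the WHERE clause.

SET constraint_exclusion = on;  -- should also be set in postgresql.conf
EXPLAIN
SELECT count(*) FROM measurements
WHERE logtime >= '2006-09-01' AND logtime < '2006-09-02';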

HTH,
Markus

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Multi-processor question

2006-09-25 Thread Markus Schaber
Hi, Kjell Tore,

Kjell Tore Fossbakk wrote:

 I got two AMD Opteron 885 processors (2.6ghz) and 8 gig of memory.
 Harddrives are 4 scsi disks in 10 raid.
 
 I'm running gentoo, and the kernel finds and uses all of my 2 (4) cpu's.
 
 How can i actually verify that my PostgreSQL (or that my OS) actually
 gives each new query a fresh idle CPU (i.e. uses all of my CPUs)?

On unixoid systems, use top to display individual CPU loads; they should
be balanced if you issue the same type of queries in parallel.[1] On
Windows, the Task Manager should be able to display individual CPU load
graphs.

Note, however, that if you issue different kinds of queries in parallel,
it is well possible that some CPUs have 100% load (on the CPU-intensive
queries) and the other CPUs have low load (processing the other, I/O
intensive queries).

Btw, if your queries take a long time but the CPU load is low, then it is
very likely that you're I/O bound, either at the disks in the server or
at the network connections to the clients.

HTH,
Markus
[1] You might need some command line options or keys; e.g. on some
Debian boxes over here, one has to press 1 to switch a running top to
multi-CPU mode, and pressing 1 again switches back to the accumulated view.

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Decreasing BLKSZ

2006-09-25 Thread Markus Schaber
Hi, Marc,

Marc Morin wrote:

 The problem is, the insert pattern has low correlation with the
 (key,logtime) index.   In this case, would need 1M blocks in my
 shared_buffer space to prevent a read-modify-write type of pattern
 happening during the inserts (given a large enough database).

Would it be possible to change the primary key to (logtime,key)? This
could help keep the working window small.

Secondly, the real working set is smaller, as the rows are all inserted
at the end of the table, filling each page until it's full, so only the
last pages are accessed. There's no relation between the index order and
the order of data on disk, unless you CLUSTER.

 Any comment on other effects or gotchas with lowering the size of
 BLKSZ?  Currently, our database is thrashing its cache of blocks and
 we're getting only ~100 inserts/second; every insert results in an
 evict-read-modify operation.

I'm not sure that's the correct diagnosis.

Do you have one transaction per insert? Every transaction means a forced
sync to the disk, so you won't get more than about 100-200 commits per
second, depending on your actual disk rotation speed.

To improve concurrency with a number of inserters running in parallel,
try tweaking the config variables commit_delay and commit_siblings, so
you get a higher overall throughput at the cost of an increased delay per
connection, and increase the number of inserters. With sensible
tweaking, the throughput should scale nearly linearly with the number of
backends. :-)

If feasible for your application, you can also bundle several log
entries into a single transaction. If you're CPU bound, you can use COPY
instead of INSERT or (if you can wait for 8.2) the new multi-row INSERT
to further improve performance, but I doubt that you're CPU bound.
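
A small sketch of the batching idea (table and columns are placeholders):
several log entries share one commit, and thus one WAL sync, instead of one
sync per row.

BEGIN;
INSERT INTO logtable (key, logtime, payload) VALUES (1, now(), 'a');
INSERT INTO logtable (key, logtime, payload) VALUES (2, now(), 'b');
INSERT INTO logtable (key, logtime, payload) VALUES (3, now(), 'c');
COMMIT;

-- with 8.2's multi-row INSERT, the same batch becomes a single statement:
INSERT INTO logtable (key, logtime, payload)
VALUES (4, now(), 'd'), (5, now(), 'e'), (6, now(), 'f');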

The only way to really get over the sync limit is to have (at least)
the WAL on battery-backed RAM / SSD media that have no spinning-disk
physical limit, or to abandon crash safety by turning fsync off.

Thanks,
Markus.
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Confusion and Questions about blocks read

2006-09-23 Thread Markus Schaber
Hi, Tom,

Tom Lane wrote:
 Alex Turner [EMAIL PROTECTED] writes:
 How come the query statistics showed that 229066 blocks were read given
 that all the blocks in all the tables put together only total 122968?
 
 You forgot to count the indexes.  Also, the use of indexscans in the
 mergejoins probably causes multiple re-reads of some table blocks,
 depending on just what the physical ordering of the rows is.

As far as I understand, Index Bitmap Scans improve this behaviour, by
ensuring that every table block is read only once.

Btw, would it be feasible to enhance normal index scans by checking all
rows in the current table block against the query criteria, fetching
them all, and blacklisting the block against further revisits during the
same index scan?

I think that, for non-sorted cases, this could improve index scans a
little, but I don't know whether it's worth the effort, given that
bitmap index scans exist.

Thanks,
Markus



-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Large tables (was: RAID 0 not as fast as

2006-09-22 Thread Markus Schaber
Hi, Guy,

Guy Thornley wrote:

 Of course you could argue the OS should be able to detect this, and prevent
 it occuring anyway. I don't know anything about linux's behaviour in this
 area.

Yes, one can argue that way.

But a generic algorithm in the OS can never be as smart as the
application, which has more information about its semantics and algorithms.
Everything else would need a crystal ball device. :-)

HTH,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Large tables (was: RAID 0 not as fast as

2006-09-21 Thread Markus Schaber
Hi, Luke,

Luke Lonergan wrote:

 I thought that posix_fadvise() with POSIX_FADV_WILLNEED was exactly
 meant for this purpose?
 
 This is a good idea - I wasn't aware that this was possible.

This possibility was the reason for me to propose it. :-)

 We'll do some testing and see if it works as advertised on Linux and
 Solaris.

Fine, I'm looking forward to the results.

According to my small test, it works at least on Linux 2.6.17.4.

Btw, posix_fadvise() could even give a small improvement for
multi-threaded backends, given that the I/O subsystem is smart enough to
cope intelligently with large bunches of outstanding requests.

HTH,
Markus

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Large tables (was: RAID 0 not as fast as

2006-09-21 Thread Markus Schaber
Hi, Bucky,

Bucky Jordan wrote:

 We can implement multiple scanners (already present in MPP), or we
 could
 implement AIO and fire off a number of simultaneous I/O requests for
 fulfillment.
 
 So this might be a dumb question, but the above statements apply to the
 cluster (e.g. postmaster) as a whole, not per postgres
 process/transaction correct? So each transaction is blocked waiting for
 the main postmaster to retrieve the data in the order it was requested
 (i.e. not multiple scanners/aio)?

No, that's a wrong assumption.

It applies per active backend. When connecting, the postmaster forks a
new backend process. Each backend process has its own scanner and
executor. The main postmaster is only there for coordination (forking,
config reload etc.); all the work is done in the forked per-connection
backends.

Furthermore, the PostgreSQL MVCC system ensures that readers are never
blocked by, nor ever block, other backends. Writers can block each other
due to the ACID transaction semantics; however, MVCC limits that to a
minimum.

 In this case, the only way to take full advantage of larger hardware
 using normal postgres would be to run multiple instances? (Which might
 not be a bad idea since it would set your application up to be able to
 deal with databases distributed on multiple servers...)

Typical OLTP applications (Web UIs, Booking systems, etc.) have multiple
connections, and those run fully parallel.

So if your application is of this type, it will take full advantage of
larger hardware. In the list archive, you should find some links to
benchmarks that prove this statement; PostgreSQL scales linearly up to
8 CPUs and 32 hyperthreads in those benchmarks.

Our discussion is about a different type of application, where a
single application issues a single query at a time, dealing with
a large amount (several gigs up to teras) of data.

Now, when such a query is generating sequential disk access, the I/O
scheduler of the underlying OS can easily recognize that pattern, and
prefetch the data, thus giving the full speed benefit of the underlying
RAID.

The discussed problem arises when such large queries generate random
(non-contiguous) disk access (e.g. index scans). Here, the underlying
RAID cannot effectively prefetch data, as it does not know what the
application will need next. This effectively limits the speed to that of
a single disk, regardless of the details of the underlying RAID, as it
can only process one request at a time, and has to wait for the
application before getting the next one.

Now, Bizgres MPP goes the way of having multiple threads per backend,
each one processing a fraction of the data. So there are always several
outstanding read requests that can be scheduled to the disks.

My proposal was to use posix_fadvise() in the single-threaded scanner,
so it can tell the OS "I will need those blocks in the near future". That
way the OS can pre-fetch those blocks into the cache while PostgreSQL is
still processing the previous block of data.

Another proposal would be to use so-called asynchronous I/O. This is
definitely an interesting and promising idea, but it needs many more
changes to the code compared to posix_fadvise().


I hope that this lengthy mail is enlightening; if not, don't hesitate to
ask.

Thanks for your patience,
Markus


-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Large tables (was: RAID 0 not as fast as

2006-09-20 Thread Markus Schaber
Hi, Luke,

Luke Lonergan wrote:

 Since PG's heap scan is single threaded, the seek rate is equivalent to a
 single disk (even though RAID arrays may have many spindles), the typical
 random seek rates are around 100-200 seeks per second from within the
 backend.  That means that as sequential scan performance increases, such as
 happens when using large RAID arrays, the random_page_cost will range from
 50 to 300 linearly as the size of the RAID array increases.

Do you think that adding some posix_fadvise() calls to the backend to
pre-fetch some blocks into the OS cache asynchronously could improve
that situation?

I could imagine that e.g. index bitmap scans could profit in the heap
fetching stage by fadvise()ing the next few blocks.

Maybe asynchronous I/O could be used for the same benefit, but
posix_fadvise() is less intrusive, and can easily be #define'd out on
platforms that don't support it.

Combine this with the Linux Kernel I/O Scheduler patches (readahead
improvements) that were discussed here in summer...

Regards,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Large tables (was: RAID 0 not as fast as

2006-09-20 Thread Markus Schaber
Hi, Luke,

Luke Lonergan wrote:

 Do you think that adding some posix_fadvise() calls to the backend to
 pre-fetch some blocks into the OS cache asynchronously could improve
 that situation?
 
 Nope - this requires true multi-threading of the I/O, there need to be
 multiple seek operations running simultaneously.  The current executor
 blocks on each page request, waiting for the I/O to happen before requesting
 the next page.  The OS can't predict what random page is to be requested
 next.

I thought that posix_fadvise() with POSIX_FADV_WILLNEED was exactly
meant for this purpose?

My idea was that the executor could posix_fadvise() the blocks it will
need in the near future, and later, when it actually issues the blocking
read, the block is there already. This could even give speedups in the
single-spindle case, as the I/O scheduler could already fetch the next
blocks while the executor processes the current one.

But there must be some details in the executor that prevent this.

 We can implement multiple scanners (already present in MPP), or we could
 implement AIO and fire off a number of simultaneous I/O requests for
 fulfillment.

AIO is much more intrusive to implement, so I'd rather first look at
whether posix_fadvise() could improve the situation.

Thanks,
Markus

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] High CPU Load

2006-09-19 Thread Markus Schaber
Hi, Jerome,

Jérôme BENOIS wrote:

 Now I have 335 concurrent connections, I decreased the work_mem parameter to
 32768 and disabled Hyper-Threading in the BIOS. But my CPU load is still
 very high.

What are your settings for commit_siblings and commit_delay?

 Tomorrow morning i plan to add 2Giga RAM ... But I don't understand why
 my database server worked good with previous version of postgres and
 same queries ...

I don't think any more that it's the RAM, as you said you don't go into
swap. It has to be something else.

Could you try logging which queries are the problematic ones? Maybe they
have bad plans for whatever reason.

 I already use a database pool in my application, and when I decrease the
 number of connections my application gets slower ;-(

Could you just make sure that the pool really uses persistent
connections, and is not broken or misconfigured so that it always reconnects?


HTH,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] High CPU Load

2006-09-19 Thread Markus Schaber
Hi, Jerome,

Jérôme BENOIS wrote:

 Now I have 335 concurrent connections, I decreased the work_mem parameter to
 32768 and disabled Hyper-Threading in the BIOS. But my CPU load is still
 very high.
 What are your settings for commit_siblings and commit_delay?
 They are at the defaults:
 
 #commit_delay = 01   # range 0-10, in microseconds
 #commit_siblings = 5 # range 1-1000

You should uncomment them, and play with different settings. I'd try a
commit_delay of 100, and commit_siblings of 5 to start with.

 I plan to return to the previous version, 7.4.6, and I will reinstall everything
 on a dedicated server in order to reproduce and solve the problem.

You should use at least 7.4.13, as it fixes some critical bugs that were
in 7.4.6. They use the same on-disk format and query planner logic, so
there should not be any difference.

I don't have much more ideas what the problem could be.

Can you try to do some profiling (e.g. with statement logging) to see
which specific statements are the ones that cause the high CPU load?

Are there other differences (besides the PostgreSQL version) between the
two installations? (Kernel, libraries, other software...)

HTH,
Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Poor performance on seq scan

2006-09-18 Thread Markus Schaber
Hi, Piotr,

Piotr Kołaczkowski wrote:

 Why match rows from the heap if ALL required data are in the index itself?
 Why look at the heap at all?

Because the index does not contain any transaction information, the
backend has to look at the heap to find out which of the rows are current.

This is one of the more debated points of the PostgreSQL MVCC
implementation.


Markus


-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] High CPU Load

2006-09-15 Thread Markus Schaber
Hi, Jérôme,

Jérôme BENOIS wrote:

 max_connections = 512

Do you really have that many concurrent connections? Then you should
probably think about getting a larger machine.

You will definitely want to play with commit_delay and commit_siblings
settings in that case, especially if you have write access.

 work_mem = 65536
 effective_cache_size = 131072

Hmm, 131072*8*1024 + 512*65536*1024 = 35433480192; that's 33 Gig of
memory you assume here, not counting OS usage, and the fact that certain
queries can use up a multiple of work_mem.

Even on a machine that big, I'd be inclined to dedicate more memory to
caching, and less to the backends, unless specific needs dictate it. You
could try to use SQLRelay or pgpool to cut down the number of backends
you need.

 My Server is Dual Xeon 3.06GHz

For xeons, there were rumours about context switch storms which kill
performance.

 with 2 Go RAM and good SCSI disks.

For 2 Gigs of ram, you should cut down the number of concurrent backends.

Does your machine go into swap?

Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Vacuums on large busy databases

2006-09-15 Thread Markus Schaber
Hi, Francisco,

Francisco Reyes wrote:

 I am looking to either improve the time of the vacuum or decrease its
 impact on the loads.
 Are the variables:
 #vacuum_cost_delay = 0  # 0-1000 milliseconds
 #vacuum_cost_page_hit = 1   # 0-1 credits
 #vacuum_cost_page_miss = 10 # 0-1 credits
 #vacuum_cost_page_dirty = 20# 0-1 credits
 #vacuum_cost_limit = 200# 0-1 credits

Just to avoid a silly mistake:

You pasted those settings with the # sign, which means that PostgreSQL
treats them as comments and uses the defaults instead. You should make
sure that you use real settings in your config.
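
A quick way to verify from psql what the server actually uses (SHOW reports
the effective value, so commented-out lines are easy to spot):

SHOW vacuum_cost_delay;
SHOW vacuum_cost_limit;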

HTH,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] sql-bench

2006-09-14 Thread Markus Schaber
Hi, Yoav X,

yoav x wrote:
 You can use the test with InnoDB by giving the --create-options=engine=innodb 
 option in the
 command line. Even with InnoDB, in some specific tests PG looks very bad 
 compared to InnoDB.

As far as I've seen, they include the CREATE TABLE command in their
benchmarks.

Realistic in-production workloads don't contain that many CREATE TABLE
commands, I think.

Wondering,
Markus


-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] database bloat,non removovable rows, slow query etc...

2006-09-04 Thread Markus Schaber
Hi, Matteo,

Matteo Sgalaberni wrote:

 A colleague's JDBC application that stays idle in transaction 24h/24h

Just a little note: for most applications, this can be fixed by updating
the JDBC driver. Old versions had the behaviour of auto-opening a new
backend transaction on commit/rollback, whereas new versions delay that
until the first statement of the new transaction is sent.

This won't fix applications that do a select and then sit idle for days
before committing/rolling back, however. Those should be fixed or use
autocommit mode.

 Good to know this...but why this behaviour? it'is lovely...:)
 
 Tom , can you explain why?...

It is because the transaction IDs are global per cluster.

Markus


-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] database bloat,non removovable rows, slow query etc...

2006-09-01 Thread Markus Schaber
Hi, Tom and Matteo,

Tom Lane wrote:
 Matteo Sgalaberni [EMAIL PROTECTED] writes:
 22 daemons that have a persistent connection to this database(all
 connection are in idle(no transaction opened).
 
 You may think that, but you are wrong.
 
 INFO:  cliente: found 0 removable, 29931 nonremovable row versions in 559 
 pages
 DETAIL:  29398 dead row versions cannot be removed yet.
 
 The only way the above can happen is if there are some fairly old open
 transactions.  Looking in pg_stat_activity might help you identify the
 culprit(s).

Another possibility might be an outstanding two-phase-commit transaction.

HTH,
Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] perf pb solved only after pg_dump and restore

2006-08-28 Thread Markus Schaber
Hi, Guillaume,

Guillaume Cottenceau wrote:

 We noticed a slowdown on our application while traffic was kinda
 heavy. The logics after reading the docs commanded us to trim the
 enlarged tables, run VACUUM ANALYZE and then expect fast
 performance again; but it wasn't the case[1].

What exactly do you mean by "trim the enlarged tables"?

 Out of the blue, we dumped the database, removed it, recreated
 from the restore, and now the performance is lightning fast
 again.
 
 Does it look familiar to anyone? I thought running VACUUM ANALYZE
 after a trim should be enough so that pg has assembled the data
 and has good statistical knowledge of the tables contents..

This looks like either your free space map settings are way too low, or you
have index bloat.

Maybe a VACUUM FULL followed by a REINDEX would have solved your problem.

It also might make sense to issue a CLUSTER instead (which combines the
effects of VACUUM FULL, REINDEX and physically reordering the data).

When the free space map is too low, VACUUM ANALYZE should have told you
via a warning (at least if your logging is set appropriately).


HTH,
Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] perf pb solved only after pg_dump and restore

2006-08-28 Thread Markus Schaber
Hi, Guillaume,

Guillaume Cottenceau wrote:

 We have a couple of logs files which get larger over time
 (millions of rows). As they are log files, they can be trimmed
 from older values.

Ah, ok, you DELETEd the old rows.

So I assume that you never UPDATE, but only INSERT new entries and
sometimes DELETE a big bunch of entries from the beginning.

This is a special usage pattern that the normal VACUUM is not well
suited for.

DELETing rows itself does not free any space. Only after your
transaction is committed can a following VACUUM FULL or CLUSTER
actually free the space.

VACUUM and VACUUM ANALYZE only remove obsolete rows from the pages and
mark them free (by entering them into the free space map, as long as
that one is large enough). That means that your table will actually stay
as large as before, having 90% free pages at the beginning and 10%
used pages at the very end. New INSERTs and UPDATEs will prefer to use
pages from the free space map before allocating new pages, but the
existing rows will stay where they are.

Now, VACUUM FULL actively moves rows to the beginning of the table,
allowing the end of the table to be cut off, while CLUSTER recreates the
table from scratch, in index order. Both lead to compact storage, with all
used rows at the beginning and no free pages.

So, I think, in your case VACUUM FULL and CLUSTER would both have solved
your problem.
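
A minimal sketch of the two options (table and index names are placeholders;
both commands take an exclusive lock on the table while they run):

VACUUM FULL ANALYZE logtable;
-- or, to also rewrite the table in index order:
CLUSTER logtable_logtime_idx ON logtable;
ANALYZE logtable;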

  max_fsm_pages is 2
 Do they look low?
 Notice: table data is only 600M after trim (without indexes),
 while it was probably 3x to 10x this size before the trim.

10x the size means 6G, so about 5.4G of data were freed by the trim. Each
page is 8k in size, so the FSM needs about 675000 pages. So, yes, for your
usage, the settings look low, and give very suboptimal results.

 have index bloat.
 
 Can you elaborate? I have created a couple of indexes (according
 to multiple models of use in our application) and they do take up
 quite some disk space (table dump is 600M but after restore it
 takes up 1.5G on disk) but I thought they could only do good or
 never be used, not impair performance..

Like tables, indices may suffer from getting bloated by old, unused
entries. Especially the GIST based indices in 7.4 (used by PostGIS and
other plugins) suffered from that problem[1], but recent PostgreSQL
versions have improved in this area.

Now, when the query planner decides to use an index, the index access is
extremely slow because of all the deleted entries the index scan has to
skip.

However, from the additional information you gave above, I doubt it was
index bloat.

 Maybe a VACUUM FULL followed by a REINDEX would have solved your problem.
 
 So these would have reordered the data for faster sequential
 access which is not the case of VACUUM ANALYZE?

A VACUUM FULL would have reordered the data, and a REINDEX would have
optimized the index.

 It also might make sense to issue a CLUSTER instead (which combines the
 effects of VACUUM FULL, REINDEX and physically reordering the data).
 
 I was reluctant in using CLUSTER because you have to choose an
 index and there are multiple indexes on the large tables..

Usually, CLUSTERing on one index does not necessarily slow down accesses
on other indices, compared to the non-clustered (= random) table before.

If you have some indices that are somehow related (e.g. a timestamp and
a serial number), CLUSTERing on one index automatically helps the
other index, especially as the query planner uses correlation statistics.

Btw, if your queries often include 2 or 3 columns, a multi-column index
(and clustering on that index) might be the best.

 When the free space map is too low, VACUUM ANALYZE should have told you
 via a warning (at least if your logging is set appropriately).
 
 Unfortunately, we didn't keep the logs of VACUUM ANALYZE, so I
 can't be sure :/

AFAIK, the warning is also output on the psql command line.

HTH,
Markus

[1] We once had an index that was about 100 times larger before a REINDEX
than after.

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] perf pb solved only after pg_dump and restore

2006-08-28 Thread Markus Schaber
Hi, Guillaume,

Guillaume Cottenceau wrote:

 About REINDEX: is it ok to consider that REINDEX is to indexes
 what VACUUM FULL is to table data, because it cleans up unused
 index pages?

Yes, roughly speaking.

 And AFAICS you're not running it on a regular basis so your database
 was probably completely bloated which means:
 - bloated indexes,
 - bloated tables (ie a lot of fragmentation in the pages which means
 that you need far more pages to store the same data).
 
 I suppose that table fragmentation occurs when DELETEs are
 interleaved with INSERTs?

Yes, and it gets ugly as soon as the FSM setting is too low or the VACUUM
frequency is too low, so it cannot keep up.

Big bunches of UPDATEs/DELETEs that hit more than, say, 20% of the table
between VACUUM runs justify a VACUUM FULL in most cases.

 VACUUM ANALYZE is normally run overnight (each night). Is it not
 regular enough? There can be hundreds of thousands of statements
 a day.

Which PostgreSQL version are you using? Maybe you should consider
autovacuum (which is a contrib module at least since 7.4, and included
in the server since 8.1). If you think that vacuum during working hours
puts too much load on your server, there are options to tweak that, at
least in 8.1.

Markus


-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Identifying bloated tables

2006-08-28 Thread Markus Schaber
Hi, Michal,

Michal Taborsky - Internet Mall wrote:

 When using this view, you are interested in tables which have the
 bloat column higher than, say, 2.0 (in a freshly dumped/restored/analyzed
 database they should all be around 1.0).

I just noticed some columns in pg_catalog with a bloat value < 1 and a
negative wasted space - is this due to their pseudo nature?
Markus


-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] VACUUM FULL needed sometimes to prevent transaction

2006-08-23 Thread Markus Schaber
Hi, Jeff & all,

Jeff Davis wrote:


 (2) You have a long-running transaction that never completed for some
 strange reason.

I just asked myself whether a 2-phase-commit transaction that was
prepared, but never committed, can block vacuuming and TID recycling.

Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] PostgreSQL runs a query much slower than BDE and MySQL

2006-08-17 Thread Markus Schaber
Hi, Peter,

Peter Hardman wrote:

 BTW, are you *sure* you are testing PG 8.1?  The Subquery Scan f2 plan
 node looks unnecessary to me, and I'd have expected 8.1 to drop it out.
 8.0 and before would have left it in the plan though.  This doesn't make
 all that much difference performance-wise in itself, but it does make me
 wonder what you are testing.
 
 Yes, the executables all say version 8.1.3.6044

Would you mind looking at the output of "select version();", too?

I ask because I stumbled over this myself: I had installed the
correct postgresql and psql versions, but accidentally connected to a
different database installation due to strange environment and script
settings...


Thanks,
Markus


-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] PostgreSQL runs a query much slower than BDE and MySQL

2006-08-17 Thread Markus Schaber
Hi, Peter,

Peter Hardman wrote:

 select version() returns
 
 PostgreSQL 8.1.3 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.4.2 
 (mingw-special)

That looks correct.

I also presume that your environment is not as fragile w.r.t. connecting
to the wrong database, compared to Debian with its multi-cluster
multi-version script wrapper magic.

Never mind.
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Postgresql Performance on an HP DL385 and

2006-08-16 Thread Markus Schaber
Hi, Jim,

Jim C. Nasby wrote:

 Well, if the controller is caching with a BBU, I'm not sure that order
 matters anymore, because the controller should be able to re-order at
 will. Theoretically. :) But this is why having some actual data posted
 somewhere would be great.

Well, actually, the controller should not reorder over write barriers.


Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] [BUGS] BUG #2567: High IOWAIT

2006-08-11 Thread Markus Schaber
Hi, Jim,

Jim Nasby wrote:

 Your biggest win would be to batch those inserts together into
 transactions, if possible.

Using COPY instead of INSERT might even give better wins, and AFAIK some
client libs use COPY internally (e.g. tablewriter from libpqxx).

 If not, the commit_delay settings might help you out.

As far as I understand, this will only help for concurrent inserts by
different clients, trading latency for throughput. Please correct me if
I'm wrong.
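
Just to illustrate the difference (a sketch with a made-up table):

-- slow: one implicit transaction (and one sync) per row
INSERT INTO events (ts, payload) VALUES (now(), 'foo');
INSERT INTO events (ts, payload) VALUES (now(), 'bar');

-- much faster: one COPY, one commit, one sync
COPY events (ts, payload) FROM STDIN;
2006-08-11 12:00:00	foo
2006-08-11 12:00:01	bar
\.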

HTH,
Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Migrating data from DB2 to SQL Server

2006-08-11 Thread Markus Schaber
Hi, Eldhose,

contact1981 wrote:

 I am trying to migrate data from a DB2 database to SQL Server 2005
 database.  Does anyone know about any migration tool that does that?  I
 have heard about DB2 Migration Tool kit, but I think you can only
 migrate data to a DB2 database with that.  Thank you.

It seems that you, by accident, hit the wrong list with your question.

But, as you're here, why don't you migrate to PostgreSQL instead?


Have a nice day,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Beginner optimization questions, esp. regarding Tsearch2

2006-08-10 Thread Markus Schaber
Hi, Richard and Carl,

Richard Huxton wrote:
 Carl Youngblood wrote:
 - I noticed that there are six different postmaster daemons running.
 Only one of them is taking up a lot of RAM (1076m virtual and 584m
 resident).  The second one is using 181m resident while the others are
 less than 20m each.  Is it normal to have multiple postmaster
 processes?
 
 You should have one master backend process and one per connection. PG is
 a classic multi-process designed server.

There may be some additional background processes, such as the
background writer, stats collector or autovacuum, depending on your
version and configuration.

HTH,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Disk writes

2006-08-07 Thread Markus Schaber
Hi, Reimer,

carlosreimer wrote:

 There is some performance problems with the server and I discovered with
 vmstat tool that there is some process writing a lot of information in
 the disk subsystem.
[..]
 How could I discover what is sending so much data to the disks?

It could be something triggered by your crontab (updatedb comes to my
mind, or texpire from leafnode etc.).

Another idea would be that you have statement logging on, or something
else that produces lots of kernel or syslog messages[1], and your
syslogd is configured to sync() after every line...

HTH,
Markus

[1] We once had such a problem because an ill-compiled kernel had USB
verbose logging turned on...
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Performance with 2 AMD/Opteron 2.6Ghz and 8gig

2006-08-07 Thread Markus Schaber
Hi, Charles,

Charles Sprickman wrote:

 I've also got a 1U with a 9500SX-4 and 4 drives.  I like how the 3Ware
 card scales there - started with 2 drives and got drive speed
 mirroring. Added two more and most of the bonnie numbers doubled.  This
 is not what I'm used to with the Adaptec SCSI junk.

Well, for sequential reading, you should be able to get double drive
speed on a 2-disk mirror with a good controller, as it can balance the
reads among the drives.

Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] PostgreSQL scalability on Sun UltraSparc T1

2006-08-07 Thread Markus Schaber
Hi, Arjen,

Arjen van der Meijden wrote:

 It was the 8core version with 16GB memory... but actually that's just
 overkill, the active portions of the database easily fits in 8GB and a
 test on another machine with just 2GB didn't even show that much
 improvements when going to 7GB (6x1G, 2x 512M), it was mostly in the
 range of 10% improvement or less.

I'd be interested in the commit_siblings and commit_delay settings;
tuning them can give a big increase in throughput for highly
concurrent insert/update workloads, at the cost of latency (and thus
worse results for low-concurrency situations).

Different fsync method settings can also make a difference (I presume
that syncing was enabled).
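
For reference, the knobs I mean look roughly like this in postgresql.conf
(the values are only starting points, not recommendations):

commit_delay = 100            # microseconds to wait for other commits
commit_siblings = 5           # only wait if that many other tx are active
wal_sync_method = fdatasync   # or fsync / open_sync / open_datasync,
                              # whichever is fastest on the platform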

HTH,
Markus


-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] sub select performance due to seq scans

2006-08-07 Thread Markus Schaber
Hi, Scott and Hale,

Scott Marlowe wrote:
 Make sure analyze has been run and that the statistics are fairly
 accurate.

It might also help to increase the statistics_target on the column in
question.

HTH,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] loading increase into huge table with 50.000.000 records

2006-07-26 Thread Markus Schaber
Hi, Larry,
Hi, Sven,

Sven Geisler wrote:

 You can increase wal_buffers, checkpoint_segments and checkpoint_timeout
 much higher.

You should also increase the free space map settings; they must be large
enough to cope with your weekly bunch.
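
A rough postgresql.conf sketch of those knobs (the actual values depend on
your hardware and the size of the weekly batch):

wal_buffers = 128            # WAL pages buffered in shared memory
checkpoint_segments = 64     # fewer, larger checkpoints during the load
checkpoint_timeout = 900     # seconds
max_fsm_pages = 2000000      # must cover all pages dirtied between vacuums
max_fsm_relations = 10000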


Markus


-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] RAID stripe size question

2006-07-17 Thread Markus Schaber
Hi, Mikael,

Mikael Carneholm wrote:
 An 0+1 array of 4 disks *could* be enough, but I'm still unsure how WAL
 activity correlates to normal data activity (is it 1:1, 1:2, 1:4,
 ...?) 

I think the main difference is that the WAL activity is mostly linear,
whereas the normal data activity is rather random access. Thus, a mirror
of a few disks (or, with good controller hardware, raid6 on 4 disks or so)
for WAL should be enough to cope with a large set of data and index
disks, which spend a lot more of their time seeking.

Btw, it may make sense to spread different tables or tables and indices
onto different Raid-Sets, as you seem to have enough spindles.
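
Since 8.0, tablespaces are the tool for that; a minimal sketch with
made-up mount points and a made-up table:

CREATE TABLESPACE data_ts  LOCATION '/mnt/raid_data/pgdata';
CREATE TABLESPACE index_ts LOCATION '/mnt/raid_index/pgdata';

CREATE TABLE orders (id bigint PRIMARY KEY, order_date date)
  TABLESPACE data_ts;
CREATE INDEX orders_date_idx ON orders (order_date) TABLESPACE index_ts;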

And look into the commit_delay/commit_siblings settings; they allow you
to trade latency for throughput (a little more latency per
transaction, but many more transactions per second throughput for the
whole system).


HTH,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] RAID stripe size question

2006-07-17 Thread Markus Schaber
Hi, Mikael,

Mikael Carneholm wrote:

 This is something I'd also would like to test, as a common best-practice
 these days is to go for a SAME (stripe all, mirror everything) setup.
 From a development perspective it's easier to use SAME as the developers
 won't have to think about physical location for new tables/indices, so
 if there's no performance penalty with SAME I'll gladly keep it that
 way.

Usually, it's not the developers' task to care about that, but the DBA's
responsibility.

 And look into the commit_delay/commit_siblings settings, they allow you
 to deal latency for throughput (means a little more latency per
 transaction, but much more transactions per second throughput for the
 whole system.)
 
 In a previous test, using cd=5000 and cs=20 increased transaction
 throughput by ~20% so I'll definitely fiddle with that in the coming
 tests as well.

How many parallel transactions do you have?

Markus



-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Kill a session

2006-07-14 Thread Markus Schaber
Hi, Tom,

Tom Lane wrote:
 Our expectation is that all or at least most queries should respond to
 SIGINT or SIGTERM interrupts pretty rapidly, say on a less-than-a-second
 timescale.  However there are various loops in the backend that fail to
 execute CHECK_FOR_INTERRUPTS sufficiently often :-(. 

The same is true for user-defined C functions.

The PostGIS GEOS geometry functions come to mind: for complex
geometries, they can take hours to complete. And as GEOS is a third-party
library, I don't see an easy way to make them call CHECK_FOR_INTERRUPTS.

Does anybody know how this is for plpgsql, pljava and plpython?


HTH,
Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] getting better performance

2006-07-07 Thread Markus Schaber
Hi, Eugeny,

Eugeny N Dzhurinsky wrote:

 Do you add / remove tables a lot?  Could be you've got system catalog
 bloat.
 
 Yes, almost each table is dropped and re-created in 3-5 days.

If you really recreate the same table, TRUNCATE may be a better
solution than dropping and recreating it.

HTH,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] need vacuum after insert/truncate/insert?

2006-07-07 Thread Markus Schaber
Hi, Craig,

Craig A. James wrote:
 If I insert a bunch of rows, then truncate, then insert a bunch more
 rows, do I need to vacuum?  I've been assuming that TRUNCATE TABLE is a
 brute-force technique that more-or-less tosses the old table and starts
 fresh so that no vacuum is necessary.
 
 Second question: Same scenario as above, but now the table has indexes. 
 Is a reindex needed, or are the indexes they truncated too?

AFAIK, both table and indices are cut down nicely.

But you will need an ANALYZE after refilling the table, to have
current statistics.


HTH,
Markus


-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Update INSERT RULE while running for Partitioning

2006-07-07 Thread Markus Schaber
Hi, Gene,

Gene wrote:
 I'm using PostgreSQL 8.1.4 in a Hibernate Application and I am
 attempting to use partitioning via Inherited tables. At first I was
 going to create a rule per sub-table based on a date range, but found
 out with multiple rules postgres will only return the affected-row count
 on the last rule which gives Hibernate problems.

This could be considered a PostgreSQL bug - maybe you should discuss
this on the appropriate list (general, hackers)?


HTH,
Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Is postgresql ca do the job for software deployed in

2006-07-05 Thread Markus Schaber
Hi, Mikael,

Just my 2 cents:

Mikael Carneholm wrote:
 Do you really need to create one *DB* per client - that is, is one
 schema (in the same DB) per client out of the question?

Sometimes, schemas would work _technically_, but not politically, as a
postgresql user cannot be prevented from listing all schemas (or even
all databases in the same cluster), regardless of whether he/she has
access rights.

But it is not always acceptable that a customer knows which other
customers one has.

This forces the use of the one cluster per customer paradigm.


Thanks,
Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Is postgresql ca do the job for software deployed in

2006-07-05 Thread Markus Schaber
Hi, Gregory,

Gregory S. Williamson wrote:
 A sodden late night idea ... schemas don't need to have names that
 are meaningful to outsiders.

Yes, but having schema names like A34FZ37 not only qualifies for
thedailywtf.com, but also tends to produce maintenance nightmares.

And it still allows a customer to estimate the number of other customers.

 Still, the point about political aspects is an important one. OTH,
 schemas provide an elegant way of segregating data.

Yes, they do, and in the ASP case (where we have control over all
software that connects to PostgreSQL) we use the one schema per customer
paradigm quite successfully.

 My $0.02 (not worth what it was)

Oh, I think they're worth at least $0.03. :-)

Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] SAN performance mystery

2006-06-23 Thread Markus Schaber
Hi, Tim,

Tim Allen wrote:
 One thing that has been
 apparent is that autovacuum has not been able to keep the database
 sufficiently tamed. A pg_dump/pg_restore cycle reduced the total
 database size from 81G to 36G.

Two first shots:

- Increase your free_space_map settings, until (auto)vacuum does not
warn about a too small FSM setting any more

- Tune autovacuum to run more often, possibly with a higher delay
setting to lower the load.

If you still have the original database around,

 Performing the restore took about 23 hours.

Try to put the WAL on another spindle, and increase the WAL size /
checkpoint segments.

If most of the restore time is spent in index creation, increase the
sort_mem / maintenance_work_mem settings.
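
In postgresql.conf terms, a sketch of what usually helps most during a
long restore (switch the values back afterwards):

maintenance_work_mem = 524288   # KB; speeds up index creation
checkpoint_segments = 64        # fewer checkpoint-forced WAL flushes
# only if you can afford to redo the whole restore after a crash:
# fsync = off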


HTH,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] SAN performance mystery

2006-06-23 Thread Markus Schaber
Hi, Tim,

Seems I sent my message too fast, cut off in the middle of a sentence:

Markus Schaber wrote:
 A pg_dump/pg_restore cycle reduced the total
 database size from 81G to 36G.

 If you still have the original database around,

... can you check whether VACUUM FULL and REINDEX achieve the same effect?

Thanks,
Markus


-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Help tuning autovacuum - seeing lots of relationbloat

2006-06-23 Thread Markus Schaber
Hi, Csaba,

Csaba Nagy wrote:

 Well, your application might be completely well behaved and still your
 DBA (or your favorite DB access tool for that matter) can leave open
 transactions in an interactive session. It never hurts to check if you
 actually have idle in transaction sessions. It happened a few times to
 us, some of those were bad coding on ad-hoc tools written by us, others
 were badly behaved DB access tools opening a transaction immediately
 after connect and after each successful command, effectively leaving an
 open transaction when leaving it open while having lunch...

Some older JDBC driver versions had the bug that they always had an open
transaction, thus an application server having some pooled connections
lingering around could block vacuum forever.
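
Such sessions are easy to spot; something like this should work, assuming
the stats collector with stats_command_string is enabled (column names as
of 8.x):

SELECT procpid, usename, client_addr, query_start
  FROM pg_stat_activity
 WHERE current_query = '<IDLE> in transaction';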

Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Help tuning autovacuum - seeing lots of relationbloat

2006-06-23 Thread Markus Schaber
Jim C. Nasby wrote:
 On Wed, Jun 21, 2006 at 01:21:05PM -0300, jody brownell wrote:
 Jun 21 13:04:04 vanquish postgres[3311]: [19-1] DEBUG:  target: removed 
 5645231 row versions in 106508 pages
 Jun 21 13:04:04 vanquish postgres[3311]: [19-2] DETAIL:  CPU 3.37s/1.23u sec 
 elapsed 40.63 sec.
 Jun 21 13:04:04 vanquish postgres[3311]: [20-1] DEBUG:  target: found 
 5645231 removable, 1296817 nonremovable row versions in 114701 pages
 Jun 21 13:04:04 vanquish postgres[3311]: [20-2] DETAIL:  0 dead row versions 
 cannot be removed yet.
 
 So the table contained 5.6M dead rows and 1.3M live rows.
 
 I think you should forget about having autovacuum keep this table
 in-check and add manual vacuum commands to your code. Autovac is
 intended to deal with 99% of use cases; this is pretty clearly in the 1%
 it can't handle.

Maybe your free space map is configured too small; can you watch for
log messages telling you to increase it?
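
A database-wide VACUUM VERBOSE prints a free space map summary at the
end; if it reports that more page slots are needed than max_fsm_pages
allows, raise the settings, roughly like this (sketch; changing them
requires a restart):

VACUUM VERBOSE;
-- then, in postgresql.conf:
-- max_fsm_pages = 2000000     # larger than the reported page slot count
-- max_fsm_relations = 10000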


HTH,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] scaling up postgres

2006-06-21 Thread Markus Schaber
Hi, Fzied,

[EMAIL PROTECTED] wrote:

 I'm using httperf/autobench for measurments and the best result I can
 get is that my system can handle a trafiic of almost 1600 New
 con/sec.

Are you using connection pooling or persistent connections between
PostgreSQL and the Apaches?

Maybe it simply is the network latency between the two machines - as the
database is read-only, did you think about having both PostgreSQL and
Apache on both machines, and then load-balancing incoming http requests
between them?

 I cannot scale beyond that value and the funny thing, is that none of
 the servers is swapping, or heavy loaded, neither postgres nor apache
 are refusing connexions.

And for the measurements: are you really throwing parallel http connections
at the server? This sounds like you are measuring request latency, but the
maximum throughput might be much higher.

 my database is only 58M it's a read only DB and will lasts only for a
 month.

I guess it is a simple table with a single PK (some subscription number)
- no joins or other things.

For such cases, a special non-RDBMS like MySQL, SQLite, or even some
handcrafted thingy may give you better results.


Markus


-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] scaling up postgres

2006-06-21 Thread Markus Schaber
Hi, Zydoon,

Zydoon wrote:

 Now I'm trying to make my tests, and I'm not that sure I will make the
 switch to the PSeries, since my dual xeon with 4 G RAM can handle 3500
 concurrent postmasters consuming 3.7 G of the RAM. I cannot reach this
 number on the PSeries with 2 G.

This sounds like you want to have one postgresql backend per apache
frontend.

Did you try running pgpool on the Apache machine, and have only a few
(hundred) connections to the backend?

Maybe http://en.wikipedia.org/wiki/Memcached could be helpful, too.


Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Some queries starting to hang

2006-06-21 Thread Markus Schaber
Hi, Chris,

Chris Beecroft wrote:

 Query is now returning with results on our replicated database.  Will
 vacuum analyze production now.  So it seems to have done the trick.  Now
 the question is has our auto vacuum failed or was not set up properly...
 A question for my IT people.

In most cases where we had database bloat despite running autovacuum,
it was due to a too low free_space_map setting.

Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] How can I make this query faster (resend)

2006-05-22 Thread Markus Schaber
Hi, Cstdenis,

Cstdenis wrote:

 Server is a dual xeon with a gig of ram dedicated mostly to postgresql.
 Here is the changed lines in my postgresql.conf: http://pastebin.ca/57222

3M is really low for a production server.

Try using pg_pool and limiting it to about 30 or so backend connections,
and then give them at least 30 megs of RAM each.

This should also cut down the connection creation overhead.

HTH,
Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-10 Thread Markus Schaber
Hi, PFC,

PFC wrote:

 The problem is that you need a set-returning function to retrieve
 the  values. SRFs don't have rowcount estimates, so the plans suck.

What about adding some way of rowcount estimation to SRFs, in the way of:

CREATE FUNCTION foo (para, meters) RETURNS SETOF bar AS
$$ ... function code ... $$ LANGUAGE plpgsql
ROWCOUNT_ESTIMATOR $$ ... estimation code ... $$ ;

Internally, this could create two functions, foo (para, meters) and
estimate_foo(para, meters) that are the same language and coupled
together (just like a SERIAL column and its sequence). The estimator
functions have an implicit return parameter of int8. Parameters may be
NULL when they are not known at query planning time.

What do you think about this idea?

The same scheme could be used to add a CPUCOST_ESTIMATOR to expensive
functions.

HTH,
Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid

2006-05-10 Thread Markus Schaber
Hi, Scott  all,

Scott Lamb wrote:

 I don't know the answer to this question, but have you seen this tool?
 
 http://brad.livejournal.com/2116715.html

We had a simpler tool inhouse, which wrote a file byte-for-byte, and
called fsync() after every byte.

If the number of fsyncs/min is higher than your rotations per minute
value of your disks, they must be lying.

It does not find as many liars as the script above, but it is less
intrusive (it can be run on every low-I/O machine without crashing it), and
it found some liars in-house (some notebook disks, one external
USB/FireWire-to-IDE case, and an older Linux cryptoloop implementation,
IIRC).

If you're interested, I can dig for the C source...

HTH,
Markus




-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid

2006-05-10 Thread Markus Schaber
Hi, Bruce,

Bruce Momjian wrote:


It does not find as many liars as the script above, but it is less

 Why does it find fewer liars?

It won't find liars that have a small lie-queue length, so their
internal buffers get full and they have to block. After a small burst at
the start, which usually hides in other latencies, they don't get more
throughput than spindle turns.

It won't find liars that first acknowledge to the host and then
immediately write the block before accepting other commands. This
improves latency (which is what some benchmarks measure), but not the
syncs/write rate.

Both of them can be captured by the other script, but not by my tool.

HTH,
Markus


-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] [GENERAL] Arguments Pro/Contra Software Raid

2006-05-10 Thread Markus Schaber
Hi, Bruce,

Markus Schaber wrote:

It does not find as many liars as the script above, but it is less
Why does it find fewer liars?

 It won't find liars that have a small lie-queue length so their
 internal buffers get full so they have to block. After a small burst at
 start which usually hides in other latencies, they don't get more
 throughput than spindle turns.

I just reread my mail, and must admit that I would not understand what I
wrote above, so I'll explain a little more:

My test program writes byte-for-byte. Let's say our FS/OS has a 4k page
and block size; that means 4096 writes that all hit the same disk block.

Intelligent liars will see that the 2nd and all further writes obsolete
the former writes that still reside in the internal cache, and
drop those former writes from the cache, effectively going up to 4k
writes/spindle turn.

Dumb liars will keep the obsolete writes in the write cache / queue, and
so won't be caught by my program. (Note that I have no proof that such
disks actually exist, but I have enough experience with hardware that I
wouldn't be surprised.)


HTH,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-10 Thread Markus Schaber
Hi, PFC,

PFC wrote:

 You need to do some processing to know how many rows the function
 would  return.
 Often, this processing will be repeated in the function itself.
 Sometimes it's very simple (ie. the function will RETURN NEXT each 
 element in an array, you know the array length...)
 Sometimes, for functions returning few rows, it might be faster to 
 compute the entire result set in the cost estimator.

I know, but we only have to estimate the number of rows to give a hint
to the query planner, so we can use lots of simplifications.

E.g. for generate_series we would return ($2-$1)/$3, and for some functions
even constant estimates will be good enough.

 - please execute me and store my results in a temporary storage,
 count  the rows returned, and plan the outer query accordingly

That's an interesting idea.

Markus


-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-10 Thread Markus Schaber
Hi, Nils,

Nis Jorgensen wrote:

 It will probably be quite common for the number to depend on the number
 of rows in other tables. Even if this is fairly constant within one db
 (some assumption), it is likely to be different in others using the same
 function definition. Perhaps a better solution would be to cache the
 result of the estimator function.

Sophisticated estimator functions are free to use the pg_statistic
catalog / pg_stats view for their row count estimation.


HTH,
Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] extremely slow when execute select/delete for certain

2006-05-08 Thread Markus Schaber
Hi, Kah,

[EMAIL PROTECTED] wrote:

 I already vacuum those tables with full option but it still the same.
 
 What could be the possible causes of this problem?
 How can I solve it?
 
 CPU - Intel Xeon 2.40 GHz
 Memory - 1.5G
 Postgresql version: 7.2.2

First, you should consider upgrading your PostgreSQL server to a newer
version, at least to 7.2.8, which fixes some critical bugs.

But it would be much better to upgrade to the current 8.1 version, as I
think that your problem is caused by index bloat, and indices are handled
much better in 8.1.

Try recreating your indices using the REINDEX command.

HTH,
Markus



-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Why so slow?

2006-05-02 Thread Markus Schaber
Hi, Bill,

Bill Moran wrote:

 My understanding is basically that if you vacuum with the correct
 frequency, you'll never need to vacuum full.  This is why the
 autovacuum system is so nice, it adjusts the frequency of vacuum according
 to how much use the DB is getting.

Additionally, the free_space_map settings have to be high enough: they
have to cover all pages that get dead rows between two
vacuum runs.

HTH,
Markus


-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] how unsafe (or worst scenarios) when setting fsync

2006-04-28 Thread Markus Schaber
Hi, Guoping,

Guoping Zhang wrote:

 a) The tests consists of ten thousands very small transactions, which are
 not grouped, that is why so slow with compare to set fsync off.

If those transactions are submitted by concurrent applications over
several simultaneous connections, playing with commit_delay and
commit_siblings may improve your situation.

HTH,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PERFORM] Arrays and index scan

2006-04-28 Thread Markus Schaber
Hello,

I'm searching for a comfortable way to get a variable-size bunch of
user-specified objects via a single prepared statement, so I wanted to
submit an ARRAY.

However, the query planner seems to refuse to make index scans even with
8.1:

testdb=# EXPLAIN SELECT * from streets WHERE link_id = ANY(ARRAY[1,2,3]);
                              QUERY PLAN
---------------------------------------------------------------------
 Seq Scan on streets  (cost=0.00..288681.74 rows=1713754 width=393)
   Filter: (link_id = ANY ('{1,2,3}'::integer[]))
(2 rows)



Via IN, it works fine, but it has the disadvantage that we cannot use
prepared statements effectively:

testdb=# explain select * from streets where link_id in (1,2,3);
                              QUERY PLAN
---------------------------------------------------------------------
 Bitmap Heap Scan on streets  (cost=6.02..16.08 rows=5 width=393)
   Recheck Cond: ((link_id = 1) OR (link_id = 2) OR (link_id = 3))
   ->  BitmapOr  (cost=6.02..6.02 rows=5 width=0)
         ->  Bitmap Index Scan on streets_link_id_idx  (cost=0.00..2.01 rows=2 width=0)
               Index Cond: (link_id = 1)
         ->  Bitmap Index Scan on streets_link_id_idx  (cost=0.00..2.01 rows=2 width=0)
               Index Cond: (link_id = 2)
         ->  Bitmap Index Scan on streets_link_id_idx  (cost=0.00..2.01 rows=2 width=0)
               Index Cond: (link_id = 3)
(9 rows)


And on the net, I found a nice trick via an array flattening function,
which at least uses a nested loop of index scans instead of an index
bitmap scan:

testdb=# CREATE FUNCTION flatten_array(anyarray) RETURNS SETOF
anyelement AS
testdb-# 'SELECT ($1)[i] FROM (SELECT
generate_series(array_lower($1,1),array_upper($1,1)) as i) as foo;'
testdb-# language SQL STRICT IMMUTABLE;


testdb=# EXPLAIN SELECT * from streets JOIN flatten_array(ARRAY[1,2,3])
on flatten_array=link_id;
                              QUERY PLAN
---------------------------------------------------------------------------
 Nested Loop  (cost=0.00..5882.15 rows=1566 width=397)
   ->  Function Scan on flatten_array  (cost=0.00..12.50 rows=1000 width=4)
   ->  Index Scan using streets_link_id_idx on streets  (cost=0.00..5.84 rows=2 width=393)
         Index Cond: ("outer".flatten_array = streets.link_id)
(4 rows)


Currently, we're planning to use the array flattening approach, but are
there any plans to enhance the query planner for the direct ARRAY approach?
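
For completeness, this is how we intend to use it with prepared statements
(a sketch, using the flatten_array function from above):

PREPARE get_streets(int[]) AS
  SELECT s.*
    FROM streets s
    JOIN flatten_array($1) ON flatten_array = s.link_id;

EXECUTE get_streets(ARRAY[1,2,3]);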

Thanks,
Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Recovery will take 10 hours

2006-04-24 Thread Markus Schaber
Hi, Brandan,

Brendan Duddridge wrote:

 So how do you overlap the restore process with the retrieving of files?

You need a shell script as restore_command that both uncompresses
the current file and starts a background decompress of the next
file(s). It also has to check whether the current file is already in
progress from a previous run, and wait until this is finished instead of
decompressing it again. This is a little more complicated than it sounds
at first.

 restore_command = 'gunzip < /wal_archive/%f.gz > %p'

Warning: Don't do it this way!

It will break things because PostgreSQL will try to access a
not-completely-restored wal file.


HTH,
Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Introducing a new linux readahead framework

2006-04-21 Thread Markus Schaber
Hi, Wu,

Wu Fengguang wrote:

In adaptive readahead, the context based method may be of particular
interest to postgresql users. It works by peeking into the file cache
and check if there are any history pages present or accessed. In this
way it can detect almost all forms of sequential / semi-sequential read
patterns, e.g.
 - parallel / interleaved sequential scans on one file
 - sequential reads across file open/close
 - mixed sequential / random accesses
 - sparse / skimming sequential read

It also have methods to detect some less common cases:
 - reading backward
 - seeking all over reading N pages

Great news, thanks!

 This call will disable readahead totally for fd:
 posix_fadvise(fd, any, any, POSIX_FADV_RANDOM);
 
 This one will reenable it:
 posix_fadvise(fd, any, any, POSIX_FADV_NORMAL);
 
 This one will enable readahead _and_ set max readahead window to
 2*max_readahead_kb:
 posix_fadvise(fd, any, any, POSIX_FADV_SEQUENTIAL);

I think that this is an easy, understandable and useful interpretation
of posix_fadvise() hints.


Are there any rough estimates of when this will get into the mainline
kernel (if you intend to submit it)?

Thanks,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Quick Performance Poll

2006-04-20 Thread Markus Schaber
Hi, Luke,

Luke Lonergan wrote:

 The current drawback to bitmap index is that it isn't very maintainable
 under insert/update, although it is safe for those operations.  For now, you
 have to drop index, do inserts/updates, rebuild index.

So they effectively turn the table into a read-only table for now.

Are they capable of indexing custom datatypes like the PostGIS geometries
that use the GiST mechanism? This could probably speed up our geo
databases for map rendering, which contain static data that is updated
approx. twice per year.


Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Inserts optimization?

2006-04-19 Thread Markus Schaber
Hi, Magnus,

Magnus Hagander wrote:

 Bacula already serializes access to the database (they have to support
 mysql/myisam), so this shouldn't help.

Ouch, that hurts.

To support mysql, they break performance for _every other_ database system?

<cynism>
Now, I understand how the mysql people manage to spread the legend of
mysql being fast. They convince software developers to thwart all others.
</cynism>

Seriously: How can we convince developers to either fix MySQL or abandon
and replace it with a database, instead of crippling client software?

 Actually, it might well hurt by introducing extra delays.

Well, if you read the documentation, you will see that it will only wait
if there are at least commit_siblings other transactions active. So when
Bacula serializes access, there will be no delays, as there is only a
single transaction alive.


HTH
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Inserts optimization?

2006-04-19 Thread Markus Schaber
Hi, Magnus,

Magnus Hagander wrote:

To support mysql, they break performance for _every other_ 
database system?
 Actually, it probably helps on SQLite as well.

AFAICS from the FAQ http://www.sqlite.org/faq.html#q7 and #q8, SQLite
does serialize itself.

 And considering they only
 support postgresql, mysql and sqlite, there is some merit to it from
 their perspective.

Okay, I understand, but I hesitate to endorse it.

IMHO, they should write their application in a normal way, and then
have the serialization etc. encapsulated in the database driver
interface (possibly a wrapper class or so).

 <cynism>
 Now, I understand how the mysql people manage to spread the
 legend of mysql being fast. They convince software developers
 to thwart all others.
 </cynism>
 Yes, same as the fact that most (at least FOSS) web project-du-jour are
 dumbed down to the mysql featureset. (And not just mysql, but
 mysql-lowest-common-factors, which means myisam etc)

Well, most of those projects don't need a database; they need a bunch of
tables and a lock.

Heck, they even use client-side SELECT loops in PHP instead of a JOIN,
because "I always confuse left and right".


Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Inserts optimization?

2006-04-18 Thread Markus Schaber
Hi, Francisco,

Francisco Reyes wrote:

 I only wonder what is safer.. using a second or two in commit_delay or
 using fsync = off.. Anyone cares to comment?

It might be that you misunderstood commit_delay. It will not only delay
the disk write, but also block your connection until the write actually
is performed.

It will raise the throughput in multi-client scenarios, but will also
raise the latency, and it will bring absolutely no speedup in
single-client scenarios.

It does not decrease safety (in contrast to fsync=off): data will be
consistent, and any application that has successfully finished a commit
can be sure its data is on the platters.[1]

HTH,
Markus

[1] As long as the platters don't lie, but that's another subject.

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] mergehashloop

2006-04-18 Thread Markus Schaber
Hi, Tom,

Tom Lane wrote:

 Well, the other thing that's going on here is that we know we are
 overestimating the cost of nestloop-with-inner-indexscan plans.
 The current estimation for that is basically outer scan cost plus N
 times inner scan cost where N is the estimated number of outer tuples;
 in other words the repeated indexscan probes are each assumed to happen
 from a cold start.  In reality, caching of the upper levels of the index
 means that the later index probes are much cheaper than this model
 thinks.  We've known about this for some time but no one's yet proposed
 a more reasonable cost model.

My spontaneous guess would be to use log(N)*inner instead of N*inner. I
don't have any backing for that; it's just what my intuition tells me
as a first shot.

 In my mind this is tied into another issue, which is that the planner
 always costs on the basis of each query starting from zero.  In a real
 environment it's much cheaper to use heavily-used indexes than this cost
 model suggests, because they'll already be swapped in due to use by
 previous queries.  But we haven't got any infrastructure to keep track
 of what's been heavily used, let alone a cost model that could make use
 of the info.

An easy first approach would be to add a user tunable cache probability
value to each index (and possibly table) between 0 and 1. Then simply
multiply random_page_cost with (1-that value) for each scan.

Later, this value could be automatically tuned by stats analysis or
other means.

 I think part of the reason that people commonly reduce random_page_cost
 to values much lower than physical reality would suggest is that it
 provides a crude way of partially compensating for this basic problem.

I totally agree with this; it's just what we did here from time to time. :-)

Hmm, how does effective_cache_size relate to this? Shouldn't a high
effective_cache_size have a similar effect?

Thanks,
Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Better index stategy for many fields with few values

2006-04-13 Thread Markus Schaber
Hi, Jim,

Jim C. Nasby wrote:

I was also thinking about about using a functional index.
If there's a logical relation between those values that they can easily
combined, that may be a good alternative.
 How would that be any better than just doing a multi-column index?

10 different values per column, and 20 columns are 10^20 value combinations.

Partitioning it for the first column gives 10^19 combinations which is
smaller than 2^64, and thus fits into a long value.

And I just guess that a 10-partition functional index on a long value
could perform better than a multi-column index on 20 columns of
character(10), if only because it is approx. 1/25th in size.
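
A minimal sketch of what I mean, assuming a hypothetical table tab with
integer-coded columns param1..param3 holding values 0-9:

-- pack three small columns into one value and index the expression
CREATE INDEX tab_packed_idx ON tab (((param1 * 10 + param2) * 10 + param3));

-- queries have to use exactly the same expression to hit the index
SELECT * FROM tab
 WHERE (param1 * 10 + param2) * 10 + param3 = (3 * 10 + 1) * 10 + 4;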

HTH,
Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Better index stategy for many fields with few values

2006-04-13 Thread Markus Schaber
Hi, Jim,

Jim Nasby wrote:
 Adding -performance back in
 I would like to try it.
 
 However in an other post I added that contrary to what I stated
 initially all the paramXX columns are not mandatory in the query. So
 it seems that requirement make the problem more complexe.

Okay, this rules out my functional index over 19 columns.

 Doesn't this new requirement rule out this solution? 
 
 No, just group the columns logically.

Yes, that's the solution.

If you have common groups of columns that appear and disappear
synchronously, pack those together in a (possibly partitioned and/or
functional) index.

Then rely on the query planner to combine the appropriate indices
via an index bitmap scan.

  By the way I have test to index each column individually and check
 what happens in relation to bitscan map. My test table  is 1
 million  rows. The explain analyze command shows that a bit scan is
 sometimes used but I still end up with queries that can take up to
 10s which is way to much.

Is it on the first query, or on repeated queries?

It might be that you're I/O bound, and the backend has to fetch indices
and rows from disk into RAM.

I currently don't know whether the order of indices in a multi-index
bitmap scan is relevant, but I could imagine that it may be useful to
have the most selective index scanned first.

And keep in mind that, assuming an equal distribution of your
parameters, every index bitmap hits 1/10th of the whole table on
average, so the selectivity generally is low.

The selectivity of a partitioned 3-column index will be much better
(about 1/1000th of the whole table), and fewer index scans and bitmaps
have to be generated.

A functional index may also make sense to CLUSTER the table to optimize
the locality of search results (and so reduce disk I/O). In case your
table has low write activity, but high read-only activity, the overhead
that comes with the additional index is negligible compared to the
performance improvement proper CLUSTERing can generate.
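
The CLUSTER step itself is a one-liner in 8.1 syntax (index and table
names are made up here):

CLUSTER tab_packed_idx ON tab;
ANALYZE tab;   -- refresh the statistics afterwards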

Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] pgmemcache

2006-04-13 Thread Markus Schaber
Hi, Tom,

Tom Lane wrote:

Why are AFTER COMMIT triggers impossible?
 
 What happens if such a trigger gets an error?  You can't un-commit.

Then it must be specified that those triggers run in their own
transaction, and cannot abort the already-committed transaction.

Or use the 2-phase-commit infrastructure for them.

Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Better index stategy for many fields with few values

2006-04-12 Thread Markus Schaber
Hi, Oscar,

Oscar Picasso wrote:

 [ all the 20 paramXX columns are used in the query}

 How can I optimize this kind of query?

PostgreSQL 8.1 has so-called bitmap index scans, which can combine
several index scans before actually accessing the data.

So I think it's best to create an index on each of the paramXX columns,
and see with EXPLAIN ANALYZE what it is doing.

 I was thinking about using a multicolumns index, but I have read that
 we should limit multicolumns indice to at most 2 or 3 columns.

Yes, that's true, the index overhead gets too high.

 If that's true then 22 columns for a multicolumn incdex seems way too
 much. Or maybe it is workable as every column uses only a very limited
 set of values?

Yes, I think that a 22 column index is way too much, especially with the
new bitmap index scans available.

 I was also thinking about about using a functional index.

If there's a logical relation between those values that they can easily
combined, that may be a good alternative.


I just had another weird idea:

As your paramXX columns can have only 10 distinct values each, it also
might be feasible to use a bunch of 10 conditional (partial) indices, like:

CREATE INDEX foo1 ON tab (param1, param2) WHERE param0='1st value';
CREATE INDEX foo2 ON tab (param1, param2) WHERE param0='2nd value';
CREATE INDEX foo3 ON tab (param1, param2) WHERE param0='3rd value';
[...]

This way, you don't have the index bloat of a 3-column index, but 10
2-column indices that cover 1/10th of the table each.

For 22 columns, you'd need a bunch of seven such indices plus a
single-column one, or you can use some 3+1 and some 2+1 column indices.

I'd like to see the query plans from explain analyze.

Btw, I expect query planning time to get rather significant for so many
columns, so geqo tuning, tuning work_mem (for the bitmap scans) and
prepared statements will pay off.

HTH,
Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Indexes with descending date columns

2006-04-11 Thread Markus Schaber
Hi, Bruce,

Bruce Momjian wrote:

Ahh. There's a hack to do that by defining a new opclass that reverses <
and >, and then doing ORDER BY project_id, id, date USING new_opclass.

I think there's a TODO about this, but I'm not sure...
 
 Yes, and updated:
 
   * Allow the creation of indexes with mixed ascending/descending
 specifiers
   
 This is possible now by creating an operator class with reversed sort
 operators.  One complexity is that NULLs would then appear at the 
 start
 of the result set, and this might affect certain sort types, like
 merge join.

I think it would be better to allow index zig-zag scans for
multi-column indices.[1]

So it traverses the higher-order column in a given order, and the
sub-trees for each specific high-order value are traversed in reversed
order. From my knowledge of BTrees, and given correct commutator
definitions, this should not be too complicated to implement.[2]

This would allow the query planner to use the same index for arbitrary
ASC/DESC combinations of the given columns.


Just a thought,
Markus


[1] It may make sense to implement the mixed specifiers on indices as
well, to allow CLUSTERing on mixed search order.

[2] But I admit that I currently don't have enough knowledge in
PostgreSQL index scan internals to know whether it really is easy to
implement.


-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Sun Fire T2000 and PostgreSQL 8.1.3

2006-04-06 Thread Markus Schaber
Hi, Juan,

Juan Casero (FL FLC) wrote:
 Ok that is beginning to become clear to me.  Now I need to determine if
 this server is worth the investment for us.  Maybe it is not a speed
 daemon but to be honest the licensing costs of an SMP aware RDBMS is
 outside our budget.  When postgresql starts does it start up a super
 server process and then forks copies of itself to handle incoming
 requests? 

It starts a super server process (Postmaster) and some background
processes (background writer, stats collector). For each incoming
connection, the postmaster forks a single-threaded backend process,
which handles all queries and transactions on this connection, and
terminates when the connection terminates.

So as a rule of thumb, each connection can utilize only a single CPU. You
can utilize a few more CPUs than you have simultaneous connections, due
to the background processes and the OS needing CPU for I/O, but that's
rather marginal.

AFAIK, Bizgres MPP has extended the backend processes to be
multi-threaded, so a single connection can utilize several CPUs for some
types of queries (large data sets, sorting/joining/hashing etc.). Btw, I
presume that they might offer you a free test license, and I also
presume their license fee is much lower than Oracle's or DB2's.

 Or do I have to specify how many server processes should be
 started up? 

You can limit the number of server processes by setting the maximum
connection limit.
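
For example, in postgresql.conf (the value is only an illustration):

max_connections = 200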

 I figured maybe I can take advantage of the multiple cpu's
 on this system by starting up enough postgres server processes to handle
 large numbers of incoming connections.  I have this server available for
 sixty days so I may as well explore the performance of postgresql on it.

Yes, you can take advantage if you have multiple clients (e.g. a web
app - that's what the T2000 / Niagara were made for). With a Tomcat or
JBoss sitting on it, each HTTP connection gets its own thread, and each
customer gets its own CPU :-)

Then use a connection pool to PostgreSQL, and you're fine. The more
customers, the more CPUs are utilized.

But beware: if you do floating-point maths, it will be very slow. All
8 CPUs / 32 threads share a single FPU. So if you need floating point
(e.g. Mapserver, PostGIS geoprocessing, Java2D chart drawing or the
like), the T2000 is not the right thing for you.


HTH,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Decide between Postgresql and Mysql (help of

2006-03-30 Thread Markus Schaber
Hi, Craig,

Craig A. James wrote:

 I hope this was just a joke.  You should be sure to clarify - there
 might be some newbie out there who thinks you are seriously suggesting
 coding major web sites in some old-fashioned compiled language.

No, but perhaps with a CMS that pregenerates static content, or
http://www.tntnet.org/

Markus


-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Index scan startup time

2006-03-30 Thread Markus Schaber
Hi, Peter,

Peter Eisentraut wrote:
The table has seen VACUUM FULL and REINDEX before this.
But no analyze?
 ANALYZE as well, but the plan choice is not the point anyway.

Maybe you could add a combined index on (activity_id, state), or (if you
use this kind of query more often) a partial index on activity_id
WHERE state IN (1, 10001).
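
Something like this (the table and index names are just guesses):

CREATE INDEX act_state_idx ON activities (activity_id, state);

-- or the partial variant:
CREATE INDEX act_hot_states_idx ON activities (activity_id)
    WHERE state IN (1, 10001);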

Btw, PostgreSQL 8.1 could AND two bitmap index scans on the activity and
state indices, and get the result faster (I presume).

Markus
-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Slow performance on Windows .NET and OleDb

2006-03-29 Thread Markus Schaber
Hi, Greg,

Greg Quinn wrote:
 I populate 3000 records into the table to test PostgreSQL's speed.
 It takes about 3-4 seconds.
 When you do the population, is it via inserts or copy?
 Via insert

Are those inserts encapsulated into a single transaction? If not, that's
the reason why it's so slow: every single transaction fsync()s through to
the disk.
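
For example (table and values are just placeholders):

BEGIN;
INSERT INTO mytable (col1, col2) VALUES (1, 'foo');
INSERT INTO mytable (col1, col2) VALUES (2, 'bar');
-- ... the remaining rows ...
COMMIT;

A single COPY would be even faster, but batching the INSERTs into one
transaction is usually the biggest win.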

Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] count(*) performance

2006-03-28 Thread Markus Schaber
Gábriel Ákos wrote:

 I thought that too. Autovacuum is running on our system but it didn't do
 the trick. Anyway the issue is solved, thank you all for helping. :)

Hi, Gabriel, it may be that your Free Space Map (FSM) setting is way too
low.

Try increasing it.
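
The relevant postgresql.conf settings are roughly these (example values
only; size them to your data and update rate):

max_fsm_pages = 200000        # at least the number of pages touched between vacuums
max_fsm_relations = 1000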

Btw, VACUUM outputs a warning if the FSM is not large enough; maybe you
can find useful hints in the log file.

HTH
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] simple join uses indexes, very slow

2006-03-28 Thread Markus Schaber
Hi, George,

george young wrote:

Looks like a hash join might be faster. What is your work_mem set to?
 
 work_mem= 1024

This is only 1 megabyte. By all means increase it, if possible.
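
For example (the value is in kilobytes):

SET work_mem = 32768;  -- 32 MB for this session

or set it globally in postgresql.conf and reload.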

HTH,
Markus

-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Massive Inserts Strategies

2006-03-28 Thread Markus Schaber
Hi, ashah,

ashah wrote:
 I tried this solution, but ran into the following problem.
 
 The temp_table has columns (col1, col2, col3).
 
 The original_table has columns (col0, col1, col2, col3)

 Now the extra col0 on the original_table is the unique generated ID by
 the database.

INSERT INTO original_table (col1, col2, col3)
    SELECT col1, col2, col3 FROM temp_table WHERE ...;

Just leave col0 out of the target column list; the database will fill it
with the generated ID, as it does for any other INSERT.

HTH,
Markus




-- 
Markus Schaber | Logical TrackingTracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

