Re: [PERFORM] filesystem performance with lots of files

2005-12-02 Thread David Lang

On Fri, 2 Dec 2005, Qingqing Zhou wrote:



I don't have all the numbers readily available (and I didn't do all the
tests on every filesystem), but I found that even with only 1000
files/directory ext3 had some problems, and if you enabled dir_hash some
functions would speed up, but writing lots of files would just collapse
(that was the 80 min run)



Interesting. If the target is a database performance comparison, I would
suggest testing a smaller number of bigger files. By small number, I mean
10^2 - 10^3; by bigger, I mean file sizes from 8k to 1G (a PostgreSQL data
file is at most this size under a normal installation).


I agree. That round of tests was done on my system at home, and was in
response to a friend who had rsync over a local LAN take > 10 hours for
<10G of data. But even so it generated some interesting info. I need to
make a more controlled run at it though.



Let's take TPC-C as an example: if we build a TPC-C database of 500 files,
each at most 1G (PostgreSQL has this feature/limit in an ordinary
installation), then this gives us a 500G database, which is big enough
for your current configuration.

Regards,
Qingqing



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Database restore speed

2005-12-02 Thread Luke Lonergan
David, 

> Luke, would it help to have one machine read the file and 
> have it connect to postgres on a different machine when doing 
> the copy? (I'm thinking that the first machine may be able to 
> do a lot of the parsing and conversion, leaving the second 
> machine to just worry about doing the writes)

Unfortunately not - the parsing / conversion core is in the backend,
where it should be IMO because of the need to do the attribute
conversion there in the machine-native representation of the attributes
(int4, float, etc) in addition to having the backend convert from client
encoding (like LATIN1) to the backend encoding (like UNICODE aka UTF8).

There are a few areas of discussion about continued performance
increases in the codebase for COPY FROM, here are my picks:
- More micro-optimization of the parsing and att conversion core - maybe
100% speedup in the parse/convert stage is possible
- A user selectable option to bypass transaction logging, similar to
Oracle's
- A well-defined binary input format, like Oracle's SQL*Loader - this
would bypass most parsing / att conversion
- A direct-to-table storage loader facility - this would probably be the
fastest possible load rate

- Luke
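
For reference, the COPY FROM path being discussed looks roughly like the
sketch below; the table, columns, and file path are hypothetical. The
comments note companion tuning that was common practice at the time rather
than anything proposed above.

-- Hedged sketch of a bulk load through COPY FROM (names and path are made up).
-- Typical companion tuning: raise checkpoint_segments and maintenance_work_mem,
-- and create indexes only after the data is loaded.
CREATE TABLE load_target (
    id      int4,
    payload text
);
COPY load_target FROM '/data/load_target.dat' WITH DELIMITER '|';
CREATE INDEX load_target_id_idx ON load_target (id);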


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] 15,000 tables

2005-12-02 Thread Ron

Agreed, and I apologize for the imprecision of my post below.

I should have written:
"Best practice seems to be to use a journaling fs and log metadata 
only and put it on separate dedicated spindles."


I've seen enough HD failures that I tend to be paranoid and log the
metadata of the fs dedicated to WAL as well, but that may very well be overkill.


Ron

At 01:57 PM 12/1/2005, Tom Lane wrote:

Ron <[EMAIL PROTECTED]> writes:
> Agreed.  Also the odds of fs corruption or data loss are higher in a
> non journaling fs.  Best practice seems to be to use a journaling fs
> but to put the fs log on dedicated spindles separate from the actual
> fs or pg_xlog.

I think we've determined that best practice is to journal metadata only
(not file contents) on PG data filesystems.  PG does expect the filesystem
to remember where the files are, so you need metadata protection, but
journalling file content updates is redundant with PG's own WAL logging.

On a filesystem dedicated to WAL, you probably do not need any
filesystem journalling at all --- we manage the WAL files in a way
that avoids changing metadata for a WAL file that's in active use.
A conservative approach would be to journal metadata here too, though.

regards, tom lane





---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] 15,000 tables

2005-12-02 Thread Michael Stone

On Fri, Dec 02, 2005 at 03:15:00AM -0500, Ron wrote:
I've seen enough HD failures that I tend to be paranoid and log the
metadata of the fs dedicated to WAL as well, but that may very well be overkill.


Especially since it wouldn't gain anything. Journalling doesn't give you
any advantage whatsoever in the face of a HD failure.

Mike Stone

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] COPY into table too slow with index: now an I/O

2005-12-02 Thread Michael Stone

On Fri, Dec 02, 2005 at 12:15:57AM -0500, Luke Lonergan wrote:

That's good to know - makes sense.  I suppose we might still thrash over
a 1GB range in seeks if the BG writer starts running at full rate in the
background, right?  Or is there some write combining in the BG writer?


That part your OS should be able to handle. Those writes aren't synced,
so the OS has plenty of opportunity to buffer & aggregate them.

Mike Stone

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] 15,000 tables

2005-12-02 Thread Alex Stapleton


On 1 Dec 2005, at 16:03, Tom Lane wrote:


Michael Riess <[EMAIL PROTECTED]> writes:

(We NEED that many tables, please don't recommend reducing them)


No, you don't.  Add an additional key column to fold together different
tables of the same structure.  This will be much more efficient than
managing that key at the filesystem level, which is what you're
effectively doing now.

(If you really have 15000 distinct rowtypes, I'd like to know what
your database design is...)
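
For concreteness, Tom's suggestion amounts to something like the following
sketch; the table and column names are hypothetical, not taken from
Michael's schema.

-- One shared table with an extra key column replaces many structurally
-- identical per-application tables.
CREATE TABLE app_data (
    app_id  int4 NOT NULL,   -- identifies which logical "table" a row belongs to
    item_id int4 NOT NULL,
    payload text,
    PRIMARY KEY (app_id, item_id)
);

-- A query against one of the old per-application tables becomes:
SELECT * FROM app_data WHERE app_id = 12345 AND item_id < 1000;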



Won't you end up with awful seek times if you just want data which
previously would have been stored in a single table? E.g. whereas before
you wanted 1000 contiguous rows from the table, now you want 1000 rows
with 1000 rows you don't care about in between each one you do want.


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] 15,000 tables

2005-12-02 Thread Alex Stapleton


On 2 Dec 2005, at 14:16, Alex Stapleton wrote:



On 1 Dec 2005, at 16:03, Tom Lane wrote:


Michael Riess <[EMAIL PROTECTED]> writes:

(We NEED that many tables, please don't recommend reducing them)


No, you don't.  Add an additional key column to fold together different
tables of the same structure.  This will be much more efficient than
managing that key at the filesystem level, which is what you're
effectively doing now.

(If you really have 15000 distinct rowtypes, I'd like to know what
your database design is...)



Won't you end up with awful seek times if you just want data which
previously would have been stored in a single table? E.g. whereas before
you wanted 1000 contiguous rows from the table, now you want 1000 rows
with 1000 rows you don't care about in between each one you do want.




I must have had a total and utter failure of intellect for a moment
there. Please ignore that :P


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] 15,000 tables

2005-12-02 Thread Jan Wieck

On 12/1/2005 2:34 PM, Michael Riess wrote:

VACUUM FULL was probably always overkill, unless "always" includes
versions prior to 7.3...


Well, we tried switching to daily VACUUM ANALYZE and weekly VACUUM FULL, 
but the database got considerably slower near the end of the week.


This indicates that you have FSM settings that are inadequate for that
many tables and, eventually, for the overall size of your database. Try
setting those to:


max_fsm_relations = 8
max_fsm_pages = (select sum(relpages) / 2 from pg_class)
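
A minimal sketch of applying that formula; the max_fsm_relations figure
below is only a placeholder - it just needs to comfortably exceed the total
number of tables and indexes.

-- Estimate max_fsm_pages from current relation sizes (Jan's formula)
SELECT sum(relpages) / 2 AS suggested_max_fsm_pages FROM pg_class;

-- Then, in postgresql.conf (both settings need a server restart):
--   max_fsm_relations = 20000      -- placeholder: more than tables + indexes
--   max_fsm_pages     = <value returned by the query above>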

Another thing you might be suffering from (depending on the rest of your
architecture) is file descriptor limits. Especially if you use some sort
of connection pooling or persistent connections like PHP, you will have
all the backends serving several of your logical applications (sets of
30 tables). If on average one backend is called for 50 different apps,
then we are talking about 50*30*4 = 6,000 files accessed by that backend.
The 80/20 rule leaves 1,200 files in active use per backend, so 100 active
backends lead to 120,000 open (virtual) file descriptors. Now add to that
any files that a backend would have to open in order to evict an arbitrary
dirty block.


With a large shared buffer pool and slightly more aggressive background
writer settings, you can mostly avoid having regular backends evict dirty
blocks.


If the kernel settings allow Postgres to keep that many file descriptors 
open, you avoid directory lookups.
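
The per-backend cap involved here is max_files_per_process; a quick way to
check it is sketched below. Raising it in postgresql.conf only helps if the
kernel and ulimit actually allow that many open files per process.

SHOW max_files_per_process;   -- defaults to 1000
-- Raise it in postgresql.conf if the OS permits; PostgreSQL will still fall
-- back to fewer descriptors if it cannot actually open that many.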



Jan

--
#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PERFORM] Network performance under windows

2005-12-02 Thread Teracat

Hello,

We used Postgresql 7.1 under Linux and recently we have changed it to 
Postgresql 8.1 under Windows XP. Our application uses ODBC and when we 
try to get some information from the server through a TCP connection, it's 
very slow. We have also tried it using psql and pgAdmin III, and we get 
the same results. If we try it locally, it runs much faster.


We have been searching the mailing lists and have found many people with 
the same problem, but we haven't found any final solution.


How can we solve this? Any help will be appreciated.

Thanks in advance.

Jordi.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Network performance under windows

2005-12-02 Thread Merlin Moncure
> We used Postgresql 7.1 under Linux and recently we have changed it to
> Postgresql 8.1 under Windows XP. Our application uses ODBC and when we
> try to get some information from the server through a TCP connection,
> it's very slow. We have also tried it using psql and pgAdmin III, and we
> get the same results. If we try it locally, it runs much faster.
> 
> We have been searching the mailing lists and have found many people with
> the same problem, but we haven't found any final solution.
> 
> How can we solve this? Any help will be appreciated.
> 
> Thanks in advance.

By any chance are you working with large tuples/columns (long text,
bytea, etc)?

Also, please define slow.

Merlin

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] pg_dump slow

2005-12-02 Thread Merlin Moncure
> 
> That was the command used to restore a database
> 
> pg_restore.exe -i -h localhost -p 5432 -U postgres -d temp2 -v
> "D:\d\temp.bkp"
> 
> The database was created beforehand using the LATIN1 charset
> 
> With 100 rows you can't feel the test, so I decided to send the whole
> table.
> 
> Many thanks
> 
> Franklin Haut

How are you dumping out your archive?  I confirmed unreasonably slow dump with 
pg_dump -Z temp2 > temp2.bkp on windows 2000 server.  I normally use bzip to 
compress my dumps.

Can you measure time to dump uncompressed and also with bzip and compare?

Merlin

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] pg_dump slow

2005-12-02 Thread Merlin Moncure
> How are you dumping out your archive?  I confirmed unreasonably slow dump
> with pg_dump -Z temp2 > temp2.bkp on windows 2000 server.  I normally use
> bzip to compress my dumps.
> 
> Can you measure time to dump uncompressed and also with bzip and compare?
> 
> Merlin

Oops... cancel that.  I was dumping the wrong database. Dumping your
table from localhost on a dual Opteron win2k server took a few seconds
with Z=0 and Z=9.

Merlin

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Open request for benchmarking input (fwd)

2005-12-02 Thread Qingqing Zhou

"David Lang" <[EMAIL PROTECTED]> wrote
> here are the suggestions from the MySQL folks, what additional tests 
> should I do.
>

I think the tests you list are enough at this stage.

Regards,
Qingqing 



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PERFORM] two disks - best way to use them?

2005-12-02 Thread Rick Schumeyer








I installed another drive in my linux pc in an attempt to improve performance
on a large COPY to a table with a geometry index.

Based on previous discussion, it seems there are three things competing for
the hard drive:

1) the input data file
2) the pg table
3) the WAL

What is the best way to distribute these among two drives?  From Tom's comments
I would think that the pg table and the WAL should be separate.  Does it matter
where the input data is?








Re: [PERFORM] Network performance under windows

2005-12-02 Thread Josep Maria Pinyol Fontseca


Dear Merlin,

For instance, we have this table (with 22900 tuples):

CREATE TABLE tbl_empresa
(
  id_empresa int4 NOT NULL DEFAULT nextval(('seq_empresa'::text)::regclass),
  ref_poblacio int4 NOT NULL,
  nom varchar(50) NOT NULL,
  nif varchar(12),
  carrer varchar(50),
  telefon varchar(13),
  fax varchar(13),
  email varchar(50),
  lab_materials int2 DEFAULT 0,
  web varchar(50),
  ref_empresa int4,
  ref_classificacio_empresa int4,
  ref_sector_empresa int4,
  control int2,
  origen_volcat int2,
  data_modificacio date,
  plantilla int4,
  tamany int2,
  autoritzacio_email int2,
  ref_estat_empresa int2,
  CONSTRAINT tbl_clients_pkey PRIMARY KEY (id_empresa),
  CONSTRAINT fk_tbl_empresa_ref_classificacio_emp FOREIGN KEY (ref_classificacio_empresa)
    REFERENCES tbl_classificacio_empresa (id_classificacio_empresa) MATCH SIMPLE
    ON UPDATE RESTRICT ON DELETE RESTRICT,
  CONSTRAINT fk_tbl_empresa_ref_empresa FOREIGN KEY (ref_empresa)
    REFERENCES tbl_empresa (id_empresa) MATCH SIMPLE
    ON UPDATE RESTRICT ON DELETE RESTRICT,
  CONSTRAINT fk_tbl_empresa_ref_estat_emp FOREIGN KEY (ref_estat_empresa)
    REFERENCES tbl_estat_empresa (id_estat_empresa) MATCH SIMPLE
    ON UPDATE RESTRICT ON DELETE RESTRICT,
  CONSTRAINT fk_tbl_empresa_ref_poblacio FOREIGN KEY (ref_poblacio)
    REFERENCES tbl_poblacions (id_poblacio) MATCH SIMPLE
    ON UPDATE RESTRICT ON DELETE RESTRICT,
  CONSTRAINT fk_tbl_empresa_ref_sector_emp FOREIGN KEY (ref_sector_empresa)
    REFERENCES tbl_sector_empresa (id_sector_empresa) MATCH SIMPLE
    ON UPDATE RESTRICT ON DELETE RESTRICT
)
WITH OIDS;

When we select all the data on the local machine, we obtain results in
about 2-3 seconds. Over remote connections:


Postgresql 7.1 using pgAdminII:
Network traffic generated with remote applications is about 77-80% on a
10Mb connection.

About 6 seconds.

Postgresql 8.1 using pgAdminIII:
Network traffic generated with remote applications is about 2-4% on a
10Mb connection.

12 seconds or more...

I feel it is a problem with the TCP_NODELAY socket option... but I
don't know.


Josep Maria


Merlin Moncure wrote:

>> We used Postgresql 7.1 under Linux and recently we have changed it to
>> Postgresql 8.1 under Windows XP. Our application uses ODBC and when we
>> try to get some information from the server through a TCP connection,
>> it's very slow. We have also tried it using psql and pgAdmin III, and we
>> get the same results. If we try it locally, it runs much faster.
>>
>> We have been searching the mailing lists and have found many people with
>> the same problem, but we haven't found any final solution.
>>
>> How can we solve this? Any help will be appreciated.
>>
>> Thanks in advance.
>
> By any chance are you working with large tuples/columns (long text,
> bytea, etc)?
>
> Also, please define slow.
>
> Merlin




--

Josep Maria Pinyol i Fontseca
Responsable àrea de programació

ENDEPRO - Enginyeria de programari
Passeig Anselm Clavé, 19 Bx. 08263 Callús (Barcelona)
Tel. +34 936930018 - Mob. +34 600310755 - Fax. +34 938361994
[EMAIL PROTECTED] - http://www.endepro.com





---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] two disks - best way to use them?

2005-12-02 Thread Ron

At 01:58 PM 12/2/2005, Rick Schumeyer wrote:

I installed another drive in my linux pc in an attempt to improve performance
on a large COPY to a table with a geometry index.

Based on previous discussion, it seems there are three things 
competing for the hard drive:


1)   the input data file
2)   the pg table
3)   the WAL

What is the best way to distribute these among two drives?  From 
Tom's comments
I would think that the pg table and the WAL should be 
separate.  Does it matter where the input data is?


Best is to have 3 HD or HD sets, one for each of the above.

With only 2, and assuming the input file is too large to fit 
completely into RAM at once, I'd test to see whether:

a=  input on one + pg table & WAL on the other, or
b= WAL on one + pg table & input file on the other
is best.

If the input file can be made 100% RAM resident, then use
c= pg table on one + WAL and input file on the other.

The big goal here is to minimize HD head seeks.

Ron
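
If the pg table ends up on its own drive, a tablespace is one way to arrange
that (8.0 or later); a hedged sketch follows, with a hypothetical mount
point and table definition.

-- Put the COPY target table on the second drive, leaving pg_xlog (the WAL)
-- on the first.  Path and table definition are hypothetical.
CREATE TABLESPACE second_disk LOCATION '/mnt/disk2/pgdata';

CREATE TABLE geo_points (
    id   int4,
    geom text     -- placeholder; Rick's real table uses a geometry type/index
) TABLESPACE second_disk;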



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Database restore speed

2005-12-02 Thread Stephen Frost
* Luke Lonergan ([EMAIL PROTECTED]) wrote:
> > Luke, would it help to have one machine read the file and 
> > have it connect to postgres on a different machine when doing 
> > the copy? (I'm thinking that the first machine may be able to 
> > do a lot of the parsing and conversion, leaving the second 
> > machine to just worry about doing the writes)
> 
> Unfortunately not - the parsing / conversion core is in the backend,
> where it should be IMO because of the need to do the attribute
> conversion there in the machine-native representation of the attributes
> (int4, float, etc) in addition to having the backend convert from client
> encoding (like LATIN1) to the backend encoding (like UNICODE aka UTF8).

Just a thought, but couldn't psql be made to use the binary mode of
libpq and do at least some of the conversion on the client side?  Or
does binary mode not work with copy (that wouldn't surprise me, but
perhaps copy could be made to support it)?
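
For reference, COPY itself does accept a binary format - a hedged sketch of
the syntax is below, with a hypothetical table and path; the open question
in this thread is whether clients can produce that format portably and
safely.

COPY mytable FROM '/tmp/mytable.bin' WITH BINARY;
-- the same format can also be streamed from a client via COPY ... FROM STDIN WITH BINARY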

The other thought, of course, is that you could use PITR for your
backups instead of pg_dump...

Thanks,

Stephen




Re: [PERFORM] Network performance under windows

2005-12-02 Thread me
We experienced the same. We had 2 win2003 servers - www and db - connected to the
same router through 100mbit. The performance was quite bad. Now we run the
db on the same machine as the web server and everything runs smoothly.


cheers,
thomas


- Original Message - 
From: "Josep Maria Pinyol Fontseca" <[EMAIL PROTECTED]>

Cc: ; <[EMAIL PROTECTED]>
Sent: Friday, December 02, 2005 6:24 PM
Subject: Re: [PERFORM] Network performance under windows





Re: [PERFORM] Network performance under windows

2005-12-02 Thread Qingqing Zhou

"Josep Maria Pinyol Fontseca" <[EMAIL PROTECTED]> wrote
>
> When we select all the data on the local machine, we obtain results in
> about 2-3 seconds. Over remote connections:
>
> Postgresql 7.1 using pgAdminII:
> Network traffic generated with remote applications is about 77-80% on a
> 10Mb connection.
> About 6 seconds.
>
> Postgresql 8.1 using pgAdminIII:
> Network traffic generated with remote applications is about 2-4% on a
> 10Mb connection.
> 12 seconds or more...
>

Have you tried using psql? And how do you "select all data" - by "select
count(*)" or "select *"?

Regards,
Qingqing 
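
A minimal way to separate server time from transfer time, assuming the psql
in use has \timing; the table name is taken from the earlier post:

\timing
SELECT count(*) FROM tbl_empresa;   -- server does the scan, one row comes back
SELECT * FROM tbl_empresa;          -- same scan plus ~22,900 rows over the wire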



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Database restore speed

2005-12-02 Thread Luke Lonergan
Stephen,

On 12/2/05 12:18 PM, "Stephen Frost" <[EMAIL PROTECTED]> wrote:

> Just a thought, but couldn't psql be made to use the binary mode of
> libpq and do at least some of the conversion on the client side?  Or
> does binary mode not work with copy (that wouldn't surprise me, but
> perhaps copy could be made to support it)?

Yes - I think this idea is implicit in what David suggested, and my response
as well.  The problem is that the way the client does conversions can
potentially differ from the way the backend does.  Some of the types in
Postgres are machine intrinsic and the encoding conversions use on-machine
libraries, each of which preclude the use of client conversion methods
(without a lot of restructuring).  We'd tackled this problem in the past and
concluded that the parse / convert stage really belongs in the backend.
 
> The other thought, of course, is that you could use PITR for your
> backups instead of pgdump...

Totally - great idea. If this is actually a backup / restore, then PITR plus
filesystem copy (tarball) is hugely faster than dump / restore.

- Luke
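
A minimal sketch of that kind of base backup (8.0 or later), assuming
archive_command is already configured; the filesystem copy itself is only
indicated by a comment.

SELECT pg_start_backup('base_backup');
-- ... tar/copy the data directory at the filesystem level here ...
SELECT pg_stop_backup();
-- A restore then unpacks the tarball and replays archived WAL instead of
-- re-parsing and re-indexing a logical dump.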



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Network performance under windows

2005-12-02 Thread Josep Maria Pinyol Fontseca


Yes, with psql, pgAdminIII and our application via ODBC I experience
the same situation... the statements that I execute are like "select *
..." or similar.



Qingqing Zhou wrote:

> "Josep Maria Pinyol Fontseca" <[EMAIL PROTECTED]> wrote
>> When we select all the data on the local machine, we obtain results in
>> about 2-3 seconds. Over remote connections:
>>
>> Postgresql 7.1 using pgAdminII:
>> Network traffic generated with remote applications is about 77-80% on a
>> 10Mb connection.
>> About 6 seconds.
>>
>> Postgresql 8.1 using pgAdminIII:
>> Network traffic generated with remote applications is about 2-4% on a
>> 10Mb connection.
>> 12 seconds or more...
>
> Have you tried using psql? And how do you "select all data" - by "select
> count(*)" or "select *"?
>
> Regards,
> Qingqing




--
Josep Maria Pinyol i Fontseca
Responsable àrea de programació

ENDEPRO - Enginyeria de programari
Passeig Anselm Clavé, 19 Bx. 08263 Callús (Barcelona)
Tel. +34 936930018 - Mob. +34 600310755 - Fax. +34 938361994
[EMAIL PROTECTED] - http://www.endepro.com




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] 15,000 tables

2005-12-02 Thread Andrew Sullivan
On Thu, Dec 01, 2005 at 08:34:43PM +0100, Michael Riess wrote:
> Well, we tried switching to daily VACUUM ANALYZE and weekly VACUUM FULL, 
> but the database got considerably slower near the end of the week.

If you have your FSM configured correctly and you are vacuuming
tables often enough for your turnover, then in regular operation you
should _never_ need VACUUM FULL.  So it sounds like your first
problem is that.  With the 15000 tables you were talking about,
though, that doesn't surprise me.

Are you sure more back ends wouldn't be a better answer, if you're
really wedded to this design?  (I have a feeling that something along
the lines of what Tom Lane said would be a better answer -- I think
you need to be more clever, because I don't think this will ever work
well, on any system.)

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
This work was visionary and imaginative, and goes to show that visionary
and imaginative work need not end up well. 
--Dennis Ritchie

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Database restore speed

2005-12-02 Thread Stephen Frost
* Luke Lonergan ([EMAIL PROTECTED]) wrote:
> On 12/2/05 12:18 PM, "Stephen Frost" <[EMAIL PROTECTED]> wrote:
> > Just a thought, but couldn't psql be made to use the binary mode of
> > libpq and do at least some of the conversion on the client side?  Or
> > does binary mode not work with copy (that wouldn't surprise me, but
> > perhaps copy could be made to support it)?
> 
> Yes - I think this idea is implicit in what David suggested, and my response
> as well.  The problem is that the way the client does conversions can
> potentially differ from the way the backend does.  Some of the types in
> Postgres are machine intrinsic and the encoding conversions use on-machine
> libraries, each of which preclude the use of client conversion methods
> (without a lot of restructuring).  We'd tackled this problem in the past and
> concluded that the parse / convert stage really belongs in the backend.

I've used the binary mode stuff before. Sure, Postgres may have to
convert some things, but I have a hard time believing it'd be more
expensive to do a network_encoding -> host_encoding conversion (or
toasting, or whatever) than to do the ascii -> binary change.

Thanks,

Stephen




Re: [PERFORM] Database restore speed

2005-12-02 Thread Luke Lonergan
Stephen,

On 12/2/05 1:19 PM, "Stephen Frost" <[EMAIL PROTECTED]> wrote:

> I've used the binary mode stuff before, sure, Postgres may have to
> convert some things but I have a hard time believing it'd be more
> expensive to do a network_encoding -> host_encoding (or toasting, or
> whatever) than to do the ascii -> binary change.

From a performance standpoint no argument, although you're betting that you
can do parsing / conversion faster than the COPY core in the backend can (I
know *we* can :-).  It's a matter of safety and generality - in general you
can't be sure that client machines / OS'es will render the same conversions
that the backend does in all cases IMO.

- Luke



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Database restore speed

2005-12-02 Thread Luke Lonergan
Stephen,

On 12/2/05 1:19 PM, "Stephen Frost" <[EMAIL PROTECTED]> wrote:
> 
>> I've used the binary mode stuff before, sure, Postgres may have to
>> convert some things but I have a hard time believing it'd be more
>> expensive to do a network_encoding -> host_encoding (or toasting, or
>> whatever) than to do the ascii -> binary change.
> 
> From a performance standpoint no argument, although you're betting that you
> can do parsing / conversion faster than the COPY core in the backend can (I
> know *we* can :-).  It's a matter of safety and generality - in general you
> can't be sure that client machines / OS'es will render the same conversions
> that the backend does in all cases IMO.

One more thing - this is really about the lack of a cross-platform binary
input standard for Postgres IMO.  If there were such a thing, it *would* be
safe to do this.  The current Binary spec is not cross-platform AFAICS, it
embeds native representations of the DATUMs, and does not specify a
universal binary representation of same.

For instance - when representing a float, is it an IEEE 32-bit floating
point number in little endian byte ordering? Or is it IEEE 64-bit?  With
libpq, we could do something like an XDR implementation, but the machinery
isn't there AFAICS.

- Luke



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Database restore speed

2005-12-02 Thread Michael Stone

On Fri, Dec 02, 2005 at 01:24:31PM -0800, Luke Lonergan wrote:

From a performance standpoint no argument, although you're betting that you
can do parsing / conversion faster than the COPY core in the backend can 


Not necessarily; you may be betting that it's more *efficient* to do the
parsing on a bunch of lightly loaded clients than your server. Even if
you're using the same code this may be a big win. 


Mike Stone

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] 15,000 tables

2005-12-02 Thread Francisco Reyes

Michael Riess writes:

Sorry, I should have included that info in the initial post. You're 
right in that most of these tables have a similar structure. But they 
are independent and can be customized by the users.




How about creating 50 databases and giving each its own tablespace?
It's not only a question of whether PostgreSQL can be optimized, but also of
how well your filesystem handles a directory with a large number of files. By
splitting the directories you will likely help the OS, and you will perhaps be
better able to determine whether the OS or the DB is at fault for the slowness.
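
A hedged sketch of that arrangement (8.0 or later); the directory and names
are hypothetical.

CREATE TABLESPACE ts_group01 LOCATION '/var/lib/pgsql/ts_group01';
CREATE DATABASE app_group01 WITH TABLESPACE = ts_group01;
-- repeat per group, spreading the 15,000 tables over ~50 databases/directories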


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Database restore speed

2005-12-02 Thread Tom Lane
"Luke Lonergan" <[EMAIL PROTECTED]> writes:
> One more thing - this is really about the lack of a cross-platform binary
> input standard for Postgres IMO.  If there were such a thing, it *would* be
> safe to do this.  The current Binary spec is not cross-platform AFAICS, it
> embeds native representations of the DATUMs, and does not specify a
> universal binary representation of same.

Sure it does ... at least as long as you are willing to assume everybody
uses IEEE floats, and if they don't you have semantic problems
translating float datums anyhow.

What we lack is documentation, more than functionality.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] 15,000 tables - next step

2005-12-02 Thread Michael Riess

Hi,

thanks for your comments so far - I appreciate it. I'd like to narrow 
down my problem a bit:


As I said in the other thread, I estimate that only 20% of the 15,000 
tables are accessed regularly. So I don't think that vacuuming or the 
number of file handles is a problem. Have a look at this:


content2=# select relpages, relname from pg_class order by relpages desc 
limit 20;

 relpages |             relname
----------+---------------------------------
    11867 | pg_attribute
    10893 | pg_attribute_relid_attnam_index
     3719 | pg_class_relname_nsp_index
     3310 | wsobjects_types
     3103 | pg_class
     2933 | wsobjects_types_fields
     2903 | wsod_133143
     2719 | pg_attribute_relid_attnum_index
     2712 | wsod_109727
     2666 | pg_toast_98845
     2601 | pg_toast_9139566
     1876 | wsod_32168
     1837 | pg_toast_138780
     1678 | pg_toast_101427
     1409 | wsobjects_types_fields_idx
     1088 | wso_log
      943 | pg_depend
      797 | pg_depend_depender_index
      737 | wsod_3100
      716 | wp_hp_zen

I don't think that postgres was designed for a situation like this, 
where a system table that should be fairly small (pg_attribute) is this 
large.


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Database restore speed

2005-12-02 Thread Luke Lonergan
Michael,

On 12/2/05 1:46 PM, "Michael Stone" <[EMAIL PROTECTED]> wrote:

> Not necessarily; you may be betting that it's more *efficient* to do the
> parsing on a bunch of lightly loaded clients than your server. Even if
> you're using the same code this may be a big win.

If it were possible in light of the issues on client parse / convert, then
we should analyze whether it's a performance win.

In the restore case, where we've got a dedicated server with a dedicated
client machine, I don't see why there would be a speed benefit from running
the same parse / convert code on the client versus running it on the server.
Imagine a pipeline where there is a bottleneck; moving the bottleneck to a
different machine doesn't make it less of a bottleneck.

- Luke



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[PERFORM] Small table or partial index?

2005-12-02 Thread Francisco Reyes

I am in the process of designing a new system.
There will be a long list of words such as

-word table
word_id integer
word varchar
special boolean

Some "special" words are used to determine if some work is to be done and 
will be what we care the most for one type of operation. 

Will it be more effective to have a partial index 'where is special' or to
copy those special words to their own table?


The projected number of non special words is in the millions while the 
special ones will be in the thousands at most (under 10K for sure).


My personal view is that performance should be pretty much equal, but one of
my co-workers believes that the smaller table would likely get cached by
the OS, since it would be used so frequently, and would perform better.


In both instances we would be hitting an index of exactly the same size.

The searches will be 'where word =  and is special' 
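
The two alternatives compared above, sketched with the column names from the
post ('foo' is just a placeholder search value):

-- Option A: partial index on the big table, matching the query form
CREATE INDEX word_special_idx ON word (word) WHERE special;
SELECT word_id FROM word WHERE word = 'foo' AND special;

-- Option B: copy the special rows to their own small table
CREATE TABLE special_word AS SELECT word_id, word FROM word WHERE special;
CREATE UNIQUE INDEX special_word_word_idx ON special_word (word);
SELECT word_id FROM special_word WHERE word = 'foo';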



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Database restore speed

2005-12-02 Thread Mitch Skinner
On Fri, 2005-12-02 at 13:24 -0800, Luke Lonergan wrote:
> It's a matter of safety and generality - in general you
> can't be sure that client machines / OS'es will render the same conversions
> that the backend does in all cases IMO.

Can't binary values be safely sent cross-platform in DataRow
messages?  At least from my ignorant, cursory look at printtup.c,
there's a binary format code path.  float4send in utils/adt/float.c uses
pq_sendfloat4.  I obviously haven't followed the entire rabbit trail,
but it seems like it happens.

IOW, why isn't there a cross-platform issue when sending binary data
from the backend to the client in query results?  And if there isn't a
problem there, why can't binary data be sent from the client to the
backend?

Mitch

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Database restore speed

2005-12-02 Thread Luke Lonergan
And how do we compose the binary data on the client?  Do we trust that the 
client encoding conversion logic is identical to the backend's?  If there is a 
difference, what happens if the same file loaded from different client machines 
has different results?  Key conflicts when loading a restore from one machine 
and not from another?
- Luke
--
Sent from my BlackBerry Wireless Device


-Original Message-
From: Mitch Skinner <[EMAIL PROTECTED]>
To: Luke Lonergan <[EMAIL PROTECTED]>
CC: Stephen Frost <[EMAIL PROTECTED]>; David Lang <[EMAIL PROTECTED]>; Steve 
Oualline <[EMAIL PROTECTED]>; pgsql-performance@postgresql.org 

Sent: Fri Dec 02 22:26:06 2005
Subject: Re: [PERFORM] Database restore speed



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings