Re: [PERFORM] Improving PostgreSQL insert performance

2017-06-10 Thread Joe Conway
On 06/10/2017 07:32 PM, Alvaro Herrera wrote:
> Frits Jalvingh wrote:
> 
>> So, I am still very interested in getting normal inserts faster, because
>> that will gain speed for all work.. If Oracle can do it, and Postgres is
>> able to insert fast with copy- where lies the bottleneck with the insert
>> command? There seems to be quite a performance hit with the JDBC driver
>> itself (as the stored procedure is a lot faster), so I can look into that.
>> But even after that there is quite a gap..
> 
> Did you try inserting multiple tuples in one command?  Something like
> INSERT INTO .. VALUES ('col1', 'col2'), ('col1', 'col2'), ('col1', 'col2')
> It's supposed to be faster than single-row inserts, though I don't
> know by how much.

When I did the testing of the patch originally I saw significant
improvements, e.g. 8x in early versions. The thread is here:
https://www.postgresql.org/message-id/flat/44C4451A.4010906%40joeconway.com#44c4451a.4010...@joeconway.com
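For anyone who hasn't tried it, a minimal illustration of the difference (table
and values invented for the example):

  -- one statement per row: a parse/plan and a round trip for each row
  insert into testdata (col1, col2) values ('a', 1);
  insert into testdata (col1, col2) values ('b', 2);

  -- multi-row form: the same rows in a single statement
  insert into testdata (col1, col2) values
    ('a', 1),
    ('b', 2),
    ('c', 3);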

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [PERFORM] Do work_mem and shared buffers have 1g or 2g limit on 64 bit linux?

2015-06-13 Thread Joe Conway

On 06/13/2015 10:43 AM, Joshua D. Drake wrote:
 
 On 06/13/2015 10:27 AM, Kaijiang Chen wrote:
 Hi, I am using postgresql 9.2.10 on centos 6.2, 64 bit version.
 The server has 512 GB mem.
 
 The jobs are mainly OLAP like. So I need larger work_mem and
 shared buffers. From the source code, there is a constant
 MaxAllocSize==1GB. So, I wonder whether work_mem and shared
 buffers can exceed 2GB in the 64 bit Linux server?

 Work_mem IIRC can go past 2GB but has never been proven to be
 effective after that.
 
 It does depend on the version you are running.

Starting with 9.4 work_mem and maintenance_work_mem can be usefully
set to > 2 GB.

I've done testing with index creation, for example, and you can set
maintenance_work_mem high enough (obviously depending on how much RAM
you have and how big the sort memory footprint is) to get the entire
sort to happen in memory without spilling to disk. In some of those
cases I saw time required to create indexes drop by a factor of 3 or
more...YMMV.
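As a rough sketch of that kind of test (table and index names invented here;
the larger setting only helps if the whole sort really fits in RAM):

  -- 9.4 or later for values above 2 GB
  set maintenance_work_mem = '8GB';
  create index bigtable_somecol_idx on bigtable (some_col);
  reset maintenance_work_mem;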

I have not tested with large work_mem to encourage hash aggregate
plans, but I suspect there is a lot to be gained there as well.

HTH,

Joe


-- 
Joe Conway




Re: [PERFORM] Hints (was Poor performance using CTE)

2012-11-21 Thread Joe Conway
On 11/21/2012 08:05 AM, Heikki Linnakangas wrote:
 Rather than telling the planner what to do or not to do, I'd much rather
 have hints that give the planner more information about the tables and
 quals involved in the query. A typical source of bad plans is when the
 planner gets its cost estimates wrong. So rather than telling the
 planner to use a nested loop join for a INNER JOIN b ON a.id = b.id,
 the user could tell the planner that there are only 10 rows that match
 the a.id = b.id qual. That gives the planner the information it needs
 to choose the right plan on its own. That kind of hints would be much
 less implementation specific and much more likely to still be useful, or
 at least not outright counter-productive, in a future version with a
 smarter planner.
 
 You could also attach that kind of hints to tables and columns, which
 would be more portable and nicer than decorating all queries.

I like this idea, but also think that if we have a syntax to allow
hints, it would be nice to have a simple way to ignore all hints (yes, I
suppose I'm suggesting yet another GUC). That way after sprinkling your
SQL with hints, you could easily periodically (e.g. after a Postgres
upgrade) test what would happen if the hints were removed.

Joe
-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support






Re: [PERFORM] Hints (was Poor performance using CTE)

2012-11-21 Thread Joe Conway
On 11/21/2012 09:28 AM, Craig James wrote:
 
 
  On Wed, Nov 21, 2012 at 9:25 AM, Joe Conway m...@joeconway.com wrote:
 
 On 11/21/2012 08:05 AM, Heikki Linnakangas wrote:
  Rather than telling the planner what to do or not to do, I'd much
 rather
  have hints that give the planner more information about the tables and
  quals involved in the query. A typical source of bad plans is when the
  planner gets its cost estimates wrong. So rather than telling the
  planner to use a nested loop join for a INNER JOIN b ON a.id = b.id,
  the user could tell the planner that there are only 10 rows that match
  the a.id = b.id qual. That gives the planner the information it needs
  to choose the right plan on its own. That kind of hints would be much
  less implementation specific and much more likely to still be
 useful, or
  at least not outright counter-productive, in a future version with a
  smarter planner.
 
  You could also attach that kind of hints to tables and columns, which
  would be more portable and nicer than decorating all queries.
 
 I like this idea, but also think that if we have a syntax to allow
 hints, it would be nice to have a simple way to ignore all hints (yes, I
 suppose I'm suggesting yet another GUC). That way after sprinkling your
 SQL with hints, you could easily periodically (e.g. after a Postgres
 upgrade) test what would happen if the hints were removed.
 
 
 Or a three-way choice: Allow, ignore, or generate an error.  That would
 allow developers to identify where hints are being used.

+1

Joe


-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support






Re: [PERFORM] Queries with conditions using bitand operator

2010-07-13 Thread Joe Conway
On 07/13/2010 04:48 AM, Elias Ghanem wrote:
 Hi,
 I have a table ARTICLE containing a string field STATUS that
 represents a number in binary format (for ex: 10011101).
 My application issues queries with where conditions that use the BITAND
 operator on this field (for ex: select * from article where status & 4 =
 4).
 Thus I'm facing performance problems with these select queries: the
 queries are too slow.
 Since I'm using the BITAND operator in my conditions, creating an index
 on the status field is useless,
 and since the second operand varies (status & 4 = 4; status & 8 = 8;
 status & 16 = 16...) a functional index is also useless (because a
 functional index requires a function that accepts only table
 columns as input parameters: constants are not accepted).
 So is there a way to enhance the performance of these queries?

You haven't given a lot of info to help us help you, but would something
along these lines be useful to you?

drop table if exists testbit;
create table testbit(
 id serial primary key,
 article text,
 status int
);

insert into testbit (article, status) select 'article ' ||
generate_series::text, generate_series % 256 from
generate_series(1,100);

create index idx1 on testbit(article) where status & 1 = 1;
create index idx2 on testbit(article) where status & 2 = 2;
create index idx4 on testbit(article) where status & 4 = 4;
create index idx8 on testbit(article) where status & 8 = 8;
create index idx16 on testbit(article) where status & 16 = 16;
create index idx32 on testbit(article) where status & 512 = 512;

update testbit set status = status + 512 where id in (42, 4242, 424242);
explain analyze select * from testbit where status & 512 = 512;
  QUERY PLAN
--
 Index Scan using idx32 on testbit  (cost=0.00..4712.62 rows=5000
width=22) (actual time=0.080..0.085 rows=3 loops=1)
 Total runtime: 0.170 ms


HTH,

Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & Support





Re: [PERFORM] Highly Efficient Custom Sorting

2010-07-06 Thread Joe Conway
On 07/06/2010 12:42 PM, Eliot Gable wrote:
 Thanks for suggesting array_unnest(). I think that will actually prove
 more useful to me than the other example I'm using for extracting my
 data from an array. I was actually planning on computing the order on
 the first call and storing it in a linked list which gets returned one
 item at a time until all rows have been returned. Also, I found a code
 example using Google that showed someone storing data across function
 calls using that pointer. I used their example to produce this:
 
 <snip>
 if(SRF_IS_FIRSTCALL()) {
 funcctx = SRF_FIRSTCALL_INIT();
 
 /* This is where we stick our sorted data for returning later */
 funcctx->user_fctx =
 MemoryContextAlloc(funcctx->multi_call_memory_ctx, sizeof(sort_data));
 oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
 data = (sort_data*) funcctx->user_fctx;
 </snip>
 
 I have a structure set up that is typedef'd to sort_data which stores
 pointers to various things that I need to survive across the calls.
 Since this seems to be what you are suggesting, I assume this is the
 correct approach.

This approach works, but you could also use the SFRM_Materialize mode
and calculate the entire result set in one go. That tends to be simpler.
See, for example, crosstab_hash() in contrib/tablefunc.

FWIW, there are also some good examples of array handling in PL/R, e.g.
pg_array_get_r() in pg_conversion.c

HTH,

Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & Support





Re: [PERFORM] Need to increase performance of a query

2010-06-10 Thread Joe Conway
On 06/10/2010 12:56 PM, Anne Rosset wrote:
 Craig James wrote:
   create index item_rank_null_idx on item_rank(pf_id)
where item_rank.pf_id is null;

 Craig

 Hi Craig,
 I tried again after adding your suggested index but I didn't see any
 improvements: (seems that the index is not used)

 Filter: ((rank IS NOT NULL) AND (pf_id IS NULL) AND
 ((project_id)::text = 'proj2783'::text))
 Total runtime: 11.988 ms
 (6 rows)
 
 Time: 13.654 ms

try:

create index item_rank_null_idx on item_rank(pf_id)
where rank IS NOT NULL AND pf_id IS NULL;

Joe





Re: [PERFORM] Need to increase performance of a query

2010-06-10 Thread Joe Conway
On 06/10/2010 01:10 PM, Joe Conway wrote:
 try:
 
 create index item_rank_null_idx on item_rank(pf_id)
 where rank IS NOT NULL AND pf_id IS NULL;

oops -- that probably should be:

create index item_rank_null_idx on item_rank(project_id)
where rank IS NOT NULL AND pf_id IS NULL;

Joe





Re: [PERFORM] Need to increase performance of a query

2010-06-10 Thread Joe Conway
On 06/10/2010 01:21 PM, Anne Rosset wrote:
   
 I tried that and it didn't make any difference. Same query plan.

A little experimentation suggests this might work:

create index item_rank_project on item_rank(project_id, rank) where
pf_id IS NULL;
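For that index to be considered, the query has to repeat the index predicate;
based on the filter shown earlier, something along these lines (the exact
query is assumed):

  select *
    from item_rank
   where project_id = 'proj2783'
     and rank is not null
     and pf_id is null;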

Joe





Re: [PERFORM] crosstab speed

2008-11-13 Thread Joe Conway

Jeremiah Elliott wrote:
ok, I have an application that I am trying to speed up. It's a reporting 
application that makes heavy use of the crosstab function.


snip


here is an example query




The crosstab function is taking between 5 and 15 seconds to return.


Please run the two embedded queries independently, i.e.

select
  ARRAY[site::text, product_line_description::text, report_sls::text, fy_period::text]
    as COL_HEADER,
  fy_year,
  sum(invoice_value)
from order_data_tbl
where fy_year is not null
group by site::text, product_line_description::text, report_sls::text, fy_period::text, fy_year
order by site::text, product_line_description::text, report_sls::text, fy_period::text;


-- and --

select fy_year from order_data_tbl
 where fy_year is not null
 group by fy_year
 order by fy_year;

How long does each take? crosstab cannot run any faster than the sum of 
these two queries run on their own.


If the second one doesn't change often, can you pre-calculate it, 
perhaps once a day?
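One possible way to pre-calculate it, sketched with an invented helper table
that a nightly job keeps current:

  create table fy_year_list as
    select fy_year from order_data_tbl
     where fy_year is not null
     group by fy_year
     order by fy_year;

  -- nightly refresh:
  --   truncate fy_year_list;
  --   insert into fy_year_list
  --     select fy_year from order_data_tbl
  --      where fy_year is not null group by fy_year order by fy_year;

Then pass 'select fy_year from fy_year_list order by fy_year' to crosstab() as
the category query.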


Joe



Re: [PERFORM] Performance penalty for remote access of postgresql

2006-07-19 Thread Joe Conway

Stephen Frost wrote:

* Guoping Zhang ([EMAIL PROTECTED]) wrote:


Obviously, if there is no better solution, the TCP round trip penalty will
stop us doing so as we do have performance requirement.


Actually, can't you stick multiple inserts into a given 'statement'?
ie: insert into abc (123); insert into abc (234);

I'm not 100% sure if that solves the round-trip issue, but it might..
Also, it looks like we might have multi-value insert support in 8.2 (I
truly hope so anyway), so you could do something like this:
insert into abc (123),(234);


Yeah, see my post from last night on PATCHES. Something like "insert 
into abc (123); insert into abc (234); ..." actually seems to work 
pretty well as long as you don't drive the machine into swapping. If 
you're doing a very large number of INSERTs, break it up into bite-sized 
chunks and you should be fine.
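To make that concrete, one bite-sized chunk might be sent as a single
statement string, something like this (reusing the toy table from the quote
above; real code would build the string programmatically and repeat per chunk):

  insert into abc values (123); insert into abc values (234);
  insert into abc values (345); insert into abc values (456);
  -- ... up to a few thousand rows, then start the next chunk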


Joe



Re: [PERFORM] Big differences in plans between 8.0 and 8.1

2006-07-16 Thread Joe Conway

Gabriele Turchi wrote:

Il giorno sab, 15/07/2006 alle 13.04 -0700, Joe Conway ha scritto:
Why not just periodically (once an hour?) run ANALYZE registrazioni; 
during the day. This will only update the statistics, and should be very 
low impact.


This is my solution too... but: is it enough? Or is there a better
way to do this? If the performance in the best case is 50 times faster
than the worst case, how far can performance fall during an hour (50/100
records inserted in registrazioni) before the next ANALYZE is run?
Otherwise, could running ANALYZE more frequently badly affect the
overall performance?


One thing I noticed is that in both plans there is a seq scan on 
registrazioni. Given that performance degrades so quickly as records are 
inserted into registrazioni, I'm wondering if you're missing an index. 
What indexes do you have on registrazioni?


Joe



Re: [PERFORM] Big differences in plans between 8.0 and 8.1

2006-07-15 Thread Joe Conway

Gabriele Turchi wrote:

Running an ANALYZE really changes the plan; now it is fast as before
(8.0).

On the production system a VACUUM FULL ANALYZE is run every morning
after a clean-up, when the registrazioni table is empty. During the
day this table fills up (about 500 records a day), and apparently
performance is free-falling very quickly. This behaviour has not
changed between the old and the new installation.

Can you suggest an easy way to collect these statistics and keep them
up to date in a very low-impact way?



Why not just periodically (once an hour?) run ANALYZE registrazioni; 
during the day. This will only update the statistics, and should be very 
low impact.


HTH,

Joe



Re: [PERFORM] Dynamically loaded C function performance

2006-05-11 Thread Joe Conway

Jim C. Nasby wrote:

On Fri, May 05, 2006 at 03:47:53PM -0700, Adam Palmblad wrote:


Hi,
We've got a C function that we use here and we find that for every
connection, the first run of the function is much slower than any
subsequent runs.  ( 50ms compared to 8ms)

Besides using connection pooling, are there any options to improve
performance?


In my experience, connection startup takes a heck of a lot longer than
50ms, so why are you worrying about 50ms for the first run of a
function?

BTW, sorry, but I don't know a way to speed this up, either.


I think Tom nailed the solution already in a nearby reply -- see 
preload_libraries on this page:


http://www.postgresql.org/docs/8.1/interactive/runtime-config-resource.html
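If editing postgresql.conf is not convenient, a per-session workaround is to
load the library explicitly right after connecting, so the one-time cost is
paid before the first real call (library name invented):

  LOAD 'mylib';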

Joe



Re: [PERFORM] [sfpug] DATA directory on network attached storage

2005-04-11 Thread Joe Conway
Aditya wrote:
We have not, AFAICT, had any problems with the traffic over NFS as far as
reliability -- I'm sure there is a performance penalty, but the reliability
and scalability gains more than offset that.
My experience agrees with yours. However we did find one gotcha -- see 
the thread starting here for details:
http://archives.postgresql.org/pgsql-hackers/2004-12/msg00479.php

In a nutshell, be careful when using an nfs mounted data directory 
combined with an init script that creates a new data dir when it doesn't 
find one.

FWIW, if I were to do this anew, I would probably opt for iSCSI over GigE with
a NetApp.
Any particular reason? Our NetApp technical rep advised nfs over iSCSI, 
IIRC because of performance.

Joe


Re: [PERFORM] [sfpug] DATA directory on network attached storage

2005-04-11 Thread Joe Conway
Aditya wrote:
On Mon, Apr 11, 2005 at 10:59:51AM -0700, Joe Conway wrote:
Any particular reason? Our NetApp technical rep advised nfs over iSCSI, 
IIRC because of performance.
I would mount the Netapp volume(s) as a block level device on my server 
using
iSCSI (vs. a file-based device like NFS) so that filesystem parameters could
be more finely tuned and one could really make use of jumbo frames over GigE.
Actually, we're using jumbo frames over GigE with nfs too.
I'm not sure I understand why NFS would perform better than iSCSI -- in any
case, some large Oracle dbs at my current job are moving to iSCSI on Netapp
and in that environment both Oracle and Netapp advise iSCSI (probably because
Oracle uses the block-level device directly), so I suspect the difference in
performance is minimal.
We also have Oracle DBs via nfs mounted Netapp, again per the local 
guru's advice. It might be one of those things that is still being 
debated even within Netapp's ranks (or maybe our info is dated - worth a
check).

Thanks,
Joe


Re: [PERFORM] PostgreSQL vs. Oracle vs. Microsoft

2005-01-13 Thread Joe Conway
Alex Turner wrote:
I'm not advocating that people switch to Oracle at all, It's still
much more expensive than Postgresql, and for most small and medium
applications Postgresql is much easier to manage and maintain.  I
would just like to make sure people get their facts straight.  I
worked for a company that selected MS SQL Server because it was
'cheaper' than Oracle, when in fact with the correct Oracle pricing,
Oracle was cheaper, and had superior features.  I would have preferred
that they use Postgresql, which for the project in question would have
been more appropriate and cost much less in hardware and software
requirements, but they had to have 'Industry Standard'.  Oracle ended
up costing $10k with licenses at $149 ea for 25 users, and the
support contract wasn't that much of a bear - I can't remember exactly
how much, I think it was around $1800/yr.
My facts were straight, and they come from firsthand experience. The 
point is, it is easy to get trapped into thinking to yourself, great, I 
can get a dual CPU Oracle server for ~$10K, that's not too bad. But 
then later you figure out you really need table partitioning or RAC, and 
suddenly you have to jump directly to multiple 6 figures. The entry 
level Oracle pricing is mainly a marketing gimmick -- it is intended to 
get you hooked.

Also note that the per named user license scheme is subject to per CPU 
minimums that guarantee you'll never spend less than half the per CPU 
price. Oracle's licensing is so complex that there are businesses out 
there that subsist solely on helping companies figure it out to save 
money, and they take a cut of the savings. Oracle's own account reps had 
a hard time answering this question -- does a hyperthreaded Intel CPU 
count as 1 or 2 CPUs from a licensing standpoint? We were eventually 
told 1, but that the decision was subject to change in the future.

Joe


Re: [PERFORM] PostgreSQL vs. Oracle vs. Microsoft

2005-01-13 Thread Joe Conway
Alex Turner wrote:
I appreciate your information, but it's not valid.  Most people don't
need RAC or table partitioning.
From a small company perspective, maybe, but not in the least invalid 
for larger companies.

Many of the features in Oracle EE are just not available in Postgresql at all, 
and many aren't available in
any version of SQL Server (table partitioning, bitmap indexes and
others).
I never claimed otherwise. I said the low end product gets you hooked. 
Once you're hooked, you'll start to wish for all the wiz-bang features 
-- after all, that's why you picked Oracle in the first place.

Just because Oracle reps are a little clueless
sometimes doesn't mean that the product pricing sucks.
The minimum user requirement for standard one is 5 users.  5*149=$745,
much less than half the price of a dual or single CPU config.
And what happens once you need a quad server?
I'm sorry that you had a bad experience with Oracle, but Oracle is a
fine product, that is available for not a lot of $$ if you are willing
to use a bit of elbow grease to learn how it works and don't need
enterprise features, which many other database product simply don't
have, or work very poorly.
I never said I had a bad experience with Oracle. I pointed out the 
gotchas. We have several large Oracle boxes running, several MSSQL, and 
several Postgres -- they all have their strengths and weaknesses.

Nuff said -- this thread is way off topic now...
Joe


Re: [PERFORM] PostgreSQL vs. Oracle vs. Microsoft

2005-01-12 Thread Joe Conway
Greg Sabino Mullane wrote:
Don't forget your support contract cost, as well as licenses for each
of your servers: development, testing, QA, etc.
 
Is it really as cheap as 5K? I've heard that for any fairly modern
system, it's much more, but that may be wrong.
 
Sort of -- see:
http://oraclestore.oracle.com/OA_HTML/ibeCCtpSctDspRte.jsp?section=15105
  It is available on single server systems supporting up to a maximum
   of 2 CPUs
Also note that most industrial strength features (like table 
partitioning, RAC, OLAP, Enterprise Manager plugins, etc, etc) are high 
priced options (mostly $10K to $20K per CPU) and they can only be used 
with the Enterprise edition (which is $40K/CPU *not* $2.5K/CPU).
http://oraclestore.oracle.com/OA_HTML/ibeCCtpSctDspRte.jsp?section=10103

And you are correct, they expect to be paid for each dev, test, and QA 
machine too.

The $5K edition is just there to get you hooked ;-) By the time you add 
up what you really want/need, figure you'll spend a couple of orders of 
magnitude higher, and then ~20% per year for ongoing 
maintenance/upgrades/support.

Joe


Re: [PERFORM] Trying to create multi db query in one large querie

2004-12-13 Thread Joe Conway
Spiegelberg, Greg wrote:
My experience with dblink() is that each dblink() is executed serially
Correct.
If you really want to do multiple queries simultaneously, you would need 
to write a function very similar to dblink_record, but using asynchronous 
libpq calls to both remote hosts. See:
  http://www.postgresql.org/docs/current/static/libpq-async.html
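For what it's worth, later releases of contrib/dblink grew asynchronous helper
functions that do essentially this; with those, a rough sketch looks like the
following (connection strings and queries invented):

  select dblink_connect('c1', 'host=host1 dbname=db1');
  select dblink_connect('c2', 'host=host2 dbname=db2');
  -- both remote queries now run concurrently
  select dblink_send_query('c1', 'select count(*) from t1');
  select dblink_send_query('c2', 'select count(*) from t2');
  -- collect the results once they are ready
  select * from dblink_get_result('c1') as r1(cnt bigint);
  select * from dblink_get_result('c2') as r2(cnt bigint);
  select dblink_disconnect('c1');
  select dblink_disconnect('c2');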

HTH,
Joe


Re: [PERFORM] Alternatives to Dell?

2004-12-02 Thread Joe Conway
Josh Berkus wrote:
Thing is, some companies are required to use 1st-tier or at least 2nd-tier 
vendors for hardware; they won't home-build.   For those people, what vendors 
do others on this list recommend?   What have been your good/bad experiences?
I've had very good experiences with IBM hardware, and found their sales 
and support to be responsive.

Joe


Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres --

2004-09-16 Thread Joe Conway
Iain wrote:
Joe's example wasn't excluding partitions, as he didn't use a predicated UNION
ALL view to select from. His queries use an indexed column that allows the
various partitions to be probed at low cost, and he was satisfied with that.
Right.
My point in my previous post was that you could still do all of that if
you wanted to, by building the predicated view with UNION ALL of each of the
child tables.
Right. It doesn't look that much different:
create or replace view foo_vw as
select * from foo_2004_01 where f2 >= '2004-jan-01' and f2 <= '2004-jan-31'
union all
select * from foo_2004_02 where f2 >= '2004-feb-01' and f2 <= '2004-feb-29'
union all
select * from foo_2004_03 where f2 >= '2004-mar-01' and f2 <= '2004-mar-31'
;
 -- needed just for illustration since these are toy tables
set enable_seqscan to false;
explain analyze select * from foo_vw where f2 = '2004-feb-15';
 QUERY PLAN
--
 Subquery Scan foo_vw  (cost=0.00..14.54 rows=3 width=16) (actual 
time=0.022..0.027 rows=1 loops=1)
   ->  Append  (cost=0.00..14.51 rows=3 width=16) (actual 
time=0.019..0.022 rows=1 loops=1)
 ->  Subquery Scan *SELECT* 1  (cost=0.00..4.84 rows=1 
width=16) (actual time=0.004..0.004 rows=0 loops=1)
   ->  Index Scan using foo_2004_01_idx2 on foo_2004_01 
(cost=0.00..4.83 rows=1 width=16) (actual time=0.003..0.003 rows=0 loops=1)
 Index Cond: ((f2 >= '2004-01-01'::date) AND (f2 <= 
'2004-01-31'::date) AND (f2 = '2004-02-15'::date))
 ->  Subquery Scan *SELECT* 2  (cost=0.00..4.84 rows=1 
width=16) (actual time=0.013..0.015 rows=1 loops=1)
   ->  Index Scan using foo_2004_02_idx2 on foo_2004_02 
(cost=0.00..4.83 rows=1 width=16) (actual time=0.009..0.010 rows=1 loops=1)
 Index Cond: ((f2 >= '2004-02-01'::date) AND (f2 <= 
'2004-02-29'::date) AND (f2 = '2004-02-15'::date))
 ->  Subquery Scan *SELECT* 3  (cost=0.00..4.84 rows=1 
width=16) (actual time=0.001..0.001 rows=0 loops=1)
   ->  Index Scan using foo_2004_03_idx2 on foo_2004_03 
(cost=0.00..4.83 rows=1 width=16) (actual time=0.001..0.001 rows=0 loops=1)
 Index Cond: ((f2 >= '2004-03-01'::date) AND (f2 <= 
'2004-03-31'::date) AND (f2 = '2004-02-15'::date))
 Total runtime: 0.188 ms
(12 rows)

regression=# explain analyze select * from foo where f2 = '2004-feb-15';
 QUERY PLAN
--
 Result  (cost=1.00..10073.70 rows=20 width=16) (actual 
time=0.059..0.091 rows=1 loops=1)
   ->  Append  (cost=1.00..10073.70 rows=20 width=16) 
(actual time=0.055..0.086 rows=1 loops=1)
 ->  Seq Scan on foo  (cost=1.00..10022.50 rows=5 
width=16) (actual time=0.001..0.001 rows=0 loops=1)
   Filter: (f2 = '2004-02-15'::date)
 ->  Index Scan using foo_2004_01_idx2 on foo_2004_01 foo 
(cost=0.00..17.07 rows=5 width=16) (actual time=0.045..0.045 rows=0 loops=1)
   Index Cond: (f2 = '2004-02-15'::date)
 ->  Index Scan using foo_2004_02_idx2 on foo_2004_02 foo 
(cost=0.00..17.07 rows=5 width=16) (actual time=0.008..0.009 rows=1 loops=1)
   Index Cond: (f2 = '2004-02-15'::date)
 ->  Index Scan using foo_2004_03_idx2 on foo_2004_03 foo 
(cost=0.00..17.07 rows=5 width=16) (actual time=0.029..0.029 rows=0 loops=1)
   Index Cond: (f2 = '2004-02-15'::date)
 Total runtime: 0.191 ms
(11 rows)

The main difference being that the view needs to be recreated every time 
a table is added or dropped, whereas with the inherited tables method 
that isn't needed.

Joe


Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres --

2004-09-15 Thread Joe Conway
[EMAIL PROTECTED] wrote:
Joe Conway [EMAIL PROTECTED] wrote on 15.09.2004, 06:30:24:
We're not completely done with our data conversion (from a commercial 
RDBMSi), but so far the results have been excellent. Similar to what 
others have said in this thread, the conversion involved restructuring 
the data to better suit Postgres, and the application (data 
analysis/mining vs. the source system which is operational). As a result 
we've compressed a > 1TB database down to ~0.4TB, and seen at least one 
typical query reduced from ~9 minutes down to ~40 seconds.
Sounds interesting.
The performance gain comes from partition elimination of the inherited
tables under the root?
I take it the compression comes from use of arrays, avoiding the need
for additional rows and key overhead?
Sorry, in trying to be concise I was not very clear. I'm using the term 
compression very generally here. I'll try to give a bit more background,

The original data source is a database schema designed for use by an 
operational application that my company sells to provide enhanced 
management of equipment that we also sell. The application needs to be 
very flexible in exactly what data it stores in order to be useful 
across a wide variety of equipment models and versions. In order to do 
that there is a very large central transaction table that stores 
name-value pairs in varchar columns. The name-value pairs come from 
parsed output of the equipment, and as such there is a fair amount of 
redundancy and unneeded data that ends up getting stored. At each 
installation in the field this table can get very large (> billion 
rows). Additionally the application prematerializes a variety of 
summaries for use by the operators using the GUI.

We collect the data exported from each of the systems in the field and 
accumulate it in a single central database for data mining and analysis. 
This is the database that is actually being converted. By compression I 
really mean that unneeded and redundant data is being stripped out, and 
data known to be of a certain datatype is stored in that type instead of 
varchar (e.g. values known to be int are stored as int). Also the 
summaries are not being converted (although we do some post processing 
to create new materialized summaries).

My points in telling this were:
 - the use of inherited tables to partition this huge number of rows and
   yet allow simple query access to it seems to work well, at least in
   early validation tests
 - had we simply taken the original database and slammed it into
   Postgres with no further thought, we would not have seen the big
   improvements, and thus the project might have been seen as a failure
   (even though it saves substantial $)
Hope that's a bit more clear. I'm hoping to write up a more detailed 
case study once we've cut the Postgres system into production and the 
dust settles a bit.

Joe


Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres --

2004-09-15 Thread Joe Conway
Josh Berkus wrote:
 - the use of inherited tables to partition this huge number of rows and
   yet allow simple query access to it seems to work well, at least in
   early validation tests
 - had we simply taken the original database and slammed it into
   Postgres with no further thought, we would not have seen the big
   improvements, and thus the project might have been seen as a failure
   (even though it saves substantial $)

Any further thoughts on developing this into true table partitioning?
Just that I'd love to see it happen ;-)
Maybe someday I'll be able to find the time to work on it myself, but 
for the moment I'm satisfied with the workarounds we've made.

Joe


Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres --

2004-09-15 Thread Joe Conway
Simon Riggs wrote:
Joe,
Your application is very interesting. I've just read your OSCON paper. I'd
like to talk more about that. Very similar to Kalido.
...but back to partitioning momentarily: Does the performance gain come from
partition elimination of the inherited tables under the root?
I think the major part of the peformance gain comes from the fact that 
the source database has different needs in terms of partitioning 
criteria because of its different purpose. The data is basically 
partitioned by customer installation instead of by date. Our converted 
scheme partitions by date, which is in line with the analytical queries 
run at the corporate office. Again, this is an argument in favor of not 
simply porting what you're handed.

We might get similar query performance with a single large table and 
multiple partial indexes (e.g. one per month), but there would be one 
tradeoff and one disadvantage to that:
1) The indexes would need to be generated periodically -- this is a 
tradeoff since we currently need to create inherited tables at the same 
periodicity
2) It would be much more difficult to roll off a month's worth of data 
when needed. The general idea is that each month we create a new monthly 
table, then archive and drop the oldest monthly table. If all the data 
were in one big table we would have to delete many millions of rows from 
a (possibly) multibillion row table, and then vacuum that table -- no 
thanks ;-)
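For illustration, the monthly roll-off amounts to something like this (table
names follow the toy example elsewhere in this thread; the archive step itself
happens outside the database):

  -- add the new month's partition
  create table foo_2004_05 () inherits (foo);
  create index foo_2004_05_idx1 on foo_2004_05 (f2);

  -- archive the oldest month (e.g. COPY it out to a file), then drop it;
  -- no multi-million row DELETE and no big vacuum afterwards
  drop table foo_2004_01;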

Joe


Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres --

2004-09-15 Thread Joe Conway
Iain wrote:
That's exactly what we're doing, but using inherited tables instead of a
union view. With inheritance, there is no need to rebuild the view each
time a table is added or removed. Basically, in our application, tables
are partitioned by either month or week, depending on the type of data
involved, and queries are normally date qualified.
That sounds interesting. I have to admit that I haven't touched inheritance in
pg at all yet so I find it hard to imagine how this would work. If you have
a chance, would you mind elaborating on it just a little?
OK, see below:
=
create table foo(f1 int, f2 date, f3 float8);
create table foo_2004_01() inherits (foo);
create table foo_2004_02() inherits (foo);
create table foo_2004_03() inherits (foo);
create index foo_2004_01_idx1 on foo_2004_01(f2);
create index foo_2004_02_idx1 on foo_2004_02(f2);
create index foo_2004_03_idx1 on foo_2004_03(f2);
insert into foo_2004_02 values(1,'2004-feb-15',3.14);
 -- needed just for illustration since these are toy tables
set enable_seqscan to false;
explain analyze select * from foo where f2 = '2004-feb-15';
QUERY PLAN

 Result  (cost=1.00..10061.32 rows=16 width=16) (actual 
time=0.224..0.310 rows=1 loops=1)
   ->  Append  (cost=1.00..10061.32 rows=16 width=16) 
(actual time=0.214..0.294 rows=1 loops=1)
 ->  Seq Scan on foo  (cost=1.00..10022.50 rows=5 
width=16) (actual time=0.004..0.004 rows=0 loops=1)
   Filter: (f2 = '2004-02-15'::date)
 ->  Index Scan using foo_2004_01_idx1 on foo_2004_01 foo 
(cost=0.00..17.07 rows=5 width=16) (actual time=0.101..0.101 rows=0 loops=1)
   Index Cond: (f2 = '2004-02-15'::date)
 ->  Index Scan using foo_2004_02_idx1 on foo_2004_02 foo 
(cost=0.00..4.68 rows=1 width=16) (actual time=0.095..0.101 rows=1 loops=1)
   Index Cond: (f2 = '2004-02-15'::date)
 ->  Index Scan using foo_2004_03_idx1 on foo_2004_03 foo 
(cost=0.00..17.07 rows=5 width=16) (actual time=0.066..0.066 rows=0 loops=1)
   Index Cond: (f2 = '2004-02-15'::date)
 Total runtime: 0.582 ms
(11 rows)

create table foo_2004_04() inherits (foo);
create index foo_2004_04_idx1 on foo_2004_04(f2);
explain analyze select * from foo where f2 = '2004-feb-15';
QUERY PLAN

 Result  (cost=1.00..10078.38 rows=21 width=16) (actual 
time=0.052..0.176 rows=1 loops=1)
   ->  Append  (cost=1.00..10078.38 rows=21 width=16) 
(actual time=0.041..0.159 rows=1 loops=1)
 ->  Seq Scan on foo  (cost=1.00..10022.50 rows=5 
width=16) (actual time=0.004..0.004 rows=0 loops=1)
   Filter: (f2 = '2004-02-15'::date)
 ->  Index Scan using foo_2004_01_idx1 on foo_2004_01 foo 
(cost=0.00..17.07 rows=5 width=16) (actual time=0.012..0.012 rows=0 loops=1)
   Index Cond: (f2 = '2004-02-15'::date)
 ->  Index Scan using foo_2004_02_idx1 on foo_2004_02 foo 
(cost=0.00..4.68 rows=1 width=16) (actual time=0.016..0.022 rows=1 loops=1)
   Index Cond: (f2 = '2004-02-15'::date)
 ->  Index Scan using foo_2004_03_idx1 on foo_2004_03 foo 
(cost=0.00..17.07 rows=5 width=16) (actual time=0.008..0.008 rows=0 loops=1)
   Index Cond: (f2 = '2004-02-15'::date)
 ->  Index Scan using foo_2004_04_idx1 on foo_2004_04 foo 
(cost=0.00..17.07 rows=5 width=16) (actual time=0.095..0.095 rows=0 loops=1)
   Index Cond: (f2 = '2004-02-15'::date)
 Total runtime: 0.443 ms
(13 rows)

For loading data, we COPY into foo, and have a trigger that redirects 
the rows to the appropriate partition.
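A minimal sketch of such a redirect trigger, using the toy tables above (dollar
quoting used for readability; a real version would cover every partition and
probably raise an error for unmatched dates):

create or replace function foo_insert_redirect() returns trigger as $$
begin
  if new.f2 >= date '2004-02-01' and new.f2 < date '2004-03-01' then
    insert into foo_2004_02 (f1, f2, f3) values (new.f1, new.f2, new.f3);
  elsif new.f2 >= date '2004-03-01' and new.f2 < date '2004-04-01' then
    insert into foo_2004_03 (f1, f2, f3) values (new.f1, new.f2, new.f3);
  -- ... one branch per partition ...
  end if;
  return null;  -- suppress the insert into the parent table
end;
$$ language plpgsql;

create trigger foo_redirect before insert on foo
  for each row execute procedure foo_insert_redirect();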

Notice that the partitions which do not contain any data of interest are 
still probed for data, but since they have none it is very quick. In a 
real life example I got the following results just this afternoon:

 - aggregate row count = 471,849,665
 - total number inherited tables = 216
   (many are future dated and therefore contain no data)
 - select one month's worth of data for one piece of equipment by serial
   number (49,257 rows) = 526.015 ms
Not too bad -- quick enough for my needs. BTW, this is using NFS mounted 
storage (NetApp NAS).

Joe


Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres --

2004-09-15 Thread Joe Conway
Christopher Browne wrote:
In the last exciting episode, [EMAIL PROTECTED] (Joe Conway) wrote:
That's exactly what we're doing, but using inherited tables instead of
a union view. With inheritance, there is no need to rebuild the view
each time a table is added or removed. Basically, in our application,
tables are partitioned by either month or week, depending on the type
of data involved, and queries are normally date qualified.

Where does the constraint come in that'll allow most of the data to be
excluded?
Not sure I follow this.
Or is this just that the entries are all part of bigtable so that
the self join is only 2-way?
We don't have a need for self-joins in our application. We do use a 
crosstab function to materialize some transposed views of the data, 
however. That allows us to avoid self-joins in the cases where we might 
otherwise need them.

Joe


Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres --

2004-09-14 Thread Joe Conway
Chris Browne wrote:
Might we set up the view as:
create view combination_of_logs as
  select * from table_1 where txn_date between 'this' and 'that' 
   union all
  select * from table_2 where txn_date between 'this2' and 'that2' 
   union all
  select * from table_3 where txn_date between 'this3' and 'that3' 
   union all
  select * from table_4 where txn_date between 'this4' and 'that4' 
   union all
   ... ad infinitum
   union all
  select * from table_n where txn_date >= 'start_of_partition_n';

and expect that to help, as long as the query that hooks up to this
has date constraints?
We'd have to regenerate the view with new fixed constants each time we
set up the tables, but that sounds like it could work...
That's exactly what we're doing, but using inherited tables instead of a 
union view. With inheritance, there is no need to rebuild the view each 
time a table is added or removed. Basically, in our application, tables 
are partitioned by either month or week, depending on the type of data 
involved, and queries are normally date qualified.

We're not completely done with our data conversion (from a commercial 
RDBMSi), but so far the results have been excellent. Similar to what 
others have said in this thread, the conversion involved restructuring 
the data to better suit Postgres, and the application (data 
analysis/mining vs. the source system which is operational). As a result 
we've compressed a > 1TB database down to ~0.4TB, and seen at least one 
typical query reduced from ~9 minutes down to ~40 seconds.

Joe


Re: [PERFORM] NAS, SAN or any alternate solution ?

2004-07-20 Thread Joe Conway
[EMAIL PROTECTED] wrote:
Would NAS or SAN be good solutions ? (I've read that NAS uses NFS which 
could slow down the transfer rate ??)

Has anyone ever tried one of these with postgresql ? 
Not (yet) with Postgres, but my company has run a ~100GB Oracle database 
on NAS (NetApp) for the past couple of years. We've found it to 
outperform local attached storage, and it has been extremely reliable 
and flexible. Our DBAs wouldn't give it up without a fight.

Joe


Re: [PERFORM] postgresql and openmosix migration

2004-06-22 Thread Joe Conway
Bill wrote:
Ok, so maybe someone on this group will have a better idea.  We have a
database of financial information, and this has literally millions of
entries.  I have installed indices, but for the rather computationally
demanding processes we like to use, like a select query to find the
commodity with the highest monthly or annual returns, the computer generally
runs unacceptably slowly.  So, other than clustering, how could I achieve a
speed increase in these complex queries?  Is this better in mysql or
postgresql?
If the bottleneck is really computational, not I/O, you might try PL/R 
in conjunction with the rpvm R package. rpvm allows R to make use of pvm 
to split its load among a cluster. See:

R:
  http://www.r-project.org/
PL/R:
  http://www.joeconway.com/plr/
rpvm:
  http://cran.r-project.org/src/contrib/Descriptions/rpvm.html
  http://cran.r-project.org/doc/packages/rpvm.pdf
I haven't had a chance to play with this myself yet, but I hope to 
relatively soon.

HTH,
Joe


Re: [PERFORM] Looking for ideas on how to speed up warehouse loading

2004-04-22 Thread Joe Conway
Sean Shanny wrote:
explain analyze SELECT t1.id, t2.url FROM referral_temp t2 LEFT OUTER 
JOIN d_referral t1 ON t2.url = t1.referral_raw_url ORDER BY t1.id;

What I would like to know is if there are better ways to do the join?  I 
need to get all the rows back from the referral_temp table as they are 
used for assigning FK's for the fact table later in processing.  When I 
iterate over the values that I get back those with t1.id = null I assign 
a new FK and push both into the d_referral table as new entries as well 
as a text file for later use.  The matching records are written to a 
text file for later use.
Would something like this work any better (without disabling index scans):

SELECT t1.id, t2.url
FROM referral_temp t2, d_referral t1
WHERE t1.referral_raw_url = t2.url;
<process rows with a match>

SELECT t1.id, t2.url
FROM referral_temp t2
WHERE NOT EXISTS
(select 1 FROM d_referral t1 WHERE t1.referral_raw_url = t2.url);
<process rows without a match>

?

Joe



Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-20 Thread Joe Conway
Joe Conway wrote:
In isolation, test_run.sql should do essentially no syscalls at all once
it's past the initial ramp-up.  On a machine that's functioning per
expectations, multiple copies of test_run show a relatively low rate of
semop() calls --- a few per second, at most --- and maybe a delaying
select() here and there.
Here's results for 7.4 on a dual Athlon server running fedora core:

CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
   total   86.0%    0.0%   52.4%   0.0%     0.0%    0.0%   61.2%
   cpu00   37.6%    0.0%   29.7%   0.0%     0.0%    0.0%   32.6%
   cpu01   48.5%    0.0%   22.7%   0.0%     0.0%    0.0%   28.7%
procs  memory  swap  io system 
   cpu
 r  b   swpd   free   buff  cache   si   sobibo   incs
 1  0 120448  25764  48300 109457600 0   124  170   187
 1  0 120448  25780  48300 109457600 0 0  15289
 2  0 120448  25744  48300 109458000 060  141 78290
 2  0 120448  25752  48300 109458000 0 0  131 140326
 2  0 120448  25756  48300 109457600 040  122 140100
 2  0 120448  25764  48300 109458400 060  133 136595
 2  0 120448  24284  48300 109458400 0   200  138 135151

The jump in cs corresponds to starting the query in the second session.

Joe



Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread Joe Conway
scott.marlowe wrote:
On Mon, 19 Apr 2004, Bruce Momjian wrote:
I have BSD on a SuperMicro dual Xeon, so if folks want another
hardware/OS combination to test, I can give out logins to my machine.
I can probably do some nighttime testing on a dual 2800MHz non-MP Xeon 
machine as well.  It's a Dell 2600 series machine and very fast.  It has 
the moderately fast 533MHz FSB so may not have as many problems as the MP 
type CPUs seem to be having.
I've got a quad 2.8Ghz MP Xeon (IBM x445) that I could test on. Does 
anyone have a test set that can reliably reproduce the problem?

Joe



Re: [PERFORM] Wierd context-switching issue on Xeon

2004-04-19 Thread Joe Conway
Tom Lane wrote:
Here is a test case.  To set up, run the test_setup.sql script once;
then launch two copies of the test_run.sql script.  (For those of
you with more than two CPUs, see whether you need one per CPU to make
trouble, or whether two test_runs are enough.)  Check that you get a
nestloops-with-index-scans plan shown by the EXPLAIN in test_run.
Check.

In isolation, test_run.sql should do essentially no syscalls at all once
it's past the initial ramp-up.  On a machine that's functioning per
expectations, multiple copies of test_run show a relatively low rate of
semop() calls --- a few per second, at most --- and maybe a delaying
select() here and there.
What I actually see on Josh's client's machine is a context swap storm:
vmstat 1 shows CS rates around 170K/sec.  strace'ing the backends
shows a corresponding rate of semop() syscalls, with a few delaying
select()s sprinkled in.  top(1) shows system CPU percent of 25-30
and idle CPU percent of 16-20.
Your test case works perfectly. I ran 4 concurrent psql sessions, on a 
quad Xeon (IBM x445, 2.8GHz, 4GB RAM), hyperthreaded. Here's what 'top' 
looks like:

177 processes: 173 sleeping, 3 running, 1 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
   total   35.9%    0.0%    7.2%   0.0%     0.0%    0.0%   56.8%
   cpu00   19.6%    0.0%    4.9%   0.0%     0.0%    0.0%   75.4%
   cpu01   44.1%    0.0%    7.8%   0.0%     0.0%    0.0%   48.0%
   cpu02    0.0%    0.0%    0.0%   0.0%     0.0%    0.0%  100.0%
   cpu03   32.3%    0.0%   13.7%   0.0%     0.0%    0.0%   53.9%
   cpu04   21.5%    0.0%   10.7%   0.0%     0.0%    0.0%   67.6%
   cpu05   42.1%    0.0%    9.8%   0.0%     0.0%    0.0%   48.0%
   cpu06  100.0%    0.0%    0.0%   0.0%     0.0%    0.0%    0.0%
   cpu07   27.4%    0.0%   10.7%   0.0%     0.0%    0.0%   61.7%
Mem: 4123700k av, 3933896k used, 189804k free, 0k shrd, 221948k buff
  2492124k actv,  760612k in_d,   41416k in_c
Swap: 2040244k av, 5632k used, 2034612k free 3113272k cached
Note that cpu06 is not a postgres process. The output of vmstat looks 
like this:

# vmstat 1
procs  memory  swap  io system 
   cpu
r  b swpd   free   buff  cache  si  so   bi   bo  in   cs us sy id wa
4  0 5632 184264 221948 3113308  0   000   00  0  0  0  0
3  0 5632 184264 221948 3113308  0   000  112 211894 36  9 55  0
5  0 5632 184264 221948 3113308  0   000  125 222071 39  8 53  0
4  0 5632 184264 221948 3113308  0   000  110 215097 39 10 52  0
1  0 5632 184588 221948 3113308  0   00   96  139 187561 35 10 55  0
3  0 5632 184588 221948 3113308  0   000  114 241731 38 10 52  0
3  0 5632 184920 221948 3113308  0   000  132 257168 40  9 51  0
1  0 5632 184912 221948 3113308  0   000  114 251802 38  9 54  0

Note the test case assumes you've got shared_buffers set to at least
1000; with smaller values, you may get some I/O syscalls, which will
probably skew the results.
 shared_buffers
----------------
 16384
(1 row)
I found that killing three of the four concurrent queries dropped 
context switches to about 70,000 to 100,000. Two or more sessions brings 
it up to 200K+.

Joe



Re: [PERFORM] Toooo many context switches (maybe SLES8?)

2004-04-15 Thread Joe Conway
Dirk Lutzebäck wrote:
Joe, do you know where I should look in the 7.4.2 code to find this out?
I think I was wrong. I just looked in CVS and found the commit I was 
thinking about:

http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/backend/storage/lmgr/s_lock.c.diff?r1=1.22&r2=1.23
http://developer.postgresql.org/cvsweb.cgi/pgsql-server/src/include/storage/s_lock.h.diff?r1=1.123&r2=1.124
=
Revision 1.23 / (download) - [select for diffs] , Sat Dec 27 20:58:58 
2003 UTC (3 months, 2 weeks ago) by tgl
Changes since 1.22: +5 -1 lines
Diff to previous 1.22

Improve spinlock code for recent x86 processors: insert a PAUSE
instruction in the s_lock() wait loop, and use test before test-and-set
in TAS() macro to avoid unnecessary bus traffic.  Patch from Manfred
Spraul, reworked a bit by Tom.
=
I thought this had been committed to the 7.4 stable branch as well, but 
it appears not.

Joe



Re: [PERFORM] SETOF performance

2004-04-05 Thread Joe Conway
Jeff wrote:
I think it was on this list - someone posted a message about SETOF 
being slower.  Tom replied saying it was because it needed to create an 
on-disk tuplestore.

I was just looking for some clarification - a SETOF function will always 
write the resulting tuples to disk (not buffering in, say, a sort_mem sized 
buffer)?
I think at least part of what you're seeing is normal function call 
overhead. As far as tuplestores writing to disk, here's what the source 
says:

In src/backend/utils/sort/tuplestore.c
8<---
 * maxKBytes: how much data to store in memory (any data beyond this
 * amount is paged to disk).  When in doubt, use work_mem.
 */
Tuplestorestate *
tuplestore_begin_heap(bool randomAccess, bool interXact, int maxKBytes)
8<---
In src/backend/executor/execQual.c:ExecMakeTableFunctionResult():
8<---
tupstore = tuplestore_begin_heap(true, false, work_mem);
8<---
So up to work_mem (sort_mem in 7.4 and earlier) should be stored in memory.
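So if the result set is of moderate size, one workaround is simply to raise
that limit for the session before calling the function (function name
invented; the value is in kB on 8.0/8.1-era servers):

  set work_mem = 65536;   -- use sort_mem instead on 7.4 and earlier
  select * from my_setof_function();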

Joe



Re: [PERFORM] rapid degradation after postmaster restart

2004-03-16 Thread Joe Conway
Tom Lane wrote:
Joe Conway [EMAIL PROTECTED] writes:

Any idea where I can get my hands on the latest version. I found the 
original post from Tom, but I thought there was a later version with 
both number of pages and time to sleep as knobs.
That was as far as I got.  I think Jan posted a more complex version
that would still be reasonable to apply to 7.4.
I have tested Tom's original patch now. The good news -- it works great 
in terms of reducing the load imposed by vacuum -- almost to the level 
of being unnoticeable. The bad news -- in a simulation test which loads 
an hour's worth of data, even with delay set to 1 ms, vacuum of the 
large table exceeds two hours (vs 12-14 minutes with delay = 0). Since 
that hourly load is expected 7 x 24, this obviously isn't going to work.

The problem with Jan's more complex version of the patch (at least the 
one I found - perhaps not the right one) is it includes a bunch of other 
experimental stuff that I'd not want to mess with at the moment. Would 
changing the input units (for the original patch) from milli-secs to 
micro-secs be a bad idea? If so, I guess I'll get to extracting what I 
need from Jan's patch.

Thanks,

Joe



Re: [PERFORM] rapid degradation after postmaster restart

2004-03-16 Thread Joe Conway
Matthew T. O'Connor wrote:
If memory serves, the problem is that you actually sleep 10ms even when
you set it to 1.  One of the things changed in Jan's later patch was the
ability to specify how many pages to work on before sleeping, rather
than how long to sleep in between every single page.  You might be able to do
a quick hack and have it do 10 pages or so before sleeping.
I thought I remembered something about that.

It turned out to be less difficult than I first thought to extract the 
vacuum delay stuff from Jan's performance patch. I haven't yet tried it 
out, but it's attached in case you are interested. I'll report back once 
I have some results.

Joe
Index: src/backend/access/nbtree/nbtree.c
===
RCS file: /home/pgsql/CvsRoot/pgsql-server/src/backend/access/nbtree/nbtree.c,v
retrieving revision 1.106
diff -c -b -r1.106 nbtree.c
*** src/backend/access/nbtree/nbtree.c  2003/09/29 23:40:26 1.106
--- src/backend/access/nbtree/nbtree.c  2003/11/03 17:56:54
***
*** 18,23 
--- 18,25 
   */
  #include "postgres.h"
  
+ #include <unistd.h>
+ 
  #include "access/genam.h"
  #include "access/heapam.h"
  #include "access/nbtree.h"
***
*** 27,32 
--- 29,39 
  #include "storage/smgr.h"
  
  
+ extern int	vacuum_page_delay;
+ extern int	vacuum_page_groupsize;
+ extern int	vacuum_page_groupcount;
+ 
+ 
  /* Working state for btbuild and its callback */
  typedef struct
  {
***
*** 610,615 
--- 617,631 
  
CHECK_FOR_INTERRUPTS();
  
+   if (vacuum_page_delay > 0)
+   {
+   if (++vacuum_page_groupcount >= vacuum_page_groupsize)
+   {
+   vacuum_page_groupcount = 0;
+   usleep(vacuum_page_delay * 1000);
+   }
+   }
+ 
ndeletable = 0;
page = BufferGetPage(buf);
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
***
*** 736,741 
--- 752,768 
Buffer  buf;
Pagepage;
BTPageOpaque opaque;
+ 
+   CHECK_FOR_INTERRUPTS();
+ 
+   if (vacuum_page_delay > 0)
+   {
+   if (++vacuum_page_groupcount >= vacuum_page_groupsize)
+   {
+   vacuum_page_groupcount = 0;
+   usleep(vacuum_page_delay * 1000);
+   }
+   }
  
buf = _bt_getbuf(rel, blkno, BT_READ);
page = BufferGetPage(buf);
Index: src/backend/commands/vacuumlazy.c
===
RCS file: /home/pgsql/CvsRoot/pgsql-server/src/backend/commands/vacuumlazy.c,v
retrieving revision 1.32
diff -c -b -r1.32 vacuumlazy.c
*** src/backend/commands/vacuumlazy.c   2003/09/25 06:57:59 1.32
--- src/backend/commands/vacuumlazy.c   2003/11/03 17:57:27
***
*** 37,42 
--- 37,44 
   */
  #include "postgres.h"
  
+ #include <unistd.h>
+ 
  #include "access/genam.h"
  #include "access/heapam.h"
  #include "access/xlog.h"
***
*** 88,93 
--- 90,99 
  static TransactionId OldestXmin;
  static TransactionId FreezeLimit;
  
+ int   vacuum_page_delay = 0;  /* milliseconds per page group */
+ int   vacuum_page_groupsize = 10; /* group size */
+ int   vacuum_page_groupcount = 0; /* current group size count */
+ 
  
  /* non-export function prototypes */
  static void lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
***
*** 228,233 
--- 234,248 
  
CHECK_FOR_INTERRUPTS();
  
+   if (vacuum_page_delay > 0)
+   {
+       if (++vacuum_page_groupcount >= vacuum_page_groupsize)
+       {
+           vacuum_page_groupcount = 0;
+           usleep(vacuum_page_delay * 1000);
+       }
+   }
+ 
/*
 * If we are close to overrunning the available space for
 * dead-tuple TIDs, pause and do a cycle of vacuuming before we
***
*** 469,474 
--- 484,498 
  
CHECK_FOR_INTERRUPTS();
  
+   if (vacuum_page_delay > 0)
+   {
+       if (++vacuum_page_groupcount >= vacuum_page_groupsize)
+       {
+           vacuum_page_groupcount = 0;
+           usleep(vacuum_page_delay * 1000);
+       }
+   }
+ 
	tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]);
buf = ReadBuffer(onerel, tblk);
   

Re: [PERFORM] rapid degradation after postmaster restart

2004-03-15 Thread Joe Conway
Matthew T. O'Connor wrote:
I think you understand correctly.  A table with 1,000,000 rows should 
get vacuumed approx every 2,000,000 changes (assuming default values for 
-V ).  FYI an insert and a delete count as one change, but an update 
counts as two.

Unfortunately, only running with -d2 would show the numbers that 
pg_autovacuum is using to decide if / when it should vacuum or 
analyze.  Also, are you sure that it vacuumed more than once and 
wasn't doing analyzes most of the time?
Yeah, I'm sure. Snippets from the log:

[...lots-o-tables...]
[2004-03-14 12:44:48 PM] added table: specdb.public.parametric_states
[2004-03-14 12:49:48 PM] Performing: VACUUM ANALYZE 
public.transaction_data
[2004-03-14 01:29:59 PM] Performing: VACUUM ANALYZE 
public.transaction_data
[2004-03-14 02:08:26 PM] Performing: ANALYZE public.out_of_spec
[2004-03-14 02:08:26 PM] Performing: VACUUM ANALYZE 
public.transaction_data
[2004-03-14 02:22:44 PM] Performing: VACUUM ANALYZE public.spc_graphs
[2004-03-14 03:06:45 PM] Performing: VACUUM ANALYZE public.out_of_spec
[2004-03-14 03:06:45 PM] Performing: VACUUM ANALYZE 
public.transaction_data
[2004-03-14 03:19:51 PM] Performing: VACUUM ANALYZE public.spc_graphs
[2004-03-14 03:21:09 PM] Performing: ANALYZE public.parametric_states
[2004-03-14 03:54:57 PM] Performing: ANALYZE public.out_of_spec
[2004-03-14 03:54:57 PM] Performing: VACUUM ANALYZE 
public.transaction_data
[2004-03-14 04:07:52 PM] Performing: VACUUM ANALYZE public.spc_graphs
[2004-03-14 04:09:33 PM] Performing: ANALYZE public.equip_status_history
[2004-03-14 04:09:33 PM] Performing: VACUUM ANALYZE 
public.parametric_states
[2004-03-14 04:43:46 PM] Performing: VACUUM ANALYZE public.out_of_spec
[2004-03-14 04:43:46 PM] Performing: VACUUM ANALYZE 
public.transaction_data
[2004-03-14 04:56:35 PM] Performing: VACUUM ANALYZE public.spc_graphs
[2004-03-14 04:58:32 PM] Performing: ANALYZE public.parametric_states
[2004-03-14 05:28:58 PM] added database: specdb

This is the entire period of the first test, with default autovac 
settings. The table public.transaction_data is the one with 28 
million active rows. The entire test run inserts about 600 x 600 = 
360,000 rows, out of which roughly two-thirds are later deleted.
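
Doing the arithmetic (and assuming I have the default thresholds right -- a 
vacuum scaling factor of roughly 2, per the quote above), the big table 
shouldn't come due for a vacuum until something like:

SELECT 2 * 28000000 AS approx_vacuum_threshold,  -- ~56 million changes needed
       600 * 600    AS rows_inserted_by_test;    -- ~360,000 rows actually touched

so the repeated vacuums of transaction_data in the log above look wrong to me.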

That's unfortunate as that is the detail we need to see what 
pg_autovacuum thinks is really going on.  We had a similar sounding 
crash on FreeBSD due to some uninitialized variables that were being 
printed out by the debug code, however that was fixed a long time ago.  
Any chance you can look into this?
I can try. The server belongs to another department, and they are under 
the gun to get back on track with their testing. Also, they compiled 
without debug symbols, so I need to get permission to recompile.

Yes I would be very curious to see the results with the vacuum delay 
patch installed (is that patch applied to HEAD?)
Any idea where I can get my hands on the latest version. I found the 
original post from Tom, but I thought there was a later version with 
both number of pages and time to sleep as knobs.

Thanks,

Joe

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] rapid degradation after postmaster restart

2004-03-15 Thread Joe Conway
Tom Lane wrote:
Joe Conway [EMAIL PROTECTED] writes:
Any idea where I can get my hands on the latest version. I found the 
original post from Tom, but I thought there was a later version with 
both number of pages and time to sleep as knobs.
That was as far as I got.  I think Jan posted a more complex version
that would still be reasonable to apply to 7.4.
I thought that too, but was having trouble finding it. I'll look again.

Thanks,

Joe

---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] rapid degradation after postmaster restart

2004-03-15 Thread Joe Conway
Matthew T. O'Connor wrote:
Strange... I wonder if this is some integer overflow problem.  There was 
one reported recently and fixed as of CVS head yesterday; you might try 
that.  However, without the -d2 output I'm only guessing at why 
pg_autovacuum is vacuuming so much / so often.
I'll see what I can do tomorrow to track it down.

I have already recommended to the program manager that they switch to 
7.4.2 plus the autovacuum patch. Not sure they will be willing to make 
any changes at this stage in their release process though.

If we can't find one, any chance you can 
do some testing with CVS HEAD just to see if that works any better.  I 
know there has been a fair amount of work done to improve this situation 
(not just vacuum delay, but ARC etc...)
I might do that, but not likely on Solaris. I can probably get a copy of 
the current database and testing scripts, and give it a try on one of my 
own machines (all Linux, either RHAS3, RH9, or Fedora).

Joe

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] rapid degradation after postmaster restart

2004-03-13 Thread Joe Conway
Marty Scholes wrote:
I have seen similar results to what you are describing.

I found that running a full vacuum:

vacuumdb -fza

followed by a checkpoint makes it run fast again.

Try timing the update with and without a full vacuum.
Will do. I'll let you know how it goes.

Thanks for the reply.

Joe

---(end of broadcast)---
TIP 8: explain analyze is your friend


[PERFORM] rapid degradation after postmaster restart

2004-03-12 Thread Joe Conway
I'm trying to troubleshoot a performance issue on an application ported 
from Oracle to postgres. Now, I know the best way to get help is to post 
the schema, explain analyze output, etc, etc -- unfortunately I can't do 
that at the moment. However, maybe someone can point me in the right 
direction to figure this out on my own. That said, here are a few details...

PostgreSQL 7.4.1
bash-2.03$ uname -a
SunOS col65 5.8 Generic_108528-27 sun4u sparc SUNW,Sun-Fire-280R
The problem is this: the application runs an insert, that fires off a 
trigger, that cascades into a fairly complex series of functions, that 
do a bunch of calculations, inserts, updates, and deletes. Immediately 
after a postmaster restart, the first insert or two take about 1.5 
minutes (undoubtedly this could be improved, but it isn't the main 
issue). However by the second or third insert, the time increases to 7 - 
9 minutes. Restarting the postmaster causes the cycle to repeat, i.e. 
the first one or two inserts are back to the 1.5 minute range.

Any ideas spring to mind? I don't have much experience with Postgres on 
Solaris -- could it be related to that somehow?

Thanks for any insights.

Joe

---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] Maximum Possible Insert Performance?

2003-11-23 Thread Joe Conway
William Yu wrote:
My situation is this. We have a semi-production server where we 
pre-process data and then upload the finished data to our production 
servers. We need the fastest possible write performance. Having the DB 
go corrupt due to power loss/OS crash is acceptable because we can 
always restore from last night and re-run everything that was done since 
then.
If you can, use COPY -- it is far faster than INSERT.

See:
http://www.postgresql.org/docs/current/static/sql-copy.html
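
For what it's worth, a minimal sketch of the difference (the table, columns, 
and file path here are made up for illustration):

-- single-row INSERTs: one statement (and one round trip) per row
INSERT INTO facts (id, payload) VALUES (1, 'abc');

-- COPY: one statement streams the whole data set
COPY facts (id, payload) FROM '/tmp/facts.dat';
-- or COPY facts FROM STDIN and push the rows from the client
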
HTH,

Joe



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] Another weird one with an UPDATE

2003-10-13 Thread Joe Conway
David Griffiths wrote:
Yes, the query operates only on indexed columns (all numeric(10)'s).

       Column       |      Type       | Modifiers
--------------------+-----------------+-----------
 user_account_id   | numeric(10,0)   | not null
[snip]
Indexes: user_account_pkey primary key btree (user_account_id),
Foreign Key constraints: $1 FOREIGN KEY (lang_id) REFERENCES lang(lang_id)
ON UPDATE NO ACTION ON DELETE NO ACTION,
 $2 FOREIGN KEY (source_id) REFERENCES
source(source_id) ON UPDATE NO ACTION ON DELETE NO ACTION,
 $3 FOREIGN KEY (user_role_id) REFERENCES
user_role(user_role_id) ON UPDATE NO ACTION ON DELETE NO ACTION
And what about commercial_entity.user_account_id. Is it indexed and what 
is its data type (i.e. does it match numeric(10,0))?

Also, have you run VACUUM ANALYZE lately?
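
Something along these lines is the quick thing to check and try (the index 
name here is made up):

\d commercial_entity

-- if user_account_id isn't indexed, this is usually the first fix to try
CREATE INDEX commercial_entity_user_account_id_idx
    ON commercial_entity (user_account_id);

VACUUM ANALYZE;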

Joe

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Another weird one with an UPDATE

2003-10-12 Thread Joe Conway
David Griffiths wrote:
I think you want something like:
UPDATE user_account SET last_name = 'abc'
WHERE EXISTS (SELECT 1 FROM commercial_entity ce, commercial_service cs
WHERE user_account.user_account_id = ce.user_account_id AND
ce.commercial_entity_id = cs.commercial_entity_id);
Unfort, this is still taking a long time.
---
 Seq Scan on user_account  (cost=0.00..748990.51 rows=36242 width=716)
Do you have an index on user_account.user_account_id?

Joe

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Compare rows

2003-10-08 Thread Joe Conway
Greg Spiegelberg wrote:
The reason for my initial question was this.  We save changes only.
In other words, if system S has row T1 for day D1 and if on day D2
we have another row T1 (excluding our time column) we don't want
to save it.
It still isn't entirely clear to me what you are trying to do, but 
perhaps some sort of calculated checksum or hash would work to determine 
if the data has changed?
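
As a rough sketch of what I mean (column and table names are placeholders, 
and this assumes the built-in md5() function; on older versions 
contrib/pgcrypto can do the same job):

SELECT md5(coalesce(col1::text,'') || '|' ||
           coalesce(col2::text,'') || '|' ||
           coalesce(col3::text,'')) AS row_hash
  FROM incoming_rows;

-- store row_hash with each saved row; on day D2 save row T1 only if its
-- freshly computed hash differs from the stored one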

Joe



---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [SQL] [PERFORM] EXTERNAL storage and substring on long strings

2003-08-14 Thread Joe Conway
Scott Cain wrote:
Oh, and I forgot to mention: it's highly compressed (bzip2 -9) and is
109M.
Thanks. I'll grab a copy from home later today and see if I can find 
some time to poke at it.

Joe



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] EXTERNAL storage and substring on long strings

2003-08-06 Thread Joe Conway
Tom Lane wrote:
Scott Cain [EMAIL PROTECTED] writes:

A few days ago, I asked for advice on speeding up substring queries on
the GENERAL mailing list.  Joe Conway helpfully pointed out the ALTER
TABLE STORAGE EXTERNAL documentation.  After doing the alter,
the queries got slower!  Here is the background:
Ah-hah, I've sussed it ... you didn't actually change the storage
representation.  You wrote:
Yeah, I came to the same conclusion this morning (update longdna set dna 
= dna || '';), but it still seems that the chunked table is very 
slightly faster than the substring on the  externally stored column:

dna=# explain analyze select pdna from dna where foffset > 600 and 
foffset < 6024000;
QUERY PLAN
--
 Index Scan using foffset_idx on dna  (cost=0.00..4.22 rows=14 
width=32) (actual time=0.06..0.16 rows=11 loops=1)
   Index Cond: ((foffset > 600) AND (foffset < 6024000))
 Total runtime: 0.27 msec
(3 rows)

dna=# explain analyze select pdna from dna where foffset > 600 and 
foffset < 6024000;
QUERY PLAN
--
 Index Scan using foffset_idx on dna  (cost=0.00..4.22 rows=14 
width=32) (actual time=0.07..0.16 rows=11 loops=1)
   Index Cond: ((foffset > 600) AND (foffset < 6024000))
 Total runtime: 0.25 msec
(3 rows)

dna=# explain analyze select substr(dna,6002000,2) from longdna;
   QUERY PLAN

 Seq Scan on longdna  (cost=0.00..1.01 rows=1 width=32) (actual 
time=0.46..0.47 rows=1 loops=1)
 Total runtime: 0.58 msec
(2 rows)

dna=# explain analyze select substr(dna,6002000,2) from longdna;
   QUERY PLAN

 Seq Scan on longdna  (cost=0.00..1.01 rows=1 width=32) (actual 
time=0.23..0.24 rows=1 loops=1)
 Total runtime: 0.29 msec
(2 rows)

I ran each command twice after starting psql to observe the effects of 
caching.

However with the provided sample data, longdna has only one row, and dna 
has 122,540 rows, all of which are chunks of the one longdna row. I 
would tend to think that if you had 1000 or so longdna records indexed 
on some id column, versus 122,540,000 dna chunks indexed on both an id 
and segment column, the substring from longdna would win.
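
For anyone following along, the sequence under discussion boils down to 
something like this (same table and column names as in the thread):

ALTER TABLE longdna ALTER COLUMN dna SET STORAGE EXTERNAL;
-- existing rows keep their old compressed representation until rewritten:
UPDATE longdna SET dna = dna || '';
VACUUM FULL longdna;  -- optionally reclaim the space left by the old row versions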

Joe

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Odd performance results - more info

2003-08-05 Thread Joe Conway
Medora Schauer wrote:
I would greatly appreciate it if someone could run this code in their
environment and let me know if you get results similiar to mine.
The INT test results in execution times of 11 - 50+ secs increasing
each time the test is run.  The FLOAT test execution times are
consistently < 3 secs regardless of how many times it is run.
Without actually trying the code, I'd bet that an index is getting used 
for the float8 case, but not in the int8 case:

if (dtype == INT_TYPE){
    snprintf(buffer, sizeof(buffer),
             "UPDATE shot_record SET trace_count = %d  \
              WHERE shot_line_num = %d  \
                AND shotpoint = %d  \
                AND index = %d ",
             0, shotline, shotpoint + i, 0);
} else {
    snprintf(buffer, sizeof(buffer),
             "UPDATE shot_record SET trace_count = %d  \
              WHERE shot_line_num = %f  \
                AND shotpoint = %f  \
                AND index = %d ",
             0, (float)shotline, (float)shotpoint + (float)i, 0);
}
Try running EXPLAIN ANALYZE on these update statements manually. It also 
might help to run VACUUM ANALYZE after populating the tables.
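
For instance, substituting literal values for the %d placeholders above (the 
values themselves are arbitrary):

EXPLAIN ANALYZE
UPDATE shot_record SET trace_count = 0
 WHERE shot_line_num = 1 AND shotpoint = 100 AND index = 0;

VACUUM ANALYZE shot_record;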

HTH,

Joe

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] One table or many tables for data set

2003-07-22 Thread Joe Conway
Castle, Lindsay wrote:
The data structure looks like this:
element
date
num1
num2
num3
num4
units
There are approx 12,000 distinct elements for a total of about 6 million
rows of data.
Ahh, that helps! So are the elements evenly distributed, i.e. are there 
approx 500 rows of each element? If so, it should be plenty quick to put 
all the data in one table with an index on element (and maybe a 
multicolumn key, depending on other factors).

The scanning technology I want to use may need a different number of rows
and different columns depending on the scan formula;
eg scan1 may need num1, num2 and num3 from the last 200 rows for
element x
   scan2 may need num1, units from the last 10 rows for element y
When you say last X rows, do you mean sorted by date? If so, you 
might want that index to be on (element, date). Then do:

SELECT num1, num2, num3 FROM mytable WHERE element = 'an_element' order 
by date DESC LIMIT 20;

Replace num1, num2, num3 by whatever columns you want, and LIMIT X as 
the number of rows you want.
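
Concretely, the index I have in mind would look something like this (index 
name made up):

CREATE INDEX mytable_element_date_idx ON mytable (element, date);

so scan1's "last 200 rows for element x" becomes the query above with LIMIT 
200, and scan2's becomes the same with LIMIT 10.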

HTH,

Joe

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] Hardware performance

2003-07-17 Thread Joe Conway
Hannu Krosing wrote:
How do you do RAID 1+0 with just two drives ?

Hmm, good point -- I must have been tired last night ;-). With two 
drives you can do mirroring or striping, but not both.

Usually I've seen a pair of mirrored drives for the OS, and a RAID 1+0 
array for data. But that requires 6 drives, not 5. On non-database 
servers usually the data array is RAID 5, and you could get away with 5 
drives (as someone else pointed out).

As I said, I've never personally found it necessary to move WAL off to a 
different physical drive. What do you think is the best configuration 
given the constraint of 5 drives? 1 drive for OS, and 4 for RAID 1+0 for 
data-plus-WAL? I guess the ideal would be to find enough money for that 
6th drive, use the mirrored pair for both OS and WAL.

Joe



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] Hardware performance

2003-07-17 Thread Joe Conway
Adam Witney wrote:
I think the issue from the original posters point of view is that the Dell
PE2650 can only hold a maximum of 5 internal drives
True enough, but maybe that's a reason to be looking at other 
alternatives. I think he said the hardware hasn't been bought yet.

Joe



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


Re: [PERFORM] Hardware performance

2003-07-17 Thread Joe Conway
Jean-Luc Lachance wrote:
I am currious. How can you have RAID 1+0 with only 2 drives?
If you are thinking about partitioning the drives, wont this defeate the
purpose?
Yeah -- Hannu already pointed out that my mind was fuzzy when I made 
that statement :-(. See subsequent posts.

Joe



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] Hardware performance

2003-07-17 Thread Joe Conway
Adam Witney wrote:
Actually I am going through the same questions myself at the moment I
would like to have a 2 disk RAID1 and a 4 disk RAID5, so need at least 6
disks
Anybody have any suggestions or experience with other hardware manufacturers
for this size of setup? (2U rack, up to 6 disks, 2 processors, ~2GB RAM, if
possible)
I tend to use either 1U or 4U servers, depending on the application. But 
I've had good experiences with IBM recently, and a quick look on their 
site shows the x345 with these specs:

  2U, 2-way server delivers extreme performance and availability for 
demanding applications
  Up to 2 Intel Xeon processors up to 3.06GHz with 533MHz front-side 
bus speed for outstanding performance
  Features up to 8GB of DDR memory, 5 PCI (4 PCI-X) slots and up to 6 
hard disk drives for robust expansion
  Hot-swap redundant cooling, power and hard disk drives for high 
availability
  Integrated dual Ultra320 SCSI with RAID-1 for data protection

This may not wrap well, but here is the url:
http://www-132.ibm.com/webapp/wcs/stores/servlet/CategoryDisplay?catalogId=-840&storeId=1&categoryId=2559454&langId=-1&dualCurrId=73
Handles 6 drives; maybe that fits the bill?

Joe

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Sanity check requested

2003-07-17 Thread Joe Conway
Nick Fankhauser wrote:
Thanks for the correction- it sounds like this is one where usage
can't be precisely controlled in a dynamic user environment  I just
need to get a feel for what works under a load that approximates my
production system.
I think the most important point here is that if you set sort_mem too 
high, and you have a lot of simultaneous sorts, you can drive the server 
into swapping, which obviously is a very bad thing. You want it set as 
high as possible, but not so high given your usage patterns that you 
wind up swapping.
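
One way to keep the global setting conservative and still get the big sorts 
done in memory is a per-session override for the queries you know are heavy 
(numbers here are only illustrative; sort_mem is in kilobytes):

SET sort_mem = 65536;   -- 64MB, for this session only
-- run the big sorting/reporting query here
RESET sort_mem;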

Joe



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html