Re: [PERFORM] Horribly slow hash join

2004-04-17 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes:

> Greg Stark <[EMAIL PROTECTED]> writes:
> > Tom Lane <[EMAIL PROTECTED]> writes:
> >> (Also, I have been harboring some notions of supporting cross-type hash
> >> joins for integer types, which will not work unless small int8 values hash
> >> the same as int4 etc.)
> 
> > The obvious way to modify the hash function is to xor the high 32 bits with
> > the low 32 bits. That maintains the property you need
> 
> No it doesn't ...

Eh? Oh, negative numbers? So low^high^sign.


I wonder if it makes sense to check the hash distribution after
generating the table, and if it's bad, throw it away and try again with a
different hash function. The "different hash function" would probably just be
a change of seed value. Probably way overkill, though.
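A toy sketch of that idea (Python purely for illustration; the hash, seed scheme, and skew threshold here are all made up, not PostgreSQL's):

```python
# Sketch: build the hash buckets, measure skew, and rebuild with a new
# seed if the distribution looks bad. Everything here is illustrative.

def hash64(value, seed):
    # Toy seeded hash: mix the value with the seed (not a production hash).
    x = ((value ^ seed) * 0x9E3779B97F4A7C15) & 0xFFFFFFFFFFFFFFFF
    return x ^ (x >> 32)

def build_buckets(values, nbuckets, seed):
    buckets = [0] * nbuckets
    for v in values:
        buckets[hash64(v, seed) % nbuckets] += 1
    return buckets

def build_with_retry(values, nbuckets, max_tries=3, skew_limit=4.0):
    """Retry with a new seed if the largest bucket is far above average."""
    for seed in range(max_tries):
        buckets = build_buckets(values, nbuckets, seed)
        if max(buckets) <= skew_limit * (len(values) / nbuckets):
            return seed, buckets
    return seed, buckets  # give up and use the last attempt

# Values whose low 32 bits are all zero -- the pathological case in this thread.
vals = [i << 32 for i in range(1000)]
seed, buckets = build_with_retry(vals, 64)
```

Because the toy hash mixes all 64 bits, even these pathological values spread out on the first try; the retry only kicks in for a genuinely bad seed.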

-- 
greg


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Horribly slow hash join

2004-04-17 Thread Dennis Bjorklund
On Sat, 17 Apr 2004, Tom Lane wrote:

> *some* set of inputs.  (Also, I have been harboring some notions of
> supporting cross-type hash joins for integer types, which will not work
> unless small int8 values hash the same as int4 etc.)

The simple solution would be to always extend integers to 64 bits (or
whatever the biggest integer is) before calculating the hash. It makes the
hash function a little slower for smaller types, but it's mostly an
operation in the CPU with no memory involved, so it's probably not
noticeable.
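A minimal sketch of the widening idea (Python for illustration only; the sign-extension is the real point, the hash itself is a stand-in):

```python
# Sketch: sign-extend every integer to 64 bits before hashing, so the
# same numeric value hashes identically whether it arrived as an int2,
# int4, or int8. The mixing step is illustrative, not PostgreSQL's.

MASK64 = (1 << 64) - 1

def widen_to_64(value, bits):
    """Sign-extend a `bits`-wide two's-complement value to 64 bits."""
    value &= (1 << bits) - 1           # raw bit pattern of the narrow type
    if value >> (bits - 1):            # negative: extend the sign bit
        value |= MASK64 ^ ((1 << bits) - 1)
    return value

def hash_int(value, bits):
    v = widen_to_64(value, bits)
    # Any 64-bit hash works, as long as it sees the widened representation.
    v = (v * 0x9E3779B97F4A7C15) & MASK64
    return v ^ (v >> 32)

# The same value hashes the same regardless of its declared width,
# including negative values, which is where plain xor-folding trips up.
assert hash_int(-5, 16) == hash_int(-5, 32) == hash_int(-5, 64)
```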

-- 
/Dennis Björklund




Re: [PERFORM] Horribly slow hash join

2004-04-17 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes:
> Tom Lane <[EMAIL PROTECTED]> writes:
>> (Also, I have been harboring some notions of supporting cross-type hash
>> joins for integer types, which will not work unless small int8 values hash
>> the same as int4 etc.)

> The obvious way to modify the hash function is to xor the high 32 bits with
> the low 32 bits. That maintains the property you need

No it doesn't ...

regards, tom lane



Re: [PERFORM] Poor performance of group by query

2004-04-17 Thread Mark Kirkwood
It might be worth trying out a build with -O2, just to rule out any -O3 
oddness.

regards

Mark

Jim C. Nasby wrote:

PostgreSQL 7.4.2 compiled with -O3.

 



Re: [PERFORM] Horribly slow hash join

2004-04-17 Thread Greg Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> We could change the hash function, perhaps, but then we'd just have
> different cases where there's a problem ... hashing will always fail on
> *some* set of inputs.

Sure, but completely ignoring part of the input seems like an unfortunate
choice of hash function.

> (Also, I have been harboring some notions of supporting cross-type hash
> joins for integer types, which will not work unless small int8 values hash
> the same as int4 etc.)

The obvious way to modify the hash function is to xor the high 32 bits with
the low 32 bits. That maintains the property you need and at least ensures
that all the bits are taken into account.
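A sketch of that fold (Python for illustration; PostgreSQL's actual hash is C). As the follow-ups in this thread point out, sign extension is the catch: for small non-negative values the high word is zero and the property holds, but for negatives the high word is all ones.

```python
# Sketch: hash an int8 by xor-ing its high 32 bits into the low 32 bits.

MASK32 = 0xFFFFFFFF

def hash_int8_xor_fold(value):
    bits = value & 0xFFFFFFFFFFFFFFFF   # two's-complement bit pattern
    return (bits & MASK32) ^ (bits >> 32)

def hash_int4_low_word(value):
    return value & MASK32               # the old "low word only" behaviour

# Small non-negative values keep the cross-type property:
assert hash_int8_xor_fold(42) == hash_int4_low_word(42)

# But negatives break it: for -1 the high word is 0xFFFFFFFF, so the
# fold gives 0, while the int4-style hash gives 0xFFFFFFFF.
assert hash_int8_xor_fold(-1) != hash_int4_low_word(-1)
```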

-- 
greg




Re: [PERFORM] Horribly slow hash join

2004-04-17 Thread Marcos Martínez(R)
I didn't follow the conversation from the beginning, but I imagine that you
could improve performance using the value (work_units % (2^32)::bigint)
instead of work_units. You could even make an index on this value; that way,
the hash function will work well. This is not a good solution, but ...

For example:

create index ind1 on table1 ((work_units % (2^32)::bigint));

create index ind2 on table2 ((work_units % (2^32)::bigint));

select * from table1 join table2
  on (table1.work_units % (2^32)::bigint) = (table2.work_units % (2^32)::bigint);


- Original Message - 
From: "Tom Lane" <[EMAIL PROTECTED]>
To: "Jim C. Nasby" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Saturday, April 17, 2004 6:08 PM
Subject: Re: [PERFORM] Horribly slow hash join


> [ resending because I fat-fingered the cc: to the list ]
>
> I see the problem: all the entries in your work_units column have the
> low 32 bits equal to zero.
>
> regression=# select distinct work_units % (2^32)::bigint from Trank_work_overall;
>  ?column?
> --
> 0
> (1 row)
>
> The hash function for int8 only takes the low word into account, so all
> of the entries end up on the same hash chain, resulting in worst-case
> behavior.  This applies to both your hash join and hash aggregate cases.
>
> We could change the hash function, perhaps, but then we'd just have
> different cases where there's a problem ... hashing will always fail on
> *some* set of inputs.  (Also, I have been harboring some notions of
> supporting cross-type hash joins for integer types, which will not work
> unless small int8 values hash the same as int4 etc.)
>
> I guess the real issue is why are you encoding work_units like that?
>
> regards, tom lane
>





Re: [PERFORM] Horribly slow hash join

2004-04-17 Thread Tom Lane
[ resending because I fat-fingered the cc: to the list ]

I see the problem: all the entries in your work_units column have the
low 32 bits equal to zero.

regression=# select distinct work_units % (2^32)::bigint from Trank_work_overall;
 ?column?
--
0
(1 row)
 
The hash function for int8 only takes the low word into account, so all
of the entries end up on the same hash chain, resulting in worst-case
behavior.  This applies to both your hash join and hash aggregate cases.
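The failure mode is easy to reproduce in miniature (Python as a toy model; the real hash is PostgreSQL's C code):

```python
# Illustration: when every value is a multiple of 2**32, a hash that
# looks only at the low 32 bits sends all of them to the same bucket,
# i.e. one long hash chain and worst-case O(n^2) probing.

def low_word_hash(value):
    return value & 0xFFFFFFFF   # mimics "only the low word" behaviour

values = [i << 32 for i in range(10_000)]   # low 32 bits all zero
buckets = {low_word_hash(v) % 1024 for v in values}
assert buckets == {0}   # every value lands in bucket 0
```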

We could change the hash function, perhaps, but then we'd just have
different cases where there's a problem ... hashing will always fail on
*some* set of inputs.  (Also, I have been harboring some notions of
supporting cross-type hash joins for integer types, which will not work
unless small int8 values hash the same as int4 etc.)

I guess the real issue is why are you encoding work_units like that?

regards, tom lane



Re: [PERFORM] [ SOLVED ] select count(*) very slow on an already

2004-04-17 Thread Jeff
On Apr 16, 2004, at 4:23 AM, Rajesh Kumar Mallah wrote:



 I am running an update on the same table

 update rfis set inquiry_status='APPROVED' where inquiry_status='a';

 It's been running for the past 20 mins. and top output is below.
 The PID which is executing the query above is 6712. Can anyone
 tell me why it is in an uninterruptible sleep, and does it relate
 to the apparent poor performance? Is it a problem with the disk
 hardware? I know at night this query will run reasonably fast.

I've had this problem recently.  The problem is simply that the disk
cannot keep up.  Most likely you don't see it at night because traffic
is lower.  There are only two solutions: 1) get more disks, or 2) write
to the db less.

The machine I was running on had a single(!) disk.  It was a quad xeon 
so there was plenty of cpu.   I'd see 8-9 processes stuck in the "D" 
state.  Doing a simple ls -l somefile would take 10-15 seconds and of 
course, db performance was abysmal.

I had a lowly P2 with a few disks in it that was able to run circles
around it, for the simple fact that the machine was not waiting for disk.
Again, proof that disk is far more important than CPU in a db.

good luck.

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/


Re: [PERFORM] Long running queries degrade performance

2004-04-17 Thread Christopher Browne
A long time ago, in a galaxy far, far away, [EMAIL PROTECTED] (Mike Nolan) wrote:
>> We have a web app with a postgres backend.  Most queries have subsecond 
>> response times through the web even with high usage.  Every once in awhile 
>> someone will run either an ad-hoc query or some other long running db 
>> process.  
>
> Are you sure it is postgres where the delay is occurring?  I ask this
> because I also have a web-based front end to postgres, and while most of
> the time the queries respond in about a second every now and then I see
> one that takes much longer, sometimes 10-15 seconds.
>
> I've seen this behavior on both my development system and on the
> production server.  
>
> The same query a while later might respond quickly again.
>
> I'm not sure where to look for the delay, either, and it is
> intermittent enough that I'm not even sure what monitoring
> techniques to use.

Well, a first thing to do is to see what query plans get set up for
the queries.  If the plans are varying over time, that suggests
something's up with ANALYZEs.

If the plans look a bit questionable, then you may be encountering the
situation where cache is helping you on the _second_ query but not the
first.  I did some tuning yesterday involving the same sort of
"symptoms," and that turned out to be what was happening.

I'll describe (in vague detail ;-)) what I was seeing.

- The table being queried was a "transaction" table, containing tens of
  thousands of records per day.  

- The query was pulling summary information about one or another
  customer's activity on that day.

- The best index we had was on transaction date.

Thus, the query would walk through the "txn date" index, pulling
records into memory, and filtering them against the other selection
criteria.

The table is big, so that data is pretty widely scattered across many
pages.

The _first_ time the query is run, the data is all out on disk, and
there are hundreds-to-thousands of page reads to collect it all.  That
took 10-15 seconds.

The _second_ time it was run (as well as subsequent occasions), those
pages were all in cache, so the query runs in under a second.

What I wound up doing was to add an index on transaction date and
customer ID, so that a query that specifies both criteria will look
just for the few hundred (at most) records relevant to a particular
customer.  That's fast even the first time around.

We had a really useful "hook" on this one because the developer
noticed that the first time he queried for a particular day, it was
slow.  We could "repeat" the test easily by just changing to a day
that we hadn't pulled into cache yet.
-- 
(format nil "[EMAIL PROTECTED]" "cbbrowne" "acm.org")
http://cbbrowne.com/info/lisp.html
Referring to undocumented  private communications allows one to  claim
virtually anything: "we discussed this idea in  our working group last
year, and concluded that it was totally brain-damaged".
-- from the Symbolics Guidelines for Sending Mail
