Re: [HACKERS] Socket problem using beta2 on Windows-XP
> IIRC, the win32 installer will enable autovacuum by default. And yes, autovacuum was my first thought as well after Thomas's last mail - that would be a good explanation of why it happens when the postmaster is idle.
>
> I used the win32 installer defaults, so autovacuum is probably a safe assumption.

Right. Please try turning it off and see if the problem goes away.

//Magnus
Re: [HACKERS] Socket problem using beta2 on Windows-XP
Magnus Hagander wrote:
> Right. Please try turning it off and see if the problem goes away.

It does (go away).

- thomas
Re: [HACKERS] Socket problem using beta2 on Windows-XP
Magnus Hagander wrote:
> Right. Please try turning it off and see if the problem goes away.

No, wait! It does *not* go away. Do I need to do anything more than setting this in my postgresql.conf file:

    autovacuum = false    # enable autovacuum subprocess?

and restarting the service? The two zombie entries occur directly when I start the service; then two new entries pop up every minute.

- thomas
Re: [HACKERS] Socket problem using beta2 on Windows-XP
> > Right. Please try turning it off and see if the problem goes away.
>
> No, wait! It does *not* go away. Do I need to do anything more than setting this in my postgresql.conf file:
>
>     autovacuum = false    # enable autovacuum subprocess?
>
> and restarting the service? The two zombie entries occur directly when I start the service; then two new entries pop up every minute.

Yes, that should be enough. Hmm. Weird! If you can get a backtrace from the point where the error msg shows up, that would certainly help - this means it's not coming from where we thought it was coming from :-(

//Magnus
[HACKERS] FW: PGBuildfarm member snake Branch HEAD Status changed from OK to Contrib failure
Looks like there's more unportable code in the recent changes to pgbench :-(

Regards, Dave.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: 30 September 2005 02:17
To: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: PGBuildfarm member snake Branch HEAD Status changed from OK to Contrib failure

The PGBuildfarm member snake had the following event on branch HEAD:

Status changed from OK to Contrib failure

The snapshot timestamp for the build that triggered this notification is: 2005-09-30 01:00:01

The specs of this machine are:
OS: Windows / Server 2003 SP1
Arch: i686
Comp: gcc / 3.4.2

For more information, see http://www.pgbuildfarm.org/cgi-bin/show_history.pl?nm=snake&br=HEAD
Re: [HACKERS] Socket problem using beta2 on Windows-XP
On Fri, Sep 30, 2005 at 08:29:07AM +0200, Thomas Hallgren wrote:
> Magnus Hagander wrote:
> > Right. Please try turning it off and see if the problem goes away.
>
> No, wait! It does *not* go away. Do I need to do anything more than setting autovacuum = false in my postgresql.conf file and restarting the service? The two zombie entries occur directly when I start the service; then two new entries pop up every minute.

If it's two zombies per minute, then I bet it's the stat collector and stat bufferer. They are restarted by the postmaster if not found to be running. The weird thing is that the postmaster _should_ call wait() for them if it detects that they died (when receiving a SIGCHLD signal, AFAIR). If it doesn't, maybe it indicates there's a problem with the signal handling on Win32.

-- Alvaro Herrera                    Valdivia, Chile   ICBM: S 39º 49' 17.7", W 73º 14' 26.8"
"We are who we choose to be", sang the goldfinch when the sun is high (Sandman)
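For reference, the wait()-on-SIGCHLD reaping Alvaro describes is the standard Unix idiom; a minimal standalone sketch in generic POSIX C (not the postmaster's actual handler) looks like this:

    #include <errno.h>
    #include <signal.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    /* Reap every exited child so none lingers as a zombie. */
    static void
    reap_children(int signo)
    {
        int save_errno = errno;

        (void) signo;
        while (waitpid(-1, NULL, WNOHANG) > 0)
            ;                   /* collect all dead children */
        errno = save_errno;     /* don't clobber errno in the main line */
    }

    int
    main(void)
    {
        signal(SIGCHLD, reap_children);

        if (fork() == 0)
            _exit(0);           /* child exits immediately */

        sleep(1);               /* handler fires; no zombie remains */
        puts("child reaped");
        return 0;
    }

If this pattern is broken on the Win32 port, dead stats children would pile up exactly as Thomas observes.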
Re: [HACKERS] Install pg_regress script to support PGXS?
Dear Tom,

> While testing the recent pgxs patch, I noticed that you can build and install contrib with PGXS: ... It seems that it ought to work to run installcheck too: gmake USE_PGXS=1 installcheck but this does not quite work because the pg_regress script isn't included in the installation tree. (If you copy it to where it'd need to be, installcheck works.) Is it worth including pg_regress in the installation to make this work? Seems like it might be handy for external modules to be able to run self-tests.

Indeed, I noticed this when I did pgxs ;-) It is on my todo list for postgresql, but I haven't had much time this year to contribute anything.

The reason it was not included at first is that it seemed to me that self-tests require a temporary installation, which makes easy sense when building from sources, but I was not sure about what would really be needed for the feature outside of the compilation tree (special conf files, diff/comparison commands...). I was planning to investigate the details, but if the pg_regress command is enough, it was a little bit stupid of me not to do it directly.

ISTM that the pg_regress command is not compiled by default, only under make test, so it should be added to the default compilation?

-- Fabien.
Re: [HACKERS] [PERFORM] A Better External Sort?
> From: Pailloncy Jean-Gerard [EMAIL PROTECTED]
> Sent: Sep 29, 2005 7:11 AM
> Subject: Re: [HACKERS] [PERFORM] A Better External Sort?
>
> > > Jeff Baker: Your main example seems to focus on a large table where a key column has constrained values. This case is interesting in proportion to the number of possible values. If I have billions of rows, each having one of only two values, I can think of a trivial and very fast method of returning the table sorted by that key: make two sequential passes, returning the first value on the first pass and the second value on the second pass. This will be faster than the method you propose.
> >
> > Ron Peacetree:
> > 1= No, that was not my main example. It was the simplest example, used to frame the later, more complicated examples. Please don't get hung up on it.
> > 2= You are incorrect. Since IO is the most expensive operation we can do, any method that makes two passes through the data at top scanning speed will take at least 2x as long as any method that only takes one such pass.
>
> You do not get the point. At the time you get the sorted references to the tuples, you need to fetch the tuples themselves, check their visibility, etc., and return them to the client.

As PFC correctly points out elsewhere in this thread, =maybe= you have to do all that. The vast majority of the time, people are not going to want to look at a detailed record-by-record output of that much data. The most common usage is to calculate or summarize some quality or quantity of the data and display that instead, or to use the tuples (or some quality of the tuples found) as an intermediate step in a longer query process such as a join. Sometimes there's a need to see _some_ of the detailed records: a random sample, or a region in a random part of the table, etc. It's rare that there is a RW need to actually list every record in a table of significant size.

On the rare occasions where one does have to return or display all records in such a large table, network IO and/or display IO speeds are the primary performance bottleneck. Not HD IO.

Nonetheless, if there _is_ such a need, there's nothing stopping us from rearranging the records in RAM into sorted order in one pass through RAM (using at most space for one extra record) after constructing the cache-conscious Btree index. Then the sorted records can be written to HD in RAM-buffer-sized chunks very efficiently. Repeating this process until we have stepped through the entire data set will take no more HD IO than one HD scan of the data, and it leaves us with a permanent result that can be reused for multiple purposes. If the sorted records are written in large enough chunks, rereading them at any later time can be done at maximum HD throughput.

In a total of two HD scans (one to read the original data, one to write out the sorted data) we can make a permanent rearrangement of the data. We've essentially created a cluster index version of the data.

> So, if there are only 2 values in the column of a big table that is larger than available RAM, two seq scans of the table without any sorting is the fastest solution.

If you only need to do this once, yes this wins. OTOH, if you have to do this sort even twice, my method is better.

regards, Ron
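As a concrete aside, Jean-Gerard's two-pass method for a two-valued key is trivial to write down; a self-contained toy version (illustrative only, nothing like the executor's real data structures):

    #include <stdio.h>

    /* Two sequential scans return the data "sorted" when the key
       column holds only two distinct values: emit the low value on
       pass 1 and the high value on pass 2. */
    int
    main(void)
    {
        static const int key[] = {1, 0, 0, 1, 0, 1, 1, 0};
        const size_t n = sizeof key / sizeof key[0];
        size_t i;

        for (i = 0; i < n; i++)         /* pass 1: sequential scan */
            if (key[i] == 0)
                printf("%d\n", key[i]);
        for (i = 0; i < n; i++)         /* pass 2: sequential scan */
            if (key[i] == 1)
                printf("%d\n", key[i]);
        return 0;
    }

The dispute above is exactly whether these two full scans can beat one scan plus a sort of compact keys.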
Re: [HACKERS] [PERFORM] A Better External Sort?
> From: Zeugswetter Andreas DAZ SD [EMAIL PROTECTED]
> Sent: Sep 29, 2005 9:28 AM
> Subject: RE: [HACKERS] [PERFORM] A Better External Sort?
>
> > In my original example, a sequential scan of the 1TB of 2KB or 4KB records, = 250M or 500M records of data, being sorted on a binary value key will take ~1000x more time than reading in the ~1GB Btree I described that used a Key+RID (plus node pointers) representation of the data.
>
> Imho you seem to ignore the final step your algorithm needs of collecting the data rows. After you sorted the keys, the collect step will effectively access the tuples in random order (given a sufficiently large key range).

Collecting the data rows can be done for each RAM buffer full of data in one pass through RAM after we've built the Btree. Then, if desired, those data rows can be read out to HD in sorted order in essentially one streaming burst. This combination of index build + RAM buffer rearrangement + write results to HD can be repeated as often as needed until we end up with an overall Btree index and a set of sorted sublists on HD. Overall HD IO for the process is only two effectively sequential passes through the data.

Subsequent retrieval of the sorted information from HD can be done at full HD streaming speed, and whatever we've decided to save to HD can be reused later if we desire.

Hope this helps, Ron
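To make the Key+RID idea concrete: the sort handles small fixed-size (key, row id) pairs, and only the gather step touches the wide tuples. A hedged sketch (struct layout and all names are mine, not anything in the PostgreSQL tree):

    #include <stdlib.h>

    /* Sort compact (key, RID) pairs; N entries touch ~16N bytes of
       memory instead of ~2048N bytes for full 2KB tuples. */
    typedef struct
    {
        unsigned long long key;     /* the sort key */
        unsigned long long rid;     /* where the full tuple lives */
    } KeyRid;

    static int
    cmp_keyrid(const void *a, const void *b)
    {
        const KeyRid *x = (const KeyRid *) a;
        const KeyRid *y = (const KeyRid *) b;

        if (x->key < y->key) return -1;
        if (x->key > y->key) return 1;
        return 0;
    }

    void
    sort_key_rids(KeyRid *entries, size_t n)
    {
        qsort(entries, n, sizeof(KeyRid), cmp_keyrid);
        /* ...then gather: fetch tuples by entries[i].rid a RAM
           buffer at a time and stream them out in sorted order. */
    }

Andreas's objection is about that final comment: with a wide key range, fetching by RID degenerates into random IO unless the gather is batched the way Ron describes.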
Re: [HACKERS] [PERFORM] A Better External Sort?
> From: Josh Berkus josh@agliodbs.com
> Sent: Sep 29, 2005 12:54 PM
> Subject: Re: [HACKERS] [PERFORM] A Better External Sort?
>
> The biggest single area where I see PostgreSQL external sort sucking is on index creation on large tables. For example, for the free version of TPCH, it takes only 1.5 hours to load a 60GB Lineitem table on OSDL's hardware, but over 3 hours to create each index on that table. This means that overall our load into TPCH takes 4 times as long to create the indexes as it did to bulk load the data.

Hmmm. 60GB/5400secs = 11MBps. That's ssllooww. So the first problem is evidently that our physical layout and/or HD IO layer sucks.

Creating the table and then creating the indexes on the table is going to require more physical IO than if we created the table and the indexes concurrently in chunks and then combined the indexes on the chunks into the overall indexes for the whole table, so there's a potential speed-up. The method I've been talking about is basically a recipe for creating indexes as fast as possible, with as few IO operations (HD or RAM) as possible and nearly no random ones, so it could help as well.

OTOH, HD IO rate is the fundamental performance metric. As long as our HD IO rate is pessimal, so will the performance of everything else be. Why can't we load a table at closer to the peak IO rate of the HDs?

> Anyone restoring a large database from pg_dump is in the same situation. Even worse, if you have to create a new index on a large table on a production database in use, because the I/O from the index creation swamps everything.

Fix for this in the works ;-)

> Following an index creation, we see that 95% of the time required is the external sort, which averages 2mb/s. Assuming decent HD HW, this is HORRIBLE.

What kind of instrumenting and profiling has been done of the code involved?

> This is with separate drives for the WAL, the pg_tmp, the table and the index. I've confirmed that increasing work_mem beyond a small minimum (around 128mb) had no benefit on the overall index creation speed.

No surprise. The process is severely limited by the abysmally slow HD IO.

Ron
Re: [HACKERS] postgresql clustering
Thanks for your reply, Luke. Bizgres looks like a very promising project; I'll be sure to follow it.

Thanks to everyone for their comments. I'm starting to understand the truth behind the hype, and where these performance gains and hits stem from.

-Dan
Re: [HACKERS] postgresql clustering
What about clustered filesystems? At first blush I would think the overhead of something like GFS might kill performance. Could one potentially achieve a fail-over config using multiple nodes with GFS, each having its own instance of PostgreSQL (but only one running at any given moment)?

Best, Dan
Re: [HACKERS] [PERFORM] A Better External Sort?
Just to add a little anarchy to your nice debate...

Who really needs all the results of a sort on your terabyte table? I guess not many people do a SELECT from such a table and want all the results. So, this leaves:
- Really wanting all the results, to fetch using a cursor,
- CLUSTER-type things, where you really want everything in order,
- Aggregates (Sort->GroupAggregate), which might really need to sort the whole table,
- Complex queries where the whole dataset needs to be examined in order to return a few values,
- Joins (again, the whole table is probably not going to be selected),
- And the ones I forgot.

However, most likely you only want to SELECT N rows, in some ordering:
- the first N (ORDER BY x LIMIT N)
- the last N (ORDER BY x DESC LIMIT N)
- WHERE x > value ORDER BY x LIMIT N
- WHERE x < value ORDER BY x DESC LIMIT N
- and other variants

Or you are doing a Merge JOIN against some other table; in that case, yes, you might need the whole sorted terabyte table, but most likely there are WHERE clauses in the query that restrict the set, and thus maybe we can get some conditions or limit values on the column to sort. Also the new, optimized hash join, which is more memory efficient, might cover this case.

Point is, sometimes you only need part of the results of your sort. And the bigger the sort, the more likely it becomes that you only want part of the results. So, while we're in the fun hand-waving, new-algorithm-trying mode, why not consider this right from the start? (I know I'm totally in hand-waving mode right now, so slap me if needed.)

I'd say your new, fancy sort algorithm needs a few more input values:
- Range of values that must appear in the final result of the sort: none, minimum, maximum, both, or even a set of values from the other side of the join, hashed, or sorted.
- LIMIT information (first N, last N, none)
- Enhanced LIMIT information (first/last N values of the second column to sort, for each value of the first column) (the infamous "top 10 by category" query)
- etc.

With this, the amount of data that needs to be kept in memory is dramatically reduced, from the whole table (even using your compressed keys, that's big) to something more manageable, closer to the size of the final result set that will be returned to the client, and it avoids a lot of effort. So this would not be useful in all cases, but when it applies, it would be really useful.

Regards!
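For what it's worth, the ORDER BY x LIMIT N cases above need no full sort at all: one sequential pass with a bounded max-heap of the N best keys seen so far is enough. A minimal sketch (not PostgreSQL code; all names are mine):

    #include <stdlib.h>

    typedef double Key;

    typedef struct
    {
        Key    *v;      /* heap array: v[0] is the largest kept key */
        size_t  n;      /* current size */
        size_t  cap;    /* N, the LIMIT */
    } TopN;

    static void
    swap_keys(Key *a, Key *b)
    {
        Key t = *a; *a = *b; *b = t;
    }

    static void
    sift_down(TopN *h, size_t i)
    {
        for (;;)
        {
            size_t l = 2 * i + 1, r = l + 1, big = i;

            if (l < h->n && h->v[l] > h->v[big]) big = l;
            if (r < h->n && h->v[r] > h->v[big]) big = r;
            if (big == i) return;
            swap_keys(&h->v[i], &h->v[big]);
            i = big;
        }
    }

    /* Offer one scanned key: kept only if it beats the current
       N-th smallest.  O(log N) per row, O(N) memory total. */
    void
    topn_offer(TopN *h, Key k)
    {
        if (h->n < h->cap)
        {
            size_t i = h->n++;          /* heap not full: sift up */

            h->v[i] = k;
            while (i > 0 && h->v[(i - 1) / 2] < h->v[i])
            {
                swap_keys(&h->v[i], &h->v[(i - 1) / 2]);
                i = (i - 1) / 2;
            }
        }
        else if (k < h->v[0])
        {
            h->v[0] = k;                /* replace current maximum */
            sift_down(h, 0);
        }
    }

Memory stays at N keys regardless of table size, which is exactly the reduction PFC is asking the fancy sort to exploit.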
Fwd: Re: [HACKERS] postgresql clustering
What is the relationship between database support for clustering and grid computing, and support for distributed databases? Two-phase COMMIT is coming in 8.1. What effect will this have in promoting FOSS grid support or distribution solutions for PostgreSQL?
Re: [HACKERS] Install pg_regress script to support PGXS?
On Thu, Sep 29, 2005 at 03:24:15PM +0200, Fabien COELHO wrote:
> > While testing the recent pgxs patch, I noticed that you can build and install contrib with PGXS: ... It seems that it ought to work to run installcheck too: gmake USE_PGXS=1 installcheck but this does not quite work because the pg_regress script isn't included in the installation tree. (If you copy it to where it'd need to be, installcheck works.) Is it worth including pg_regress in the installation to make this work? Seems like it might be handy for external modules to be able to run self-tests.
>
> The reason it was not included at first is that it seemed to me that self-tests require a temporary installation [...]

The temporary installation is required to run make check, but not to run make installcheck (which uses the regular installation instead). So just including pg_regress is enough, and since it's a small and useful program, I'd say it's a reasonable thing to do.

(Maybe we should audit it for security problems, just in case the distributors choose to include it in packages. The temp file usage for $TMPFILE looks predictable, thus maybe it _is_ insecure. Apparently it can be made to clobber any existing file owned by whoever runs it.)

-- Alvaro Herrera    http://www.PlanetPostgreSQL.org
"Entristecido, Wutra (canción de Las Barreras) echa a Freyr a rodar y a nosotros al mar"
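The $TMPFILE worry is the classic predictable-temp-name race. In C the fix is mkstemp(); for a shell script such as pg_regress, the analogue would be mktemp(1). A small illustration, assuming a POSIX system:

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int
    main(void)
    {
        /* A fixed name like /tmp/pg_regress.$$ can be guessed and
           pre-created (or symlinked) by another local user; mkstemp
           creates the file atomically, mode 0600, with an
           unpredictable suffix. */
        char name[] = "/tmp/pg_regress_XXXXXX";
        int fd = mkstemp(name);

        if (fd < 0)
        {
            perror("mkstemp");
            return 1;
        }
        printf("safe temp file: %s\n", name);
        close(fd);
        unlink(name);
        return 0;
    }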
Re: [HACKERS] Install pg_regress script to support PGXS?
Alvaro Herrera [EMAIL PROTECTED] writes:
> (Maybe we should audit it for security problems, just in case the distributors choose to include it in packages. The temp file usage for $TMPFILE looks predictable, thus maybe it _is_ insecure. Apparently it can be made to clobber any existing file owned by whoever runs it.)

pg_regress already *is* shipped in packages (certainly the RPMs contain it), so if you see any such problems, please do fix 'em.

regards, tom lane
Re: [HACKERS] Socket problem using beta2 on Windows-XP
Alvaro Herrera [EMAIL PROTECTED] writes:
> If it's two zombies per minute, then I bet it's the stat collector and stat bufferer. They are restarted by the postmaster if not found to be running.

That would make some sense, because the stat processes need to set up new sockets (for the pipe between them). The autovacuum theory didn't hold any water in my eyes because autovacuum doesn't create any new sockets.

However, why two zombies? That would mean that the grandchild process started, which should mean that the pipe was already created ...

Does Windows have any equivalent of strace whereby we could watch what's happening during stats process launch?

regards, tom lane
Re: [HACKERS] Socket problem using beta2 on Windows-XP
> > If it's two zombies per minute, then I bet it's the stat collector and stat bufferer. They are restarted by the postmaster if not found to be running.
>
> That would make some sense, because the stat processes need to set up new sockets (for the pipe between them). The autovacuum theory didn't hold any water in my eyes because autovacuum doesn't create any new sockets. However, why two zombies? That would mean that the grandchild process started, which should mean that the pipe was already created ...
>
> Does Windows have any equivalent of strace whereby we could watch what's happening during stats process launch?

First of all, I won't be able to dig into this any more until next week - sorry about that. But others are always free to :-)

There is no strace equivalent built in, but you can get an add-on from http://www.bindview.com/Services/RAZOR/Utilities/Windows/strace_readme.cfm. Don't put it on a production box permanently, though; it tends to cause BSODs in some cases.

//Magnus
Re: [HACKERS] Found small issue with OUT params
Tom Lane wrote:
> Mike Rylander [EMAIL PROTECTED] writes:
> > Using that logic, a function with one OUT param would be the same as a function returning a rowtype with only one column,
>
> But it's not (and no, I don't want to make it so, because the overhead for the useless record result would be significant).
>
> regards, tom lane

Tom,

I hardly think the overhead would be significant on modern processors; I don't think the majority of users are running on Pentium 90s. (I am assuming you mean a performance overhead.)

The whole point is that the current behavior is inconsistent and not expected, and should be changed to be in line with the way other DB systems work. What is the point of even allowing a single OUT param then? You might as well just raise an error and tell the user that a single OUT param is not allowed.

8.1 is going to bring even more users over from systems like Firebird, MS SQL and even Oracle, and all of these allow a single OUT param and return the name of the OUT param, not the name of the function. Like I said before, this behavior is going to make it more difficult to port applications from other systems. How difficult can it be to check if the function has a single OUT param, as compared to the old way of using RETURN?

Sorry if I am being a pain in the you-know-what, but I really think I am correct on this matter.

Thanks, Tony
Re: [HACKERS] Found small issue with OUT params
On Fri, Sep 30, 2005 at 10:20:34AM -0500, Tony Caduto wrote:
> Tom,
> I hardly think the overhead would be significant on modern processors; I don't think the majority of users are running on Pentium 90s. (I am assuming you mean a performance overhead.)

Um, please read the documentation. Returning a tuple is *significantly* more expensive than returning a single value. You have to get the tuple descriptor, allocate memory for the tuple, fill in all the fields with your data... For a single value you just return it. See here for all the details; you really don't want to do it if you don't need to.

http://www.postgresql.org/docs/8.0/interactive/xfunc-c.html#AEN30497

Now, you could fudge the parser to automatically alter the name of the value in the function, but I have no idea how hard that would be...

-- Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a tool for doing 5% of the work and then sitting around waiting for someone else to do the other 95% so you can sue them.
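To illustrate the difference Martijn describes, here is roughly what the two paths look like in a C-language function, following the composite-return recipe on the page he links (function names are hypothetical; the record path is sketched from the 8.0-era funcapi.h interface):

    #include "postgres.h"
    #include "fmgr.h"
    #include "funcapi.h"

    PG_FUNCTION_INFO_V1(answer_scalar);

    /* Scalar path: just hand back a Datum. */
    Datum
    answer_scalar(PG_FUNCTION_ARGS)
    {
        PG_RETURN_INT32(42);
    }

    PG_FUNCTION_INFO_V1(answer_record);

    /* Record path: descriptor lookup plus tuple construction,
       on every single call. */
    Datum
    answer_record(PG_FUNCTION_ARGS)
    {
        TupleDesc      tupdesc;
        AttInMetadata *attinmeta;
        char          *values[1];
        HeapTuple      tuple;

        /* Find out what record type the caller expects. */
        if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
            elog(ERROR, "function returning record called in wrong context");

        attinmeta = TupleDescGetAttInMetadata(tupdesc);

        values[0] = "42";       /* field values as C strings */
        tuple = BuildTupleFromCStrings(attinmeta, values);

        PG_RETURN_DATUM(HeapTupleGetDatum(tuple));
    }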
Re: [HACKERS] Socket problem using beta2 on Windows-XP
Tom Lane wrote:
> However, why two zombies? That would mean that the grandchild process started, which should mean that the pipe was already created ...

To clarify: I'm talking about the TCPView window and connections, and thus zombie connections. They both belong to the same pid and seem to point to each other. The actual process no longer exists (it can't be viewed anywhere).

Regards, Thomas Hallgren
Re: [HACKERS] [PERFORM] A Better External Sort?
Ron,

> Hmmm. 60GB/5400secs = 11MBps. That's ssllooww. So the first problem is evidently that our physical layout and/or HD IO layer sucks.

Actually, it's much worse than that, because the sort is only dealing with one column. As I said, monitoring with iostat, our top speed was 2.2MB/s.

--Josh
Re: [HACKERS] Found small issue with OUT params
Martijn van Oosterhout wrote:
> On Fri, Sep 30, 2005 at 10:20:34AM -0500, Tony Caduto wrote:
> > Tom,
> > I hardly think the overhead would be significant on modern processors; I don't think the majority of users are running on Pentium 90s. (I am assuming you mean a performance overhead.)
>
> Um, please read the documentation. Returning a tuple is *significantly* more expensive than returning a single value. You have to get the tuple descriptor, allocate memory for the tuple, fill in all the fields with your data... For a single value you just return it. See here for all the details; you really don't want to do it if you don't need to.
>
> http://www.postgresql.org/docs/8.0/interactive/xfunc-c.html#AEN30497
>
> Now, you could fudge the parser to automatically alter the name of the value in the function, but I have no idea how hard that would be...

So you might notice a little performance hit bringing back a million rows, but most of these single-OUT-param functions only return one row/value anyway. There would be zero perceivable difference in performance for a single value/row, regardless of the extra overhead.

As an application developer, I don't care about tuples etc.; I just want it to work as expected, without having to resort to hacks like creating a second OUT param that is not used. Otherwise I would have to change a lot of client code wherever the OUT param is referenced by name instead of position, and that is done a lot because the position is more likely to change than the name.

The bottom line (regardless of any overhead, or whether I read the docs about returning a tuple) is that if you have an OUT param, it should return that name, not the name of the function, period.

Thanks, Tony
Re: [HACKERS] [PERFORM] A Better External Sort?
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:pgsql-hackers-[EMAIL PROTECTED]] On Behalf Of PFC
> Sent: Thursday, September 29, 2005 9:10 AM
> To: [EMAIL PROTECTED]
> Cc: Pg Hackers; pgsql-performance@postgresql.org
> Subject: Re: [HACKERS] [PERFORM] A Better External Sort?
>
> Just to add a little anarchy to your nice debate...
> Who really needs all the results of a sort on your terabyte table?

Reports with ORDER BY/GROUP BY, and many other possibilities. 40% of mainframe CPU cycles are spent sorting. That is because huge volumes of data require lots of energy to be meaningfully categorized. Let's suppose that instead of a terabyte of data (or a petabyte or whatever) we have 10% of it. That's still a lot of data.

> I guess not many people do a SELECT from such a table and want all the results.

What happens when they do? The cases where it is already fast are not very important. The cases where things go into the crapper are the ones that need attention.

> So, this leaves:
> - Really wanting all the results, to fetch using a cursor,
> - CLUSTER-type things, where you really want everything in order,
> - Aggregates (Sort->GroupAggregate), which might really need to sort the whole table,
> - Complex queries where the whole dataset needs to be examined in order to return a few values,
> - Joins (again, the whole table is probably not going to be selected),
> - And the ones I forgot.
>
> However, most likely you only want to SELECT N rows, in some ordering:
> - the first N (ORDER BY x LIMIT N)
> - the last N (ORDER BY x DESC LIMIT N)

For these, the QuickSelect algorithm is what is wanted. For example:

    #include <stdlib.h>

    typedef double Etype;

    extern Etype  RandomSelect(Etype * A, size_t p, size_t r, size_t i);
    extern size_t RandRange(size_t a, size_t b);
    extern size_t RandomPartition(Etype * A, size_t p, size_t r);
    extern size_t Partition(Etype * A, size_t p, size_t r);

    /*
    ** In the following code, every reference to CLR means:
    **
    **    "Introduction to Algorithms"
    **    By Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest
    **    ISBN 0-07-013143-0
    */

    /*
    ** CLR, page 187
    */
    Etype
    RandomSelect(Etype A[], size_t p, size_t r, size_t i)
    {
        size_t q, k;

        if (p == r)
            return A[p];
        q = RandomPartition(A, p, r);
        k = q - p + 1;

        if (i <= k)
            return RandomSelect(A, p, q, i);
        else
            return RandomSelect(A, q + 1, r, i - k);
    }

    size_t
    RandRange(size_t a, size_t b)
    {
        size_t c = (size_t) ((double) rand() / ((double) RAND_MAX + 1) * (b - a));

        return c + a;
    }

    /*
    ** CLR, page 162
    */
    size_t
    RandomPartition(Etype A[], size_t p, size_t r)
    {
        size_t i = RandRange(p, r);
        Etype  Temp;

        Temp = A[p];
        A[p] = A[i];
        A[i] = Temp;
        return Partition(A, p, r);
    }

    /*
    ** CLR, page 154
    */
    size_t
    Partition(Etype A[], size_t p, size_t r)
    {
        Etype  x, temp;
        size_t i, j;

        x = A[p];
        i = p - 1;
        j = r + 1;

        for (;;) {
            do {
                j--;
            } while (!(A[j] <= x));
            do {
                i++;
            } while (!(A[i] >= x));
            if (i < j) {
                temp = A[i];
                A[i] = A[j];
                A[j] = temp;
            }
            else
                return j;
        }
    }

(A small test driver for these routines appears at the end of this message.)

> - WHERE x > value ORDER BY x LIMIT N
> - WHERE x < value ORDER BY x DESC LIMIT N
> - and other variants
>
> Or you are doing a Merge JOIN against some other table; in that case, yes, you might need the whole sorted terabyte table, but most likely there are WHERE clauses in the query that restrict the set, and thus maybe we can get some conditions or limit values on the column to sort.

Where clause filters are to be applied AFTER the join operations, according to the SQL standard.

> Also the new, optimized hash join, which is more memory efficient, might cover this case.

For == joins. Not every order by is applied to joins. And not every join is an equal join.
> Point is, sometimes you only need part of the results of your sort. And the bigger the sort, the more likely it becomes that you only want part of the results.

That is an assumption that will sometimes be true, and sometimes not. It is not possible to predict usage patterns for a general-purpose database system.

> So, while we're in the fun hand-waving, new-algorithm-trying mode, why not consider this right from the start? (I know I'm totally in hand-waving mode right now, so slap me if needed.)
>
> I'd say your new, fancy sort algorithm needs a few more input values:
> - Range of values that must appear in the final result of the sort: none, minimum, maximum, both, or even a set of values from the other side of the join, hashed, or sorted.

That will already happen (or it certainly
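For completeness, a small driver for the QuickSelect routines quoted above (illustrative only; it assumes those routines are in scope, and note that CLR's i is 1-based within the subarray):

    #include <stdio.h>

    int
    main(void)
    {
        Etype  a[] = {9.0, 1.0, 7.0, 3.0, 5.0, 8.0, 2.0, 6.0, 4.0};
        size_t n = sizeof a / sizeof a[0];

        /* The 5th smallest of 9 values is the median: found in
           expected O(n) time, with no full sort performed. */
        printf("median = %g\n", RandomSelect(a, 0, n - 1, (n + 1) / 2));
        return 0;
    }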
Re: [HACKERS] Found small issue with OUT params
> So you might notice a little performance hit bringing back a million rows, but most of these single-OUT-param functions only return one row/value anyway. There would be zero perceivable difference in performance for a single value/row, regardless of the extra overhead.

Sounds like we need a test case... up for it?

Sincerely, Joshua D. Drake

-- Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
Re: [HACKERS] Found small issue with OUT params
On Friday 30 September 2005 11:49, Martijn van Oosterhout wrote:
> On Fri, Sep 30, 2005 at 10:20:34AM -0500, Tony Caduto wrote:
> > Tom,
> > I hardly think the overhead would be significant on modern processors; I don't think the majority of users are running on Pentium 90s. (I am assuming you mean a performance overhead.)
>
> Um, please read the documentation. Returning a tuple is *significantly* more expensive than returning a single value. You have to get the tuple descriptor, allocate memory for the tuple, fill in all the fields with your data... For a single value you just return it.

ISTM it is better for us to be consistent with the visible behavior than to have two different behaviors for OUT-param functions just so one can be faster. That way, if people are concerned about the speed difference, they can rewrite the function without an OUT param... AFAICT, as it stands now, you've given them no choice and are forcing them to handle two different scenarios.

-- Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
Re: [HACKERS] [PERFORM] A Better External Sort?
Ron,

> That 11MBps was your =bulk load= speed. If just loading a table is this slow, then there are issues with basic physical IO, not just IO during sort operations.

Oh, yeah. Well, that's separate from sort. See multiple posts on this list from the Greenplum team, the COPY patch for 8.1, etc. We've been concerned about I/O for a while.

Realistically, you can't do better than about 25MB/s on single-threaded I/O on current Linux machines, because your bottleneck isn't the actual disk I/O. It's CPU. Databases which go faster than this are all, to my knowledge, using multi-threaded disk I/O. (And I'd be thrilled to get a consistent 25MB/s on PostgreSQL, but that's another thread ...)

> As I said, the obvious candidates are inefficient physical layout and/or flawed IO code.

Yeah, that's what I thought too. But try sorting a 10GB table, and you'll see: disk I/O is practically idle, while CPU averages 90%+. We're CPU-bound, because sort is being really inefficient about something. I just don't know what yet.

If we move that CPU-bound behavior to a higher level of performance, then we can start looking at things like async I/O, O_DIRECT, pre-allocation, etc. that will give us incremental improvements. But what we need now is a 5-10x improvement, and that's somewhere in the algorithms or the code.

--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
Re: [HACKERS] [PERFORM] A Better External Sort?
Ron,

On 9/30/05 1:20 PM, Ron Peacetree [EMAIL PROTECTED] wrote:
> That 11MBps was your =bulk load= speed. If just loading a table is this slow, then there are issues with basic physical IO, not just IO during sort operations.

Bulk loading speed is irrelevant here - that is dominated by parsing, which we have covered copiously (har har) previously and have sped up by 500%, which still makes Postgres 1/2 the loading speed of MySQL.

> As I said, the obvious candidates are inefficient physical layout and/or flawed IO code.

Yes. Until the basic IO issues are addressed, we could replace the present sorting code with infinitely fast sorting code and we'd still be scrod performance-wise.

Postgres' I/O path has many problems that must be micro-optimized away. Too small an operand size compared to disk caches, memory, etc. is the common problem. Another is lack of micro-parallelism (loops) with runs long enough to let modern processors pipeline and run superscalar.

The net problem here is that a simple "select blah from blee order by (blah.a);" runs at 1/100 of the sequential scan rate.

- Luke
Re: [HACKERS] FW: PGBuildfarm member snake Branch HEAD Status changed from OK to Contrib failure
On Fri, Sep 30, 2005 at 10:08:11AM +0100, Dave Page wrote:
> Looks like there's more unportable code in the recent changes to pgbench :-(

Here, the culprits are tfind() and tsearch(). These apparently aren't portable enough, though they seem to exist on all other platforms. Maybe we could come up with a replacement on Windows? Are there simple btree/hash table functions on Windows, with a similar API?

-- Alvaro Herrera    Architect, http://www.EnterpriseDB.com
"La web junta la gente porque no importa que clase de mutante sexual seas, tienes millones de posibles parejas. Pon 'buscar gente que tengan sexo con ciervos incendiándose', y el computador dirá 'especifique el tipo de ciervo'" (Jason Alexander)
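For the record, a drop-in substitute is not much code if an unbalanced binary search tree is acceptable; a hedged sketch of tsearch()/tfind() work-alikes (purely illustrative, not the fix actually adopted for pgbench):

    #include <stdlib.h>

    typedef struct tnode
    {
        const void   *key;
        struct tnode *left;
        struct tnode *right;
    } tnode;

    /* Insert key if absent; return the node holding it (like tsearch). */
    void *
    my_tsearch(const void *key, void **rootp,
               int (*compar)(const void *, const void *))
    {
        tnode **slot = (tnode **) rootp;

        while (*slot != NULL)
        {
            int c = compar(key, (*slot)->key);

            if (c == 0)
                return *slot;               /* already present */
            slot = (c < 0) ? &(*slot)->left : &(*slot)->right;
        }
        *slot = (tnode *) malloc(sizeof(tnode));
        if (*slot == NULL)
            return NULL;
        (*slot)->key = key;
        (*slot)->left = (*slot)->right = NULL;
        return *slot;
    }

    /* Look up key; return its node, or NULL if absent (like tfind). */
    void *
    my_tfind(const void *key, void **rootp,
             int (*compar)(const void *, const void *))
    {
        tnode *node = rootp ? *(tnode **) rootp : NULL;

        while (node != NULL)
        {
            int c = compar(key, node->key);

            if (c == 0)
                return node;
            node = (c < 0) ? node->left : node->right;
        }
        return NULL;
    }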
Re: [HACKERS] postgresql clustering
Dan,

On 9/29/05 3:23 PM, Daniel Duvall [EMAIL PROTECTED] wrote:
> What about clustered filesystems? At first blush I would think the overhead of something like GFS might kill performance. Could one potentially achieve a fail-over config using multiple nodes with GFS, each having its own instance of PostgreSQL (but only one running at any given moment)?

Interestingly, my friend Matt O'Keefe built GFS at UMN; I was one of his first customers/sponsors of the research in 1998, when I implemented an 8-node shared-disk cluster on Alpha Linux using GFS and Fibre Channel.

Again, it depends on what you're doing. If it's OLTP, you will spend too much time in lock management for disk access, and things like Oracle RAC's Cache Fusion become critical to reduce the number of times you have to hit the disks. For warehousing/sequential scans, this kind of clustering is irrelevant.

- Luke
Re: [HACKERS] [PERFORM] A Better External Sort?
I see the following routines that seem to be related to sorting. If I were to examine these routines to consider ways to improve sorting, which should I key in on? I am guessing that tuplesort.c is the hub of activity for database sorting.

Directory of U:\postgresql-snapshot\src\backend\access\nbtree
    08/11/2005  06:22 AM    24,968  nbtsort.c

Directory of U:\postgresql-snapshot\src\backend\executor
    03/16/2005  01:38 PM     7,418  nodeSort.c

Directory of U:\postgresql-snapshot\src\backend\utils\sort
    09/23/2005  08:36 AM    67,585  tuplesort.c

Directory of U:\postgresql-snapshot\src\bin\pg_dump
    06/29/2005  08:03 PM    31,620  pg_dump_sort.c

Directory of U:\postgresql-snapshot\src\port
    07/27/2005  09:03 PM     5,077  qsort.c
Re: [HACKERS] postgresql clustering
Luke Lonergan wrote:
> Again, it depends on what you're doing. If it's OLTP, you will spend too much time in lock management for disk access, and things like Oracle RAC's Cache Fusion become critical to reduce the number of times you have to hit the disks.

Hitting the disk is really bad. However, we have seen that consulting the network for small portions of data (e.g. locks) is even more critical: you will see the CPU on all nodes running at 1% or so while the network is waiting for data to be exchanged (latency) - this is the real problem. I don't know what Oracle is doing in detail, but they have a real problem when losing a node inside the cluster (syncing again is really time-consuming).

> For warehousing/sequential scans, this kind of clustering is irrelevant.

I suggest looking at Teradata - they do really nice query partitioning on so-called AMPs (we'd simply call them nodes). It is really nice for really ugly warehousing queries (ugly in terms of amount of data).

Hans

-- Cybertec Geschwinde & Schönig GmbH
Schöngrabern 134; A-2020 Hollabrunn
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at
Re: [HACKERS] \d on database with a lot of tables is slow
On Wed, Sep 28, 2005 at 11:35:31AM -0400, Tom Lane wrote:
> Jim C. Nasby [EMAIL PROTECTED] writes:
> > Sorry, remembered it wrong. It's 'Did not find any relation named', which appears to be in bin/psql/describe.c. It does occur when trying to do a \d on a specific table.
>
> Hmm, no immediate ideas. You haven't provided a lot of context about this --- when it happens, is it repeatable? Are they giving an exact table name or a pattern to \d? Is a schema name included in what they give to \d? What PG version are they running exactly?

Sorry, I had the error message wrong:

    ERROR: cache lookup failed for relation 1906465919

It is on an exact table name. When we retry the describe after a failure, sometimes it works and sometimes it fails again; when it fails again, the relation number is different. Also, \d schema.table always returns quickly and never errors; \d table is slow and produces the error fairly often. They're using 8.0.x; I'm pretty certain it's 8.0.3.

-- Jim C. Nasby, Sr. Engineering Consultant   [EMAIL PROTECTED]
Pervasive Software   http://pervasive.com   work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
[HACKERS] Bug 1473, pthread python on FreeBSD
http://archives.postgresql.org/pgsql-bugs/2005-02/msg00135.php

I think it may have been a bit early to disable pthread Python support (http://archives.postgresql.org/pgsql-hackers/2005-09/msg01136.php), as Python was working fine on buildfarm member platypus. Maybe it's only an issue with 4.x machines?

-- Jim C. Nasby, Sr. Engineering Consultant   [EMAIL PROTECTED]
Pervasive Software   http://pervasive.com   work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
Re: [HACKERS] Vacuum questions...
On Thu, Sep 29, 2005 at 12:50:13AM +0300, Hannu Krosing wrote:
> On T, 2005-09-27 at 17:57 -0500, Jim C. Nasby wrote:
> > On Tue, Sep 27, 2005 at 02:47:46PM -0400, Jan Wieck wrote:
> > > On 9/24/2005 8:17 PM, Jim C. Nasby wrote:
> > > > Would it be difficult to vacuum as part of a dump? The reasoning behind this is that you have to read the table to do the dump anyway,
> > >
> > > I think aside from what's been said so far, it would be rather difficult anyway. pg_dump relies on MVCC and requires to run in one transaction to see a consistent snapshot, while vacuum jiggles around with transactions in some rather non-standard way.
> >
> > Is this true even if they were in different connections? My (vague) understanding of the vacuum process is that it first vacuums indexes, and then vacuums the heap.
>
> Actually, (lazy) vacuum does this:
> 1) scan heap, collect ctids of rows to remove
> 2) clean indexes
> 3) clean heap
>
> > Since we don't dump indexes, there's nothing for backup to do while those are vacuumed, so my idea is:
> >
> > pg_dump:
> >   foreach (table)
> >     spawn vacuum
> >     wait for vacuum to hit heap
> >     start copy
> >     wait for analyze to finish
> >   next;
>
> Probably the first heap scan of vacuum would go faster than dump, as it does not have to write out anything, and the second scan (nr 3 in the above list) would be either faster or slower, as it has to lock each page and rearrange tuples there. So it would be very hard to synchronize vacuum with either of them.

Well, I guess it depends on what the dump was writing to. Also depends on available cache, I expect.

Is this something that could be hacked together fairly easily, just for testing purposes? Would firing off a VACUUM tablename at the same time as a COPY tablename be a good enough approximation?

-- Jim C. Nasby, Sr. Engineering Consultant   [EMAIL PROTECTED]
Pervasive Software   http://pervasive.com   work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
Re: [HACKERS] roundoff problem in time datatype
On Mon, Sep 26, 2005 at 06:23:06PM +0200, Andreas Pflug wrote:
> Tom Lane wrote:
> > Dennis Bjorklund [EMAIL PROTECTED] writes:
> > > Does the SQL standard say anything on the matter?
> >
> > It doesn't seem very helpful. AFAICS, we should interpret storing '23:59:59.99' into a TIME(0) field as a cast from TIME(2) to TIME(0), and the spec defines that as:
> >
> >     15) If TD is the datetime data type TIME WITHOUT TIME ZONE, then let TSP be the time precision of TD.
> >     b) If SD is TIME WITHOUT TIME ZONE, then TV is SV, with implementation-defined rounding or truncation if necessary.
> >
> > So it's implementation-defined what we do.
>
> IMHO, since 23:59:59.99 probably means "the last milliseconds of this day, as far as precision allows expressing it", this should be truncated to 23:59:59, not rounded to 24:00:00. Until the last microsecond has elapsed, it's not 24 hours (you wouldn't round "happy new year" at 23:59:30 from a clock with minutes only, either).

Maybe also allow for a warning to be generated? Or some way to signal an overflow? I think it could be valid to do this, or round up to 24:00:00, or 'round up' to 00:00:00, depending on what the app was trying to accomplish. Would it be possible to allow an option to the datatype that specifies the rounding behavior, or would they need to be different datatypes?

-- Jim C. Nasby, Sr. Engineering Consultant   [EMAIL PROTECTED]
Pervasive Software   http://pervasive.com   work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
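To put numbers on rounding versus truncation: at precision 0, 23:59:59.99 is 86399.99 seconds since midnight, and rounding crosses into the next day while truncation does not. A trivial standalone illustration:

    #include <math.h>
    #include <stdio.h>

    int
    main(void)
    {
        double t = 23 * 3600 + 59 * 60 + 59.99;  /* 86399.99 s */

        /* Rounding to whole seconds overflows the day... */
        printf("rounded:   %.0f s  (86400 = '24:00:00')\n", floor(t + 0.5));
        /* ...while truncation stays inside it. */
        printf("truncated: %.0f s  (86399 = '23:59:59')\n", floor(t));
        return 0;
    }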
Re: [HACKERS] roundoff problem in time datatype
On Mon, Sep 26, 2005 at 11:46:47AM -0400, Tom Lane wrote:
> Jochem van Dieten [EMAIL PROTECTED] writes:
> > On 9/26/05, Dennis Bjorklund wrote:
> > > One reason is because it's what the standard demands.
> >
> > Could you cite that? The only thing I can find in the SQL standard is that the hour field in an INTERVAL can not exceed 23, not datetimes.
>
> SQL99 has Table 11, "Valid values for datetime fields":
>
>     Keyword            Valid values of datetime fields
>     ---------------    ------------------------------------------------
>     YEAR               0001 to 9999
>     MONTH              01 to 12
>     DAY                Within the range 1 (one) to 31, but further
>                        constrained by the value of MONTH and YEAR
>                        fields, according to the rules for well-formed
>                        dates in the Gregorian calendar.
>     HOUR               00 to 23
>     MINUTE             00 to 59
>     SECOND             00 to 61.9(N), where 9(N) indicates the number
>                        of digits specified by time fractional seconds
>                        precision.
>     TIMEZONE_HOUR      -12 to 13
>     TIMEZONE_MINUTE    -59 to 59
>
>     NOTE 62 - Datetime data types will allow dates in the Gregorian
>     format to be stored in the date range 0001-01-01 CE through
>     9999-12-31 CE. The range for SECOND allows for as many as two
>     leap seconds. Interval arithmetic that involves leap seconds or
>     discontinuities in calendars will produce implementation-defined
>     results.
>
> The urban legend about needing 2 leap seconds in the same minute has infected the standard, I see. It should only allow 60.9(N) as the max value for SECOND.
>
> Note however that we feel free to exceed the spec in other aspects of this --- we exceed their year range, for instance. So I don't think we necessarily have to reject '24:00:00'.
>
> Also, the spec explicitly states that arithmetic on TIME values is done modulo 24 hours. So it's correct for '23:59:59'::time + '1 second'::interval to yield '00:00:00', but this does not necessarily mean that we should cause rounding to behave that way. Depends whether you think that rounding is an arithmetic operation or not ...

Does that portion of the spec also apply to plain time fields? The entire issue here only exists because there's no method to handle the overflow, unlike in a timestamp.

-- Jim C. Nasby, Sr. Engineering Consultant   [EMAIL PROTECTED]
Pervasive Software   http://pervasive.com   work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
Re: [HACKERS] FW: PGBuildfarm member snake Branch HEAD Status changed
Alvaro Herrera wrote:
> On Fri, Sep 30, 2005 at 10:08:11AM +0100, Dave Page wrote:
> > Looks like there's more unportable code in the recent changes to pgbench :-(
>
> Here, the culprits are tfind() and tsearch(). These apparently aren't portable enough, though they seem to exist on all other platforms. Maybe we could come up with a replacement on Windows? Are there simple btree/hash table functions on Windows, with a similar API?

Not that I can see, looking here:
http://msdn.microsoft.com/library/en-us/vclib/html/vcrefRunTimeLibraryReference.asp

The library found at http://sourceforge.net/project/shownotes.php?release_id=209006 seems to have what is needed, but I really don't think we can impose that extra requirement at this stage of the release cycle, do you? ISTM we either need to revert this or change it to use an API that is already known to be supported.

cheers, andrew
Re: [HACKERS] Open items list for 8.1
On Wed, Sep 28, 2005 at 06:07:02PM -0400, Neil Conway wrote:
> On Wed, 2005-28-09 at 18:35 -0300, Marc G. Fournier wrote:
> > The problem isn't whether or not they should be changed; the problem is that they were changed *during* beta AND *against* the direction that discussion on these changes went
>
> I'm not sure what you mean: what is the direction that discussion on these changes went? (If you're referring to complete vs. total, that hardly constitutes a change in direction.)
>
> > ... pre-beta would have been more acceptable, but pre-feature-freeze would have been much preferred
>
> I think there is an argument to be made for reverting pg_cancel_backend, since that function was released with 8.0. Personally I'm sceptical that there are very many people using that function in scripts (particularly using it in such a way that their scripts will break if the return type is changed). Since we've already made the change, I don't really see the point in reverting it, but I don't mind if someone wants to do it.

I think it's just as important to work towards keeping interfaces clean as it is not to break old code. What's wrong with adding pg_cancel_backend(...) RETURNS int as an alias for the one that returns boolean, and documenting that it's deprecated and will be removed in the future? The same goes for Tom's timeofday() RETURNS text example.

> As for the other changes, I think there is absolutely no reason to revert them. Since when is making changes to the signatures of new functions forbidden during the beta period? AFAIK we don't make guarantees of backward compatibility during the beta period, nor would it be sensible to do so. We had the opportunity to fix some poor API choices, and since an initdb was already required, I think making these changes for beta2 was quite reasonable.

Agreed. Not making API changes now means we get to live with them for years and years.

-- Jim C. Nasby, Sr. Engineering Consultant   [EMAIL PROTECTED]
Pervasive Software   http://pervasive.com   work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
Re: [HACKERS] On Logging
On Mon, Sep 26, 2005 at 10:57:54AM -0700, Ron Mayer wrote:
> David Fetter wrote:
> > ...log file formats in 8.0:
> > * CSV
> > * YAML
> > * XML
> > * Piped logs, as Apache can do
> > * DB handle. I know this one will be controversial. [...]
>
> 1. Am I the only one who wants an option for machine-readable logs? I'd very much like a format that can be easily loaded into a database (not necessarily the same one producing the logs :-) ) in real time and/or be visible as a table through something like dbi-link. I suppose any of the first three formats you suggest could work with dbi-link; another alternative format, SQL INSERT statements, would work if piped logs were supported, by sending them to psql.

Apache seems to have the best, most flexible logging of anything out there, and should probably be used as a model. It's pretty easy to have it actually log to a database.

Whatever method we decide on, I think it would be very useful if we supported multiple logging streams. I certainly wouldn't want to give up a human-readable log to get a CSV one.

Is a logging mechanism the best way to do profiling? Seems like it might be better to have a more efficient, dedicated method. But I'm not against adding capabilities like per-backend logging, etc.

-- Jim C. Nasby, Sr. Engineering Consultant   [EMAIL PROTECTED]
Pervasive Software   http://pervasive.com   work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
Re: [HACKERS] Open items list for 8.1
On Fri, 2005-30-09 at 17:47 -0500, Jim C. Nasby wrote:
> What's wrong with adding pg_cancel_backend(...) RETURNS int as an alias for the one that returns boolean, and documenting that it's deprecated and will be removed in the future?

You can't overload functions based on their return type alone.

-Neil
Re: [HACKERS] Open items list for 8.1
Jim C. Nasby wrote:
> What's wrong with adding pg_cancel_backend(...) RETURNS int as an alias for the one that returns boolean, and documenting that it's deprecated and will be removed in the future? The same goes for Tom's timeofday() RETURNS text example.

We don't have the ability to have two functions that take the same parameters and return different results, because there is no facility to decide which function to call based on what return value is expected; a simple query doesn't have a return-value restriction:

    SELECT pg_cancel_backend();

-- Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
Re: [HACKERS] State of support for back PG branches
On Mon, Sep 26, 2005 at 09:07:45PM -0700, Joshua D. Drake wrote:
> > A nice pg_upgrade utility would make a big difference. Clearly an in-place upgrade is possible, but maintaining it is hard. There are two broad ways of running a pg_upgrade project - one that is entirely independent of the main codebase, and one that puts requirements on the main codebase developers (if you change $foo, you provide code to translate old $foo to new $foo). Any feel for the relative difficulty of the two approaches? And how much push-back there'd be on the latter?
>
> You can do in-place upgrades with Slony-I and Mammoth Replicator.

With a lot more effort than a dump/restore, or presumably a pg_upgrade. I'd love to see a project that uses Slony to make an in-place migration as easy as possible. Maybe I'll get around to it in another 5 years.

-- Jim C. Nasby, Sr. Engineering Consultant   [EMAIL PROTECTED]
Pervasive Software   http://pervasive.com   work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
Re: [HACKERS] [PERFORM] A Better External Sort?
> Bulk loading speed is irrelevant here - that is dominated by parsing, which we have covered copiously (har har) previously and have sped up by 500%, which still makes Postgres 1/2 the loading speed of MySQL.

Let's ask MySQL 4.0:

    > LOAD DATA INFILE blah
    0 errors, 666 warnings
    > SHOW WARNINGS;
    not implemented. upgrade to 4.1

duh
Re: [HACKERS] [PERFORM] A Better External Sort?
That 11MBps was your =bulk load= speed. If just loading a table is this slow, then there are issues with basic physical IO, not just IO during sort operations. As I said, the obvious candidates are inefficient physical layout and/or flawed IO code. Until the basic IO issues are addressed, we could replace the present sorting code with infinitely fast sorting code and we'd still be scrod performance-wise. So why does basic IO suck so badly? Ron -Original Message- From: Josh Berkus josh@agliodbs.com Sent: Sep 30, 2005 1:23 PM To: Ron Peacetree [EMAIL PROTECTED] Cc: pgsql-hackers@postgresql.org, pgsql-performance@postgresql.org Subject: Re: [HACKERS] [PERFORM] A Better External Sort? Ron, Hmmm. 60GB/5400 secs = 11MBps. That's ssllooww. So the first problem is evidently our physical layout and/or HD IO layer sucks. Actually, it's much worse than that, because the sort is only dealing with one column. As I said, monitoring with iostat, our top speed was 2.2MB/s. --Josh ---(end of broadcast)--- TIP 6: explain analyze is your friend
[HACKERS] Request for a force interactive mode flag (-I) for psql
Back in 2003 Bruce Momjian proposed adding a flag (-I) to psql to force it into interactive mode. (See http://archives.postgresql.org/pgsql-hackers/2003-11/msg00013.php for the thread.) The proposal was rejected because there was no proven need for it at that time. I'd like to raise this proposal again, since I think in our situation, this is the only fix for our problem. Our environment is as follows: * A large number of PostgreSQL 7.4 servers running on a variety of SuSE Linux (9.0 - 9.3) servers * A growing number of PostgreSQL 8.0.3 servers running on Windows Server 2003 servers. (We are running the native Win32 version of PostgreSQL, not the Cygwin version.) The servers are all located in remote offices. Maintenance is done remotely via SSH to a local bash command prompt where we use the command line tools: psql, pg_dump, etc. On Linux we use the native sshd; on Windows we use Cygwin to get bash, sshd, cron, etc. On Linux this works fine; on Windows, however, psql thinks it is not in an interactive console so we get no prompts, no line editing, no history, very little cursor control, etc. (I see that the missing autocomplete feature is a readline issue, but that's a topic for another posting.) This makes remote maintenance on the Windows servers much more difficult than it otherwise could be. The issue appears to be due to isatty() returning false in the Windows Cygwin environment. From other research around the web, it appears that if the app is Cygwin-aware is knows to override this check (or allow manual override) or try to do further testing, but in this case, since we are using the native Win32 version of PostgreSQL, psql doesn't do any additional testing. Also, even in the local console on Windows, running rxvt -- our preferred terminal in Windows (since it lets us make our Windows command line act just like our Linux command line grin) -- causes psql to think that there is no terminal. (Our first encounter of this no terminal problem was in trying to run psql via a local bash shell via rxvt, and originally we thought that psql was hanging. Given our reliance on psql for remote maintenance, this would have prevented our rolling out a Windows version of PostgreSQL. It was only after too much time looking at it with some low-level tools that we stumbled across the fact that psql was simply silently waiting at a command prompt rather than being hung. However, from other posts in these lists [e.g.: http://archives.postgresql.org/pgsql-patches/2004-07/msg00369.php ] it appears that other people also thought psql was hanging when it was run from a terminal program, so I suppose I shouldn't feel too bad...) Bruce's proposal and suggested code simply added a new -I flag to force psql into interactive mode (by simply setting pset.notty = 0). From everything I can find (including reading through the Cygwin code, ssh and sshd man pages and code, psql code, testing various modes of running the sshd service in Windows, changing Windows profile account permissions, etc.), adding this flag seems to be the only viable option (other than writing a replacement for psql for our Windows servers, something I'd prefer to not do). While I can obviously add this patch myself and build and maintain a custom version of psql for our own use, since there now is a true native version of PostgreSQL for Windows, I think we will see a growing list of people supporting mixed environments just like ours, and bumping into this exact same problem. Thoughts? - Bill Bill Bartlett meridianEMR, Inc. 
http://www.meridianemr.com ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [PERFORM] A Better External Sort?
I have seen similar performance to what Josh reports, and my reasoning is as follows:

* WAL is the biggest bottleneck, with its default segment size of 16MB. Many people hate to recompile the code to change that default, and increasing checkpoint_segments helps, but there is still a lot of overhead in the rotation of WAL files (even putting WAL on tmpfs shows that it is still slow). Having an option for a bigger size would help to a small extent percentage-wise (and free up CPU a bit by doing fewer file rotations).

* Growing files: this is OS-dependent, but we do spend a lot of time doing small 8K block increases to grow files. If we could signal bigger chunks to grow by, or pre-grow data files to their expected size, that would help a lot in such cases.

* The COPY command had restrictions, but those have been fixed to a large extent. (Great job!)

But of course I have lost touch with programming and can't begin to understand the PostgreSQL code well enough to change it myself. Regards, Jignesh

Ron Peacetree wrote: That 11MBps was your =bulk load= speed. If just loading a table is this slow, then there are issues with basic physical IO, not just IO during sort operations. As I said, the obvious candidates are inefficient physical layout and/or flawed IO code. Until the basic IO issues are addressed, we could replace the present sorting code with infinitely fast sorting code and we'd still be scrod performance-wise. So why does basic IO suck so badly? Ron -Original Message- From: Josh Berkus josh@agliodbs.com Sent: Sep 30, 2005 1:23 PM To: Ron Peacetree [EMAIL PROTECTED] Cc: pgsql-hackers@postgresql.org, pgsql-performance@postgresql.org Subject: Re: [HACKERS] [PERFORM] A Better External Sort? Ron, Hmmm. 60GB/5400 secs = 11MBps. That's ssllooww. So the first problem is evidently our physical layout and/or HD IO layer sucks. Actually, it's much worse than that, because the sort is only dealing with one column. As I said, monitoring with iostat, our top speed was 2.2MB/s. --Josh ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
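For reference, a hedged postgresql.conf sketch of the tunables Jignesh alludes to that do not require a recompile; the values are illustrative, not recommendations (the 16MB WAL segment size itself is a compile-time constant in this era):

    # postgresql.conf
    checkpoint_segments = 64   # default 3; checkpoints happen far less often during a bulk load
    wal_buffers = 256          # default 8 (in 8kB pages); helps bursty bulk writes

Neither setting changes the per-segment file size; spacing checkpoints further apart simply reduces how often WAL segments are created and recycled while the load runs.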
Re: [HACKERS] State of support for back PG branches
On Mon, Sep 26, 2005 at 08:54:49PM -0700, Josh Berkus wrote: Tom, Or, as you say, we could take the viewpoint that there are commercial companies willing to take on the burden of supporting back releases, and the development community ought not spend its limited resources on doing that. I'm hesitant to push that idea very hard myself, because it would look too much like I'm pushing the interests of my employer Red Hat ... but certainly there's a reasonable case to be made there. Well, I think you know my opinion on this. Since there *are* commercial companies available, I think we should use them to reduce back-patching effort. I suggest that our policy should be: the community will patch two old releases, and beyond that if it's convenient, but no promises. In other words, when 8.1 comes out we'd be telling 7.3 users "We'll be patching this only where we can apply 7.4 patches. Otherwise, better get a support contract." I agree, although I think there should be some time guarantees as well. I like the ~3 year number that's been tossed around. Of course, a lot of this is up to individual initiative; if someone fixes a patch so it applies back to 7.2, there's no reason not to make it available. However, there's no reason *you* should make it a priority. Yeah, I hope that commercial interests can work together on supporting things they want supported. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Database file compatability
On Wed, Sep 28, 2005 at 10:22:51AM -0400, Tom Lane wrote: Qingqing Zhou [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] wrote in message There is a possible sequence like this:

    ALIGNOF_LONG 4, ALIGNOF_DOUBLE 8, MAXIMUM_ALIGNOF 8
vs.
    ALIGNOF_LONG 8, ALIGNOF_DOUBLE 8, MAXIMUM_ALIGNOF 8

So we should at least check ALIGNOF_LONG as well. No, we don't need to, because we do not really care about ALIGNOF_LONG per se. We don't use long as an on-disk datatype, precisely because we don't know what size it is. We use int32 and int64. The former has align 4 on all machines AFAIK, and the latter has MAXIMUM_ALIGNOF. Is there a serious penalty associated with just checking them all? Seems like better safe than sorry... On a related note, are checks for endianness made as well? -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
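As a point of reference (a sketch, not an exhaustive list of what is checked), pg_controldata already reports the alignment a cluster was built with, and the server compares such pg_control fields against its own compiled-in values at startup:

    $ pg_controldata /path/to/data | grep -i align
    Maximum data alignment:               8

There is no dedicated endianness field, but in practice several multi-byte pg_control values (the control-file version number, for one) would fail to match on a byte-swapped architecture, so a mismatch would still be caught.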
Re: [HACKERS] PostgreSQL overall design
On Tue, Sep 27, 2005 at 07:00:14PM +0530, Gnanavel S wrote: On 9/27/05, Jonah H. Harris [EMAIL PROTECTED] wrote: Were you looking for a call graph? Yes. I want to know the list and sequence of files involved during a call. Total non-coder question, but is there an open-source utility that's capable of generating that? Seems like a useful piece of documentation to have. Also seems like it'd be completely impractical to maintain by hand. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] effective SELECT from child tables
On Wed, Sep 28, 2005 at 10:24:18PM +0200, Martijn van Oosterhout wrote: On Wed, Sep 28, 2005 at 11:00:46PM +0400, Ilia Kantor wrote: Your suggestion is essentially the same as mine.. There exists tableoid, pretty much suited to tell between tables in the case of inheritance.. I can't see a real need to add a special class classifier to each table.. This solution is a workaround. It will work, just can't make myself love it. I wonder if it would be possible to tweak the constraints exclusion code so that if it sees something of the form tableoid = X to exclude other tables... You know, assume each table has a constraint tableoid = OID. Still, it is a fairly unusual feature. Well, it's possibly a good way to do list partitioning where you can drop the partitioning key out of each partition, something I would love to have for stats.distributed.net (I'm actually working on a project that does exactly this with a UNION ALL view and rules...) -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
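A minimal sketch of the idea being discussed (the table names a, a1, a2 are made up):

    CREATE TABLE a (id int, payload text);
    CREATE TABLE a1 () INHERITS (a);
    CREATE TABLE a2 () INHERITS (a);

    -- tableoid already identifies which table each row came from:
    SELECT tableoid::regclass AS source, id FROM a;

    -- Filtering on it works today, but the planner still visits every
    -- child and filters rows, rather than excluding a2's scan outright:
    SELECT * FROM a WHERE tableoid = 'a1'::regclass;

Martijn's suggestion amounts to teaching constraint exclusion to treat each child as if it carried an implicit CHECK (tableoid = its own OID), so the second query could skip the other children entirely.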
Re: [HACKERS] effective SELECT from child tables
On Wed, Sep 28, 2005 at 07:25:46PM +0100, Simon Riggs wrote: Include the Discriminator as a column in A and it will be inherited by all A1, A2, A3. e.g. concrete_class char(1) not null <snip> This will add 1 byte per row in your superclass... and requires no I thought char was actually stored variable-length...? I know there's a type that actually acts like char does on most databases, but I can't remember what it is off-hand (it should be mentioned in docs 8.3...) -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
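For what it's worth, the single-byte type Jim is reaching for is "char" (with the quotes): unlike char(1), which is bpchar and carries a varlena length header, "char" really is one byte. A sketch of Simon's discriminator scheme using it (all names invented):

    CREATE TABLE a (
        concrete_class "char" NOT NULL,  -- discriminator: one byte per row
        id int
    );
    CREATE TABLE a1 (CHECK (concrete_class = '1')) INHERITS (a);
    CREATE TABLE a2 (CHECK (concrete_class = '2')) INHERITS (a);

    -- Selecting one concrete class is then an ordinary filter:
    SELECT * FROM a WHERE concrete_class = '1';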
Re: [HACKERS] [PERFORM] A Better External Sort?
25MBps should not be a CPU-bound limit for IO, nor should it be an OS limit. It should be something ~100x (single-channel RAM) to ~200x (dual-channel RAM) that. For an IO rate of 25MBps to be pegging the CPU at 100%, the CPU is suffering some combination of A= lots of cache misses (cache thrash), B= lots of random rather than sequential IO (like pointer chasing), C= lots of wasteful copying, D= lots of wasteful calculations. In fact, this is crappy enough performance that the whole IO layer should be rethought and perhaps reimplemented from scratch. Optimization of the present code is unlikely to yield a 100-200x improvement. On the HD side, the first thing that comes to mind is that DBs are -NOT- like ordinary filesystems in a few ways: 1= the minimum HD IO is a record that is likely to be larger than a HD sector. Therefore, the FS we use should be laid out with physical segments of max(HD sector size, record size) 2= DB files (tables) are usually considerably larger than any other kind of files stored. Therefore the FS we should use should be laid out using LARGE physical pages. 64KB-256KB at a _minimum_. 3= The whole 2GB striping of files idea needs to be rethought. Our tables are significantly different in internal structure from the usual FS entity. 4= I'm sure we are paying all sorts of nasty overhead for essentially emulating the pg filesystem inside another filesystem. That means ~2x as much overhead to access a particular piece of data. The simplest solution is for us to implement a new VFS-compatible filesystem tuned to exactly our needs: pgfs. We may be able to avoid that by some amount of hacking or modifying of the current FSs we use, but I suspect it would be more work for less ROI. Ron -Original Message- From: Josh Berkus josh@agliodbs.com Sent: Sep 30, 2005 4:41 PM To: Ron Peacetree [EMAIL PROTECTED] Cc: pgsql-hackers@postgresql.org, pgsql-performance@postgresql.org Subject: Re: [HACKERS] [PERFORM] A Better External Sort? Ron, That 11MBps was your =bulk load= speed. If just loading a table is this slow, then there are issues with basic physical IO, not just IO during sort operations. Oh, yeah. Well, that's separate from sort. See multiple posts on this list from the GreenPlum team, the COPY patch for 8.1, etc. We've been concerned about I/O for a while. Realistically, you can't do better than about 25MB/s on a single-threaded I/O on current Linux machines, because your bottleneck isn't the actual disk I/O. It's CPU. Databases which go faster than this are all, to my knowledge, using multi-threaded disk I/O. (and I'd be thrilled to get a consistent 25MB/s on PostgreSQL, but that's another thread ... ) As I said, the obvious candidates are inefficient physical layout and/or flawed IO code. Yeah, that's what I thought too. But try sorting a 10GB table, and you'll see: disk I/O is practically idle, while CPU averages 90%+. We're CPU-bound, because sort is being really inefficient about something. I just don't know what yet. If we move that CPU-binding to a higher level of performance, then we can start looking at things like async I/O, O_Direct, pre-allocation etc. that will give us incremental improvements. But what we need now is a 5-10x improvement and that's somewhere in the algorithms or the code. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Found small issue with OUT params
On Fri, Sep 30, 2005 at 10:53:22AM -0700, Joshua D. Drake wrote: So you might notice a little performance hit bringing back a million rows, and most of these types of single OUT param functions only return one row/value anyway. There would be zero perceivable difference in performance regardless of the extra overhead for a single value/row. Sounds like we need a test case... up for it? If there is a performance difference my vote is that we bite the bullet for 8.1 and accept the performance hit rather than settle for sub-optimal behavior. Much easier to fix the performance penalty down the road than to fix the behavior. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Query in SQL statement
On Thu, Sep 29, 2005 at 09:28:38PM +0800, Christopher Kings-Lynne wrote:

CREATE SEQUENCE ai_id;
CREATE TABLE badusers (
  id int DEFAULT nextval('ai_id') NOT NULL,
  UserName varchar(30),
  Date datetime DEFAULT '0000-00-00 00:00:00' NOT NULL,
  Reason varchar(200),
  Admin varchar(30) DEFAULT '-',
  PRIMARY KEY (id),
  KEY UserName (UserName),
  KEY Date (Date)
);

I am always getting the following errors: ERROR: relation "ai_id" already exists ERROR: syntax error at or near "(" at character 240

You have just copied the MySQL code to PostgreSQL. It will in no way work. Your default for 'Date' is illegal in PostgreSQL and hence it must allow NULLs. There is no such thing as a 'datetime' type. There is no such thing as 'KEY'. Also your mixed-case identifiers won't be preserved. You want:

CREATE TABLE badusers (
  id SERIAL PRIMARY KEY,
  UserName varchar(30),
  Date timestamp,
  Reason varchar(200),
  Admin varchar(30) DEFAULT '-'
);
CREATE INDEX UserName_Idx ON badusers(Username);
CREATE INDEX Date_Idx ON badusers(Date);

Actually, to preserve the case you can wrap everything in quotes: CREATE ... "UserName" varchar(30) ... Of course that means that now you have to do that in every statement that uses that field, too:

SELECT username FROM badusers;    -- ERROR
SELECT "UserName" FROM badusers;  -- works

I suggest ditching the CamelCase and going with underscore_separators. I'd also not use the bareword id, instead using bad_user_id. And I'd name the table bad_user. But that's just me. :) -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [PERFORM] A Better External Sort?
I have perused the tuple sort stuff. The good: The documentation of the sort algorithm from Knuth's TAOCP was beautifully done. Everyone who writes an algorithm should credit the original source like this, and also note where it deviates. That was done very nicely. The bad: With random access, tape-style merging is not necessary. A priority-queue-based merge will be a lot faster. The UGLY: Please, someone, tell me I was hallucinating. Is that code really READING AND WRITING THE WHOLE TUPLE with every sort exchange?! Maybe there is a layer of abstraction that I am somehow missing. I just can't imagine that is what it is really doing. If (somehow) it really is doing that, a pointer-based sort, which forms a permutation based upon the keys, would be a lot better. The fundamental algorithm itself could also be improved somewhat. Here is a {public domain} outline for an introspective quick sort that Pete Filander and I wrote some time ago and contributed to FastDB. It is written as a C++ template, but it will take no effort to make it a simple C routine. It assumes that e_type has comparison operators, so in C you would use a compare function instead.

/*
** Sorting stuff by Dann Corbit and Pete Filandr.
** ([EMAIL PROTECTED] and [EMAIL PROTECTED])
** Use it however you like.
*/

//
// The insertion sort template is used for small partitions.
//
template <class e_type>
void insertion_sort(e_type * array, size_t nmemb)
{
    e_type temp, *last, *first, *middle;
    if (nmemb > 1) {
        first = middle = 1 + array;
        last = nmemb - 1 + array;
        while (first != last) {
            ++first;
            if (*(middle) > *(first)) {
                middle = first;
            }
        }
        if (*(array) > *(middle)) {
            ((void) ((temp) = *(array), *(array) = *(middle), *(middle) = (temp)));
        }
        ++array;
        while (array != last) {
            first = array++;
            if (*(first) > *(array)) {
                middle = array;
                temp = *middle;
                do {
                    *middle-- = *first--;
                } while (*(first) > (temp));
                *middle = temp;
            }
        }
    }
}

//
// The median estimate is used to choose pivots for the quicksort algorithm
//
template <class e_type>
void median_estimate(e_type * array, size_t n)
{
    e_type temp;
    long unsigned lu_seed = 123456789LU;
    const size_t k = ((lu_seed) = 69069 * (lu_seed) + 362437) % --n;
    ((void) ((temp) = *(array), *(array) = *(array + k), *(array + k) = (temp)));
    if (*(array + 1) > *(array)) {
        (temp) = *(array + 1);
        if (*(array + n) > *(array)) {
            *(array + 1) = *(array);
            if ((temp) > *(array + n)) {
                *(array) = *(array + n);
                *(array + n) = (temp);
            } else {
                *(array) = (temp);
            }
        } else {
            *(array + 1) = *(array + n);
            *(array + n) = (temp);
        }
    } else {
        if (*(array) > *(array + n)) {
            if (*(array + 1) > *(array + n)) {
                (temp) = *(array + 1);
                *(array + 1) = *(array + n);
                *(array + n) = *(array);
                *(array) = (temp);
            } else {
                ((void) ((temp) = *(array), *(array) = *(array + n), *(array + n) = (temp)));
            }
        }
    }
}

//
// This is the heart of the quick sort algorithm used here.
// If the sort is going quadratic, we switch to heap sort.
// If the partition is small, we switch to insertion sort.
//
template <class e_type>
void qloop(e_type * array, size_t nmemb, size_t d)
{
    e_type temp, *first, *last;
    while (nmemb > 50) {
        if (sorted(array, nmemb)) {
            return;
        }
        if (!d--) {
            heapsort(array, nmemb);
            return;
        }
        median_estimate(array, nmemb);
        first = 1 + array;
        last = nmemb - 1 + array;
        do {
            ++first;
        } while (*(array) > *(first));
        do {
            --last;
        } while (*(last) > *(array));
        while (last > first) {
            ((void) ((temp) = *(last), *(last) = *(first), *(first) = (temp)));
            do {
                ++first;
            } while (*(array) > *(first));
            do {
                --last;
            } while (*(last) > *(array));
        }
        ((void) ((temp) = *(array), *(array) = *(last), *(last) = (temp)));
        qloop(last + 1, nmemb - 1 + array - last, d);
        nmemb = last - array;
    }
    insertion_sort(array, nmemb);
}

//
// This heap sort is better than average because it uses Lamont's heap.
//
template <class e_type>
void heapsort(e_type * array, size_t nmemb)
{
    size_t i, child, parent;
    e_type temp;
    if (nmemb > 1) {
        i = --nmemb /
Re: [HACKERS] PCTFree Results
On Thu, Sep 29, 2005 at 11:05:47AM -0400, Jonah H. Harris wrote: Has there been any movement on this? If not, I finally have some time to look at it. Well, here's the original thread: http://archives.postgresql.org/pgsql-hackers/2005-08/msg00637.php I think the problem is that the testing isn't going to show us much (if anything) based on applying PCTFREE across all tables. It only makes sense for tables that see a lot of updates, or Index Organized Tables (to use Oracle vernacular) with indexes spread throughout the table (I don't think there's any other way to get indexes forcibly spread throughout a table, and in any case it doesn't matter for PostgreSQL right now). So, the NOTPM numbers are probably pretty meaningless. One of the other metrics that's essentially just updating might be more interesting. Josh mentioned that there were additional stats produced by the patch, but the URL seems to be down so I can't go and check. :( It would be really useful if someone with a real-life heavy-update workload stepped up. I'm going to cc Satoshi; maybe he's not on the list. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Open items list for 8.1
On Fri, Sep 30, 2005 at 06:58:05PM -0400, Bruce Momjian wrote: We don't have the ability to have two functions that take the same parameters and return different results, because there is no facility to decide which function to call based on what return value is expected; a simple query doesn't have a return value restriction: SELECT pg_cancel_backend(); Duh, I keep forgetting that. -- Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED] Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] \d on database with a lot of tables is slow
On Fri, 30 Sep 2005, Jim C. Nasby wrote: ERROR: cache lookup failed for relation 1906465919 It is on an exact table name. When we retry the describe on a failure, sometimes it works and sometimes it fails again. When it fails again the relation number is different. Also, \d schema.table always returns quickly and never errors. \d table is slow and produces the error fairly often. They're using 8.0.x; I'm pretty certain it's 8.0.3. We have the exact same problem on a 7.4.5 database. Some basic info on the database: psql's \d returns 424 rows, and the on-disk size of the database is about 11 GB. A standalone \dt throws the same occasional error. I've suspected that it may be caused by ongoing periodic (at least once an hour) rebuilding of certain generated tables with TRUNCATE and then INSERT INTO the_table SELECT ... inside a transaction. But I don't have any proof of that; it's just the most obvious different thing going on compared to other databases we have. It does seem like I've encountered the error less often since increasing the max_fsm_pages setting and thus had more effective VACUUM and less pg_class bloat, but OTOH I trained myself not to do \d there very often either, since it was so slow and failed so often, so that may be coincidence. :) Jon -- Jon Jensen End Point Corporation http://www.endpoint.com/ Software development with Interchange, Perl, PostgreSQL, Apache, Linux, ... ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
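For anyone chasing this, the failure mode is consistent with a race against a DROP/recreate: \d first resolves the bare name to an OID, then issues follow-up catalog queries by that OID. Running psql with -E echoes the queries it uses; the first one looks roughly like this (exact text varies by version):

    SELECT c.oid, n.nspname, c.relname
    FROM pg_catalog.pg_class c
         LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relname ~ '^(foo)$'
      AND pg_catalog.pg_table_is_visible(c.oid)
    ORDER BY 2, 3;

If the table is dropped and recreated between that lookup and the queries that follow it, the saved OID is stale and the backend reports "cache lookup failed for relation" with the stale number - which would fit Jon's hourly rebuild jobs only if they actually drop and recreate the tables rather than just TRUNCATE them, since TRUNCATE keeps the table's OID.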
Re: [HACKERS] Found small issue with OUT params
Jim C. Nasby wrote: On Fri, Sep 30, 2005 at 10:53:22AM -0700, Joshua D. Drake wrote: So you might notice a little performance hit bringing back a million rows, and most of these types of single OUT param functions only return one row/value anyway. There would be zero perceivable difference in performance regardless of the extra overhead for a single value/row. Sounds like we need a test case... up for it? If there is a performance difference my vote is that we bite the bullet for 8.1 and accept the performance hit rather than settle for sub-optimal behavior. Much easier to fix the performance penalty down the road than to fix the behavior. Yes, it seems this is the consensus. I have added it to the open items list: have a single OUT parameter return the parameter name, not function name ---

PostgreSQL 8.1 Open Items
=========================

Current version at http://candle.pha.pa.us/cgi-bin/pgopenitems or from http://www.postgresql.org/developer/beta.

Bugs
----
fix pg_dump --clean for roles
fix foreign trigger timing issue
fix ALTER SCHEMA RENAME for sequence name binding, or remove
improve spinlock performance
fix semantic issues of granted permissions in roles
fix pgxs for spaces in file names
have a single OUT parameter return the parameter name, not function name

Questions
---------
consider O_SYNC as default when O_DIRECT exists
/contrib move to pgfoundry
pgindent?
make sure bitmap scan optimizer settings are reasonable

Documentation
-------------

-- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
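The behavior the open item targets, sketched (function and parameter names invented):

    CREATE FUNCTION get_answer(OUT answer int) AS 'SELECT 42' LANGUAGE sql;

    SELECT * FROM get_answer();
    --  get_answer
    -- ------------
    --          42

Today the lone result column is labeled with the function name, get_answer; the open item is for it to be labeled answer instead, matching what functions with multiple OUT parameters already do with their parameter names.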
Re: [HACKERS] Found small issue with OUT params
On 9/30/05, Jim C. Nasby [EMAIL PROTECTED] wrote: On Fri, Sep 30, 2005 at 10:53:22AM -0700, Joshua D. Drake wrote: So you might notice little performance hit bringing back a million rows, and most of these type of single OUT params functions only return one row/value anyway. There would be zero perceivable difference in performance regardless of the extra overhead for a single value/row. Sounds like we need a test case... up for it? If there is a performance difference my vote is that we bite the bullet for 8.1 and accept the performance hit rather than settle for sub-optimal behavior. Much easier to fix the performance penalty down the road than to fix the behavior. What about just returning the single OUT value named by the parameter, instead of special casing single-OUT functions? If I understand correctly, Tom has just added a test to make single-OUT functions look like RETURNS functions. If that were removed then we'd have what, at least by counting the responses on this thread, seems to be the desired (and expected) behaviour. Or I could just be misunderstanding the implementation again. -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer http://open-ils.org ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] On Logging
On Fri, Sep 30, 2005 at 05:54:49PM -0500, Jim C. Nasby wrote: On Mon, Sep 26, 2005 at 10:57:54AM -0700, Ron Mayer wrote: David Fetter wrote: ...log file formats in 8.0:
* CSV
* YAML
* XML
* Piped logs, as Apache can do
* DB handle. I know this one will be controversial. [...]
1. Am I the only one who wants an option for machine-readable logs? I'd very much like a format that can be easily loaded into a database (not necessarily the same one producing the logs :-) ) in real time and/or be visible as a table through something like dbi-link. I suppose any of the first three formats you suggest could work with dbi-link; another alternate format - SQL INSERT statements - would work, if piped logs were supported, by sending them to psql. Apache seems to have the best, most flexible logging of anything out there, and should probably be used as a model. It's pretty easy to have it actually log to a database. Great :) Whatever method we decide on, I think it would be very useful if we supported multiple logging streams. I certainly wouldn't want to give up a human-readable log to get a CSV one. Excellent idea. Is a logging mechanism the best way to do profiling? Seems like it might be better to have a more efficient, dedicated method. I'm not totally confident in my ability to think up everything I'd want to look at, and every way I'd want to look at it, in advance. I'm pretty sure I can't come up with every way everyone could want to do profiling, though. But I'm not against adding capabilities like per-backend logging, etc. How would per-backend logging work? Cheers, D -- David Fetter [EMAIL PROTECTED] http://fetter.org/ phone: +1 510 893 6100 mobile: +1 415 235 3778 Remember to vote! ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
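A minimal sketch of the "load the logs into a database" idea, assuming a CSV log format existed; the table layout here is entirely hypothetical:

    CREATE TABLE server_log (
        log_time  timestamptz,
        username  text,
        datname   text,
        pid       integer,
        message   text
    );

    -- COPY ... WITH CSV (available since 8.0) would ingest such a file:
    COPY server_log FROM '/var/log/pgsql/postgresql.csv' WITH CSV;

Loading into a different database than the one producing the logs, as Ron suggests, also sidesteps the obvious chicken-and-egg problem of logging your own log inserts.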
Re: [HACKERS] [PERFORM] A Better External Sort?
On 9/30/05, Ron Peacetree [EMAIL PROTECTED] wrote: 4= I'm sure we are paying all sorts of nasty overhead for essentially emulating the pg filesystem inside another filesystem. That means ~2x as much overhead to access a particular piece of data. The simplest solution is for us to implement a new VFS compatible filesystem tuned to exactly our needs: pgfs. We may be able to avoid that by some amount of hacking or modifying of the current FSs we use, but I suspect it would be more work for less ROI. On this point, Reiser4 fs already implements a number of things which would be desirable for PostgreSQL. For example: write()s to reiser4 filesystems are atomic, so there is no risk of torn pages (this is enabled because reiser4 uses WAFL-like logging where data is not overwritten but rather relocated). The filesystem is modular and extensible so it should be easy to add whatever additional semantics are needed. I would imagine that all that would be needed is some more atomicity operations (single writes are already atomic, but I'm sure it would be useful to batch many writes into a transaction), some layout and packing controls, and some flush controls. A step further would perhaps integrate multiversioning directly into the FS (the wandering logging system provides the write side of multiversioning; a little read-side work would be required). More importantly: the file system was intended to be extensible for this sort of application. It might make a good 'summer of code' project for someone next year ... presumably reiser4 will have made it into the mainline kernel by then. :) ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [PERFORM] A Better External Sort?
On 9/28/05, Ron Peacetree [EMAIL PROTECTED] wrote: 2= We use my method to sort two different tables. We now have these very efficient representations of a specific ordering on these tables. A join operation can now be done using these Btrees rather than the original data tables that involves less overhead than many current methods. If we want to make joins very fast we should implement them using RD trees. For the example cases where a join against a very large table will produce a much smaller output, a RD tree will provide pretty much the optimal behavior at a very low memory cost. On the subject of high speed tree code for in-core applications, you should check out http://judy.sourceforge.net/ . The performance (insert, remove, lookup, AND storage) is really quite impressive. Producing cache friendly code is harder than one might expect, and it appears the judy library has already done a lot of the hard work. Though it is *L*GPLed, so perhaps that might scare some here away from it. :) and good luck directly doing joins with a LC-TRIE. ;) ---(end of broadcast)--- TIP 6: explain analyze is your friend
[HACKERS] Expression index ignores column statistics target
I've noticed that row count estimates for expression indexes appear to rely on default_statistics_target rather than on a column's actual statistics target. That is, if I use ALTER TABLE SET STATISTICS to increase a column's statistics target and then run ANALYZE, then estimates for non-expression-index queries improve as expected. However, queries that use an expression index remain accurate for only around the N most common values, where N is the default_statistics_target that was in effect when ANALYZE ran. I'm still rummaging through the archives looking for past discussion; is this behavior a known limitation or just an oversight?

CREATE TABLE foo (x integer);
CREATE INDEX foo_x_idx ON foo (x);
CREATE INDEX foo_abs_x_idx ON foo (abs(x));
INSERT INTO foo (x) SELECT r1 % r2 FROM generate_series(1, 100) AS g1(r1), generate_series(1, 100) AS g2(r2);
SET default_statistics_target TO 15;
ALTER TABLE foo ALTER COLUMN x SET STATISTICS 20;
ANALYZE foo;

SELECT most_common_vals FROM pg_stats WHERE attname = 'x';
                  most_common_vals
------------------------------------------------------
 {0,1,2,3,4,5,6,7,8,10,9,11,12,14,13,15,16,19,17,18}
(1 row)

EXPLAIN ANALYZE SELECT * FROM foo WHERE x = 13;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=2.72..50.28 rows=205 width=4) (actual time=0.370..1.766 rows=220 loops=1)
   Recheck Cond: (x = 13)
   ->  Bitmap Index Scan on foo_x_idx  (cost=0.00..2.72 rows=205 width=0) (actual time=0.314..0.314 rows=220 loops=1)
         Index Cond: (x = 13)
 Total runtime: 2.905 ms
(5 rows)

EXPLAIN ANALYZE SELECT * FROM foo WHERE abs(x) = 13;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=2.72..50.80 rows=205 width=4) (actual time=0.358..1.720 rows=220 loops=1)
   Recheck Cond: (abs(x) = 13)
   ->  Bitmap Index Scan on foo_abs_x_idx  (cost=0.00..2.72 rows=205 width=0) (actual time=0.305..0.305 rows=220 loops=1)
         Index Cond: (abs(x) = 13)
 Total runtime: 2.875 ms
(5 rows)

EXPLAIN ANALYZE SELECT * FROM foo WHERE x = 18;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=2.60..49.75 rows=172 width=4) (actual time=0.312..1.442 rows=180 loops=1)
   Recheck Cond: (x = 18)
   ->  Bitmap Index Scan on foo_x_idx  (cost=0.00..2.60 rows=172 width=0) (actual time=0.262..0.262 rows=180 loops=1)
         Index Cond: (x = 18)
 Total runtime: 2.393 ms
(5 rows)

EXPLAIN ANALYZE SELECT * FROM foo WHERE abs(x) = 18;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on foo  (cost=2.22..43.65 rows=63 width=4) (actual time=0.313..1.436 rows=180 loops=1)
   Recheck Cond: (abs(x) = 18)
   ->  Bitmap Index Scan on foo_abs_x_idx  (cost=0.00..2.22 rows=63 width=0) (actual time=0.263..0.263 rows=180 loops=1)
         Index Cond: (abs(x) = 18)
 Total runtime: 2.418 ms
(5 rows)

-- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] Expression index ignores column statistics target
This is expected. The main TODO item is:

* Allow accurate statistics to be collected on indexes with more than one column or expression indexes, perhaps using per-index statistics

Basically, we don't have multi-column or expression statistics. ANALYZE just analyzes columns, even if an expression index exists.

--- Michael Fuhr wrote: I've noticed that row count estimates for expression indexes appear to rely on default_statistics_target rather than on a column's actual statistics target. That is, if I use ALTER TABLE SET STATISTICS to increase a column's statistics target and then run ANALYZE, then estimates for non-expression-index queries improve as expected. However, queries that use an expression index remain accurate for only around the N most common values, where N is the default_statistics_target that was in effect when ANALYZE ran. I'm still rummaging through the archives looking for past discussion; is this behavior a known limitation or just an oversight?

CREATE TABLE foo (x integer);
CREATE INDEX foo_x_idx ON foo (x);
CREATE INDEX foo_abs_x_idx ON foo (abs(x));
INSERT INTO foo (x) SELECT r1 % r2 FROM generate_series(1, 100) AS g1(r1), generate_series(1, 100) AS g2(r2);
SET default_statistics_target TO 15;
ALTER TABLE foo ALTER COLUMN x SET STATISTICS 20;
ANALYZE foo;

SELECT most_common_vals FROM pg_stats WHERE attname = 'x';
                  most_common_vals
------------------------------------------------------
 {0,1,2,3,4,5,6,7,8,10,9,11,12,14,13,15,16,19,17,18}
(1 row)

EXPLAIN ANALYZE SELECT * FROM foo WHERE x = 13;
 Bitmap Heap Scan on foo  (cost=2.72..50.28 rows=205 width=4) (actual time=0.370..1.766 rows=220 loops=1)
   Recheck Cond: (x = 13)
   ->  Bitmap Index Scan on foo_x_idx  (cost=0.00..2.72 rows=205 width=0) (actual time=0.314..0.314 rows=220 loops=1)
         Index Cond: (x = 13)
 Total runtime: 2.905 ms
(5 rows)

EXPLAIN ANALYZE SELECT * FROM foo WHERE abs(x) = 13;
 Bitmap Heap Scan on foo  (cost=2.72..50.80 rows=205 width=4) (actual time=0.358..1.720 rows=220 loops=1)
   Recheck Cond: (abs(x) = 13)
   ->  Bitmap Index Scan on foo_abs_x_idx  (cost=0.00..2.72 rows=205 width=0) (actual time=0.305..0.305 rows=220 loops=1)
         Index Cond: (abs(x) = 13)
 Total runtime: 2.875 ms
(5 rows)

EXPLAIN ANALYZE SELECT * FROM foo WHERE x = 18;
 Bitmap Heap Scan on foo  (cost=2.60..49.75 rows=172 width=4) (actual time=0.312..1.442 rows=180 loops=1)
   Recheck Cond: (x = 18)
   ->  Bitmap Index Scan on foo_x_idx  (cost=0.00..2.60 rows=172 width=0) (actual time=0.262..0.262 rows=180 loops=1)
         Index Cond: (x = 18)
 Total runtime: 2.393 ms
(5 rows)

EXPLAIN ANALYZE SELECT * FROM foo WHERE abs(x) = 18;
 Bitmap Heap Scan on foo  (cost=2.22..43.65 rows=63 width=4) (actual time=0.313..1.436 rows=180 loops=1)
   Recheck Cond: (abs(x) = 18)
   ->  Bitmap Index Scan on foo_abs_x_idx  (cost=0.00..2.22 rows=63 width=0) (actual time=0.263..0.263 rows=180 loops=1)
         Index Cond: (abs(x) = 18)
 Total runtime: 2.418 ms
(5 rows)

-- Michael Fuhr ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Expression index ignores column statistics target
Michael Fuhr [EMAIL PROTECTED] writes: I've noticed that row count estimates for expression indexes appear to rely on default_statistics_target rather than on a column's actual statistics target. That is, if I use ALTER TABLE SET STATISTICS to increase a column's statistics target and then run ANALYZE, then estimates for non-expression-index queries improve as expected. However, queries that use an expression index remain accurate for only around the N most common values, where N is the default_statistics_target that was in effect when ANALYZE ran. The code does in fact honor per-column statistics targets attached to expression indexes, viz alter table myfuncindex alter column pg_expression_1 set statistics 100; This isn't documented, mainly because pg_dump doesn't promise to dump such things, which it doesn't do because I didn't want to see the pg_expression_N naming for expression index columns become graven on stone tablets. I seem to recall bringing up the question of whether we could find a less implementation-specific way of commanding this behavior, but I can't find it in the archives right now. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
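Applied to Michael's test case upthread, the (undocumented) knob Tom describes would look like this sketch:

    ALTER TABLE foo_abs_x_idx ALTER COLUMN pg_expression_1 SET STATISTICS 20;
    ANALYZE foo;

after which the abs(x) = 18 estimate should be driven by the index's own 20-entry statistics rather than by whatever default_statistics_target was at ANALYZE time. As Tom notes, pg_dump does not promise to preserve such settings, so treat this as a per-installation tweak.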
Re: [HACKERS] Expression index ignores column statistics target
On Fri, Sep 30, 2005 at 11:59:26PM -0400, Bruce Momjian wrote: This is expected. The main TODO item is: * Allow accurate statistics to be collected on indexes with more than one column or expression indexes, perhaps using per-index statistics Basically, we don't have multi-column or expression statistics. ANALYZE just analyzes columns, even if an expression index exists. But the row count estimates imply that expression index queries do use column statistics, presumably as a proxy in the absence of expression statistics. This looks like a relevant commit: http://archives.postgresql.org/pgsql-committers/2004-02/msg00124.php The behavior I observed is that the planner does appear to use column statistics when planning an expression index query, but it doesn't appear to honor a column's non-default statistics target. In other words:

* Row count estimates for expression index queries (at least simple ones) are reasonably accurate for the N most common column values, where N is the value of default_statistics_target when ANALYZE was run.

* Specifically setting the column's statistics target with ALTER TABLE SET STATISTICS doesn't result in better statistics for expression index queries.

That difference in behavior seems odd: if default_statistics_target has an effect, why doesn't ALTER TABLE SET STATISTICS? -- Michael Fuhr ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Expression index ignores column statistics target
On Sat, Oct 01, 2005 at 12:53:03AM -0400, Tom Lane wrote: The code does in fact honor per-column statistics targets attached to expression indexes, viz alter table myfuncindex alter column pg_expression_1 set statistics 100; Aha -- that's the piece I didn't know about. I was wondering where those statistics were being stored, since they were affected by default_statistics_target but not by per-column statistics targets. And now I see them when I don't restrict queries against pg_stats by just the table or column name. Thanks. -- Michael Fuhr ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] [PERFORM] A Better External Sort?
Judy definitely rates a WOW!! -Original Message- From: [EMAIL PROTECTED] [mailto:pgsql-hackers- [EMAIL PROTECTED] On Behalf Of Gregory Maxwell Sent: Friday, September 30, 2005 7:07 PM To: Ron Peacetree Cc: Jeffrey W. Baker; pgsql-hackers@postgresql.org; pgsql- [EMAIL PROTECTED] Subject: Re: [HACKERS] [PERFORM] A Better External Sort? On 9/28/05, Ron Peacetree [EMAIL PROTECTED] wrote: 2= We use my method to sort two different tables. We now have these very efficient representations of a specific ordering on these tables. A join operation can now be done using these Btrees rather than the original data tables that involves less overhead than many current methods. If we want to make joins very fast we should implement them using RD trees. For the example cases where a join against a very large table will produce a much smaller output, a RD tree will provide pretty much the optimal behavior at a very low memory cost. On the subject of high speed tree code for in-core applications, you should check out http://judy.sourceforge.net/ . The performance (insert, remove, lookup, AND storage) is really quite impressive. Producing cache friendly code is harder than one might expect, and it appears the judy library has already done a lot of the hard work. Though it is *L*GPLed, so perhaps that might scare some here away from it. :) and good luck directly doing joins with a LC-TRIE. ;) ---(end of broadcast)--- TIP 6: explain analyze is your friend ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings