Re: [HACKERS] ScanKey representation for RowCompare index

2006-01-16 Thread Simon Riggs
On Sun, 2006-01-15 at 18:03 -0500, Tom Lane wrote:
 There's one nontrivial decision still to make about how to implement
 proper per-spec row-comparison operations, namely: how a row
 comparison ought to be represented in the index access method API.

I'm not sure I understand why. Surely a row wise comparison can take
advantage of indexes on some of its columns? When would it be
advantageous to create an index on the whole row anyway? Wouldn't the
key likely be excessively long? So why would we support this at all?

 I'm currently leaning to plan B, on the grounds that:
 
 1. It would require no changes in _bt_checkkeys(), which is the only
 user of the data structure that is particularly performance-critical.
 With plan A we'd be adding at least a few cycles to _bt_checkkeys in
 all cases.  Plan B avoids that at the cost of an extra level of function
 call to do a row comparison.  Given the relative frequency of the two
 sorts of index conditions, this has to be the better tradeoff to make.
 
 2. There's quite a bit of logic in btree indexscan setup that would find
 it convenient to treat a row comparison as if it were a single condition
 on just its leftmost column.  This may end up being nearly a wash, but
 I think that plan A would make the setup code a shade more complex than
 plan B would.  In particular, the rules about valid orderings of the
 ScanKey entries would be complicated under plan A, whereas under plan
 B it's still clear where everything belongs.
 
 Any thoughts?  Anyone see a good plan C, or a serious flaw that I'm
 missing in either of these ideas?

That looks sound. 

Best Regards, Simon Riggs




Re: [HACKERS] Large Scale Aggregation (HashAgg Enhancement)

2006-01-16 Thread Simon Riggs
On Mon, 2006-01-16 at 00:07 -0500, Rod Taylor wrote:
 A couple of days ago I found myself wanting to aggregate 3 Billion
 tuples down to 100 Million tuples based on an integer key with six
 integer values -- six sum()'s.
 
 PostgreSQL ran out of memory with its Hash Aggregator and doing an old
 style Sort & Sum took a fair amount of time to complete (cancelled the
 process after 24 hours -- small machine).

 Spilling to disk would be nice but I suspect the obvious method would
 thrash quite badly with non-sorted input.

There is already hash table overflow (spill to disk) logic in HashJoins,
so this should be possible by reusing that code for HashAggs. That's on
my todo list, but I'd welcome any assistance.

A question: Are the rows in your 3 B row table clumped together based
upon the 100M row key? (or *mostly* so) We might also be able to
pre-aggregate the rows using a plan like
HashAgg
SortedAgg
or
SortedAgg
Sort
SortedAgg

The first SortedAgg seems superfluous, but would reduce the row volume
considerably if incoming rows were frequently naturally adjacent, even
if the values were not actually sorted. (This could also be done during
sorting, but it's much easier to slot the extra executor step into the
plan). That might then reduce the size of the later sort, or allow it to
become a HashAgg.
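
A minimal sketch of that pre-aggregation step (standalone C with invented
names and a fixed row layout, not executor code), just to show the idea of
folding runs of naturally adjacent duplicate keys before the later sort or
hash:

typedef struct
{
    int     key;
    long    sum;
} AggRow;

/* Collapse adjacent rows that share a key; returns the reduced row count. */
static int
pre_aggregate(const AggRow *in, int nin, AggRow *out)
{
    int     nout = 0;
    int     i;

    for (i = 0; i < nin; i++)
    {
        if (nout > 0 && out[nout - 1].key == in[i].key)
            out[nout - 1].sum += in[i].sum;     /* same key as previous row */
        else
            out[nout++] = in[i];                /* key changed: new output row */
    }
    return nout;
}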

I could make that manually enabled using enable_pre_agg to allow us to
measure the effectiveness of that technique and decide what cost model
we'd use to make it automatic. Would that help?

 I've written something similar using a client and COPY with temporary
 tables. Even with the Export/Import copy I still beat the Sort & Sum
 method PostgreSQL falls back to.

You can get round this now by chopping the larger table into pieces with
a WHERE clause and then putting them back together with a UNION. If the
table is partitioned, then do this by partitions.

This should also help when it comes to recalculating the sums again in
the future, since you'll only need to rescan the rows that have been
added since the last summation.

Best Regards, Simon Riggs




Re: [HACKERS] Coding standards? Recommendations?

2006-01-16 Thread Simon Riggs
On Sun, 2006-01-15 at 11:14 -0500, korry wrote:
 I've noticed a variety of coding styles in the PostgreSQL source code.  In 
 particular, I see a mix of naming conventions.  Some variables use camelCase 
 (or CamelCase), others use under_score_style.

I just follow the style of the module I'm working in.

Best Regards, Simon Riggs




[HACKERS] source documentation tool doxygen

2006-01-16 Thread Joachim Wieland
I've created a browsable source tree documentation, it's done with the
doxygen tool.

http://www.mcknight.de/pgsql-doxygen/cvshead/html/

There was a discussion about this some time ago, Jonathan Gardner proposed
it here:

http://archives.postgresql.org/pgsql-hackers/2004-03/msg00748.php

quite a few people found it useful but it somehow got stuck. The reason was
apparently that you'd have to tweak your comments in order to profit from
it as much as possible.

However I still think it's a nice-to-have among the online documentation
and it gives people quite new to the code the possibility to easily check
what is where defined and how... What do you think?

doxygen can also produce a pdf but I haven't succeeded in doing that so far,
pdflatex keeps bailing out. Has anybody else succeeded building this yet?



Joachim

-- 
Joachim Wieland  [EMAIL PROTECTED]
C/ Usandizaga 12 1°B   ICQ: 37225940
20002 Donostia / San Sebastian (Spain) GPG key available



Re: [HACKERS] source documentation tool doxygen

2006-01-16 Thread Thomas Hallgren
I wish I'd had this when I started working with PostgreSQL. This looks really good. Very
useful indeed, even without the comments. What kind of changes are needed in order to get
the comments in?


Regards,
Thomas Hallgren

Joachim Wieland wrote:

I've created a browsable source tree documentation, it's done with the
doxygen tool.

http://www.mcknight.de/pgsql-doxygen/cvshead/html/

There was a discussion about this some time ago, Jonathan Gardner proposed
it here:

http://archives.postgresql.org/pgsql-hackers/2004-03/msg00748.php

quite a few people found it useful but it somehow got stuck. The reason was
apparently that you'd have to tweak your comments in order to profit from
it as much as possible.

However I still think it's a nice-to-have among the online documentation
and it gives people quite new to the code the possibility to easily check
what is where defined and how... What do you think?

doxygen can also produce a pdf but I haven't succeeded in doing that so far,
pdflatex keeps bailing out. Has anybody else succeeded building this yet?



Joachim






Re: [HACKERS] source documentation tool doxygen

2006-01-16 Thread Kim Bisgaard

Try following the link (the Doxygen icon) - it has both a tutorial and
extensive doc.

Regards,
Kim Bisgaard

Thomas Hallgren wrote:

I wish I've had this when I started working with PostgreSQL. This 
looks really good. Very useful indeed, even without the comments. What 
kind of changes are needed in order to get the comments in?


Regards,
Thomas Hallgren

Joachim Wieland wrote:


I've created a browsable source tree documentation, it's done with the
doxygen tool.

http://www.mcknight.de/pgsql-doxygen/cvshead/html/

There was a discussion about this some time ago, Jonathan Gardner 
proposed

it here:

http://archives.postgresql.org/pgsql-hackers/2004-03/msg00748.php

quite a few people found it useful but it somehow got stuck. The 
reason was
apparently that you'd have to tweak your comments in order to profit 
from

it as much as possible.

However I still think it's a nice-to-have among the online documentation
and it gives people quite new to the code the possibility to easily 
check

what is where defined and how... What do you think?

doxygen can also produce a pdf but I haven't succeeded in doing that 
so far,
pdflatex keeps bailing out. Has anybody else succeeded building this 
yet?




Joachim






Re: [HACKERS] source documentation tool doxygen

2006-01-16 Thread Andrew Dunstan
Thomas Hallgren said:
 I wish I've had this when I started working with PostgreSQL. This looks
 really good. Very  useful indeed, even without the comments. What kind
 of changes are needed in order to get  the comments in?


I too have done this. But retrofitting Doxygen style comments to the
PostgreSQL source code would be a big undertaking. Maintaining it, which
would be another task for reviewers/committers, would also be a pain unless
there were some automated checking tool.

cheers

andrew





[HACKERS] [PATCH] Better way to check for getaddrinfo function.

2006-01-16 Thread R, Rajesh (STSD)
Just thought that the following patch might improve checking for the
getaddrinfo function (in configure.in).

I was forced to write it because getaddrinfo went unnoticed on Tru64 Unix.


(displaying attached patch)


$ diff -r configure.in configure.in.1
920c920,944
< AC_REPLACE_FUNCS([getaddrinfo])
---
> AC_CACHE_CHECK([for getaddrinfo], ac_cv_func_getaddrinfo,
> [AC_TRY_LINK([#include <netdb.h>],
> [struct addrinfo *g,h;g=&h;getaddrinfo("","",g,&g);],
> AC_TRY_RUN([
> #include <assert.h>
> #include <netdb.h>
> #include <sys/types.h>
> #ifndef AF_INET
> # include <sys/socket.h>
> #endif
> #ifdef __cplusplus
> extern "C"
> #endif
> char (*f) ();
> int main(void) {
>
> f = getaddrinfo;
>
> return 0;
> }
> ],ac_cv_func_getaddrinfo=yes, ac_cv_func_getaddrinfo=no),
> ac_cv_func_getaddrinfo=no)])
> if test $ac_cv_func_getaddrinfo = yes; then
> AC_DEFINE(HAVE_GETADDRINFO,1,[Define if you have the getaddrinfo function])
> fi




Rajesh R

--

This space intentionally left non-blank. 



Re: [HACKERS] Large Scale Aggregation (HashAgg Enhancement)

2006-01-16 Thread Rod Taylor
On Mon, 2006-01-16 at 08:32 +, Simon Riggs wrote:
 On Mon, 2006-01-16 at 00:07 -0500, Rod Taylor wrote:
  A couple of days ago I found myself wanting to aggregate 3 Billion
  tuples down to 100 Million tuples based on an integer key with six
  integer values -- six sum()'s.
  
  PostgreSQL ran out of memory with its Hash Aggregator and doing an old
  style Sort & Sum took a fair amount of time to complete (cancelled the
  process after 24 hours -- small machine).
 
  Spilling to disk would be nice but I suspect the obvious method would
  thrash quite badly with non-sorted input.
 
 There is already hash table overflow (spill to disk) logic in HashJoins,
 so this should be possible by reusing that code for HashAggs. That's on
 my todo list, but I'd welcome any assistance.

 A question: Are the rows in your 3 B row table clumped together based
 upon the 100M row key? (or *mostly* so) We might also be able to

They are randomly distributed. Fully sorting the data is quite painful.

 pre-aggregate the rows using a plan like
   HashAgg
   SortedAgg
 or
   SortedAgg
   Sort
   SortedAgg
 
 The first SortedAgg seems superfluous, but would reduce the row volume
 considerably if incoming rows were frequently naturally adjacent, even
 if the values were not actually sorted. (This could also be done during
 sorting, but it's much easier to slot the extra executor step into the
 plan). That might then reduce the size of the later sort, or allow it to
 become a HashAgg.
 
 I could make that manually enabled using enable_pre_agg to allow us to
 measure the effectiveness of that technique and decide what cost model
 we'd use to make it automatic. Would that help?

I don't understand how this helps. The problem isn't the 3B data source
rows but rather the 100M destination keys that are being aggregated
against.

The memory constraints of HashAgg are a result of the large number of
target keys and should be the same if it was 100M rows or 10B rows.

I think I need something closer to:

HashAgg
- HashSort (to disk)

HashSort would create a number of files on disk with similar data.
Grouping all similar keys into a single temporary file which HashAgg can
deal with individually (100 loops by 1M target keys instead of 1 loop by
100M target keys). The results would be the same as partitioning by
keyblock and running a HashAgg on each partition, but it would be
handled by the Executor rather than by client side code.
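
As a rough sketch of that scatter step (plain C, made-up row layout, not
executor code): hash each incoming key into one of N temp files so that all
rows for a given key land in the same file, and each file can later be
hash-aggregated in memory on its own:

#include <stdio.h>

#define NBATCHES 100                    /* e.g. 100 files of ~1M keys each */

typedef struct
{
    int     key;
    long    val[6];                     /* the six columns being sum()'ed */
} SourceRow;

static FILE *batch_file[NBATCHES];

static void
hashsort_scatter(const SourceRow *row)
{
    int     batch = (unsigned int) row->key % NBATCHES;    /* same key -> same file */

    if (batch_file[batch] == NULL)
        batch_file[batch] = tmpfile();
    fwrite(row, sizeof(SourceRow), 1, batch_file[batch]);
}

Each batch file would then be rewound and fed to an ordinary in-memory
HashAgg, one file at a time.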

  I've written something similar using a client and COPY with temporary
  tables. Even with the Export/Import copy I still beat the Sort & Sum
  method PostgreSQL falls back to.
 
 You can get round this now by chopping the larger table into pieces with
 a WHERE clause and then putting them back together with a UNION. If the
 table is partitioned, then do this by partitions.

True, except this results in several sequential scans over the source
data. I can extract and sort in a single pass at client side but it
would be far better if I could get PostgreSQL to do the same. I could
probably write a plpgsql function to do that logic but it would be quite
messy.

 This should also help when it comes to recalculating the sums again in
 the future, since you'll only need to rescan the rows that have been
 added since the last summation.

We store the aggregated results and never do this type of calculation on
that dataset again. The original dataset comes from about 300 partitions
(time and source) and they are removed upon completion. While this
calculation is being performed additional partitions are added.

I suppose I could store source data in 300 * 1000 partitions (Approx 300
batches times 1000 segments) but that would probably run into other
problems. PostgreSQL probably has issues with that many tables.

-- 




Re: [HACKERS] ScanKey representation for RowCompare index conditions

2006-01-16 Thread Martijn van Oosterhout
On Sun, Jan 15, 2006 at 06:03:12PM -0500, Tom Lane wrote:
 There's one nontrivial decision still to make about how to implement
 proper per-spec row-comparison operations, namely: how a row
 comparison ought to be represented in the index access method API.
 The current representation of index conditions in the AM API is an
 array of ScanKey structs, one per indexcol op constant index
 condition, with implicit ANDing across all the conditions.  There
 are various bits of code that require the ScanKeys to appear in order
 by index column, though this isn't inherent in the data structure
 itself.

ISTM that row-wise comparisons, as far as indexes are concerned, are
actually simpler than normal scan-keys. For example, if you have the
condition (a,b) >= (5,1) then once the index has found that point,
every subsequent entry in the index matches (except possibly NULLs). So
you really don't actually need a row-comparison at any point after
that.

Now, if you have bracketing conditions like (a,b) BETWEEN (5,1) and
(7,0) it gets more interesting. Ideally you'd want to pass both of
these to the index so it knows when to stop. This really suggests using
your plan B since you then have the possibility of passing both, which
you don't really have with plan A. The latter also allows you to give
more auxiliary conditions like b<4.

So it seems like a good idea. No plan C I can think of...

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.




Re: [HACKERS] ScanKey representation for RowCompare index conditions

2006-01-16 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Sun, 2006-01-15 at 18:03 -0500, Tom Lane wrote:
 There's one nontrivial decision still to make about how to implement
 proper per-spec row-comparison operations, namely: how a row
 comparison ought to be represented in the index access method API.

 I'm not sure I understand why. Surely a row wise comparison can take
 advantage of indexes on some of its columns?

Not until we write the code to let it do so.  Currently, what the system
can match to an index on (a,b) is conditions like WHERE a <= x AND b <= y,
which is quite a different animal both semantically and representationally
from WHERE (a,b) <= (x,y).  (At least, assuming that one imputes the
correct SQL-spec meaning to the latter, viz a < x OR (a = x AND b <= y).)
Because of the semantic difference, we have to preserve the difference
when passing the condition to the index AM, thus the need for my
question.
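
For concreteness, a tiny standalone illustration (not PostgreSQL code) of
the two meanings for a two-column integer case:

#include <stdbool.h>

/* SQL-spec row comparison: (a,b) <= (x,y) */
static bool
row_le(int a, int b, int x, int y)
{
    return a < x || (a == x && b <= y);
}

/* column-wise condition: a <= x AND b <= y */
static bool
columnwise_le(int a, int b, int x, int y)
{
    return a <= x && b <= y;
}

/* For a=2, b=9, x=3, y=5: row_le() is true but columnwise_le() is false. */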

 So why would we support this at all?

Example applications here and here:
http://archives.postgresql.org/pgsql-performance/2004-07/msg00188.php
http://archives.postgresql.org/pgsql-performance/2005-12/msg00590.php
(which demonstrate BTW the reasons for not trying to handle this by just
expanding out the OR/AND equivalents).

regards, tom lane



[HACKERS] Docs off on ILIKE indexing?

2006-01-16 Thread Magnus Hagander
http://www.postgresql.org/docs/8.1/static/indexes-types.html

says:
The optimizer can also use a B-tree index for queries involving the
pattern matching operators LIKE, ILIKE, ~, and ~*, if the pattern is a
constant and is anchored to the beginning of the string - for example,
col LIKE 'foo%' or col ~ '^foo', but not col LIKE '%bar'. 


But really, does it use indexes for ILIKE? (And I assume the same holds
for case insensitive regexp matching)

(If it does, can someone enlighten me on what I have to do - I have a
system with C locale that refuses to do it for ILIKE, but works just
fine for LIKE. My workaronud for now is to create an index on lower(foo)
and then use WHERE lower(foo) LIKE 'bar%' which works fine - but it does
require an extra index..)

So. Am I off, or are the docs? Or is it just me who can't read ;-)

//Magnus



Re: [HACKERS] ScanKey representation for RowCompare index conditions

2006-01-16 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 ISTM that row-wise comparisons, as far as indexes are concerned are
 actually simpler than normal scan-keys. For example, if you have the
 condition (a,b) >= (5,1) then once the index has found that point,
 every subsequent entry in the index matches (except possibly NULLs). So
 you really don't actually need a row-comparison at any point after
 that.

Well, yeah you do, eg if the caller starts demanding backward fetch;
you need to be able to tell where to stop.  In any case it's certainly
necessary to pass down the whole row-condition into the index AM;
without that, the closest you could get would be an indexscan on a >= 5
which might have to scan an undesirably large number of rows before
reaching the first one with b >= 1.

 Now, if you have bracketing conditions like (a,b) BETWEEN (5,1) and
 (7,0) it gets more interesting. Ideally you'd want to pass both of
 these to the index so it knows when to stop. This really suggests using
 your plan B since you then have the possibility of passing both, which
 you don't really have with plan A. The latter also allows you to give
 more auxiliary conditions like b<4.

No, you misunderstood my plan A --- it's not giving up any of these
options.  But it's paying for it in complexity.  I was envisioning a
case like the above as being represented this way:

sk_flags      sk_attno    sk_func    sk_datum

ROW_START     a           >=         5
ROW_END       b           >=         1
ROW_START     a           <=         7
ROW_END       b           <=         0
normal        b           <          4

Hence my comments about weird rules for the ordering of entries under
plan A --- the normal and ROW_START entries would be in order by attno,
but the row continuation/end entries wouldn't appear to follow this global
ordering.  They'd follow a local order within each row condition,
instead.
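
Restating that layout as a toy C array (invented flag names and a
pseudo-ScanKey struct, purely illustrative -- not the actual skey.h
definitions), for the condition (a,b) >= (5,1) AND (a,b) <= (7,0) AND b < 4:

enum { NORMAL = 0, ROW_START = 1, ROW_END = 2 };

typedef struct
{
    int         flags;
    const char *attno;
    const char *op;
    int         datum;
} PseudoScanKey;

static const PseudoScanKey plan_a_keys[] = {
    {ROW_START, "a", ">=", 5},
    {ROW_END,   "b", ">=", 1},
    {ROW_START, "a", "<=", 7},
    {ROW_END,   "b", "<=", 0},
    {NORMAL,    "b", "<",  4},
};

Note how the ROW_END entries for column b sit ahead of the second ROW_START
entry for a and of the normal entry for b, breaking the usual by-attno
ordering of the flat array.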

Since you didn't understand what I was saying, I suspect that plan A is
too confusing ...

regards, tom lane



Re: [HACKERS] Large Scale Aggregation (HashAgg Enhancement)

2006-01-16 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Mon, 2006-01-16 at 00:07 -0500, Rod Taylor wrote:
 A couple of days ago I found myself wanting to aggregate 3 Billion
 tuples down to 100 Million tuples based on an integer key with six
 integer values -- six sum()'s.

 There is already hash table overflow (spill to disk) logic in HashJoins,
 so this should be possible by reusing that code for HashAggs. That's on
 my todo list, but I'd welcome any assistance.

Yeah, I proposed something similar awhile back in conjunction with
fixing the spill logic for hash joins (which was always there, but was
not adaptive until recently).  I got the join part done but got
distracted before fixing HashAgg :-(

In principle, you just reduce the range of currently-in-memory hash
codes whenever you run low on memory.  The so-far-accumulated working
state for aggregates that are not in the range anymore goes into a temp
file, and subsequently any incoming tuples that hash outside the range
go into another temp file.  After you've completed the scan, you
finalize and emit the aggregates that are still in memory, then you pick
up the first set of dropped aggregates, rescan the associated TODO
file of unprocessed tuples, lather rinse repeat till done.
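
A simplified, self-contained sketch of that control flow (plain C, invented
data layout; it diverts any group that no longer fits rather than
restricting by hash-code range, but the spill-and-rescan shape is the same):

#include <stdio.h>

#define MAX_GROUPS 1000000              /* stand-in for the work_mem limit */

typedef struct { int key; long sum; } Group;

static Group groups[MAX_GROUPS];
static int   ngroups;

static Group *
lookup_or_add(int key)
{
    int     i;

    for (i = 0; i < ngroups; i++)       /* linear search: sketch only */
        if (groups[i].key == key)
            return &groups[i];
    if (ngroups >= MAX_GROUPS)
        return NULL;                    /* no room left: caller must spill */
    groups[ngroups].key = key;
    groups[ngroups].sum = 0;
    return &groups[ngroups++];
}

/* One pass: aggregate what fits, divert the rest to a TODO file. */
static FILE *
hashagg_pass(FILE *in)
{
    FILE   *todo = NULL;
    int     key;
    long    val;

    while (fread(&key, sizeof(key), 1, in) == 1 &&
           fread(&val, sizeof(val), 1, in) == 1)
    {
        Group  *g = lookup_or_add(key);

        if (g != NULL)
            g->sum += val;              /* in-memory group: accumulate */
        else
        {
            if (todo == NULL)
                todo = tmpfile();       /* first overflow: open TODO file */
            fwrite(&key, sizeof(key), 1, todo);
            fwrite(&val, sizeof(val), 1, todo);
        }
    }
    return todo;                        /* NULL when everything fit */
}

The caller emits and resets groups[], rewinds the returned TODO file, and
repeats hashagg_pass() on it until no further TODO file comes back.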

The tricky part is to preserve the existing guarantee that tuples are
merged into their aggregate in arrival order.  (This does not matter for
the standard aggregates but it definitely does for custom aggregates,
and there will be unhappy villagers appearing on our doorsteps if we
break it.)  I think this can work correctly under the above sketch but
it needs to be verified.  It might require different handling of the
TODO files than what hashjoin does.

regards, tom lane



Re: [HACKERS] [GENERAL] [PATCH] Better way to check for getaddrinfo function.

2006-01-16 Thread Tom Lane
R, Rajesh (STSD) [EMAIL PROTECTED] writes:
 Just thought that the following patch might improve checking for
 getaddrinfo function (in configure.in)

Since AC_TRY_RUN tests cannot work in cross-compilation scenarios,
you need an *extremely* good reason to put one in.  I thought this
might improve things doesn't qualify.  Exactly what problem are you
trying to solve and why is a run-time test necessary?  Why doesn't
the existing coding work for you?

regards, tom lane



Re: [HACKERS] Docs off on ILIKE indexing?

2006-01-16 Thread Tom Lane
Magnus Hagander [EMAIL PROTECTED] writes:
 http://www.postgresql.org/docs/8.1/static/indexes-types.html
 says:
 The optimizer can also use a B-tree index for queries involving the
 pattern matching operators LIKE, ILIKE, ~, and ~*, if the pattern is a
 constant and is anchored to the beginning of the string - for example,
 col LIKE 'foo%' or col ~ '^foo', but not col LIKE '%bar'. 

 But really, does it use indexes for ILIKE?

That's pretty poorly phrased.  For ILIKE it'll only work if there's a
prefix of the pattern that's not letters (and hence is unaffected by
the case-folding issue).
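
Roughly speaking (an illustrative helper, not the planner's actual code),
the usable prefix stops at the first letter or LIKE wildcard:

#include <ctype.h>

/* Length of the leading part of an ILIKE pattern that can anchor an index
 * scan: characters that are neither letters (where case folding would
 * matter) nor LIKE wildcards. */
static int
ilike_index_prefix_len(const char *pattern)
{
    int     n = 0;

    while (pattern[n] != '\0' &&
           pattern[n] != '%' && pattern[n] != '_' &&
           !isalpha((unsigned char) pattern[n]))
        n++;
    return n;
}

/* e.g. "123abc%" leaves 3 usable characters, "abc%" leaves none. */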

regards, tom lane



Re: [HACKERS] Warm-up cache may have its virtue

2006-01-16 Thread Jim C. Nasby
On Sat, Jan 14, 2006 at 04:13:56PM -0500, Qingqing Zhou wrote:
 
 Qingqing Zhou [EMAIL PROTECTED] wrote
 
  I wonder if we should really implement file-system-cache-warmup strategy
  which we have discussed before. There are two natural good places to do
  this:
 
  (1) sequentail scan
  (2) bitmap index scan
 
 
 For the sake of memory, there is a third place a warm-up cache or pre-read 
 is beneficial (OS won't help us):
 (3) xlog recovery

Wouldn't it be better to improve pre-reading data instead, ie, making
sure things like seqscan and bitmap scan always keep the IO system busy?
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [HACKERS] Improving N-Distinct estimation by ANALYZE

2006-01-16 Thread Jim C. Nasby
On Fri, Jan 13, 2006 at 11:37:38PM -0500, Tom Lane wrote:
 Josh Berkus josh@agliodbs.com writes:
  It's also worth mentioning that for datatypes that only have an =
  operator the performance of compute_minimal_stats is O(N^2) when values
  are unique, so increasing sample size is a very bad idea in that case.
 
  Hmmm ... does ANALYZE check for UNIQUE constraints?
 
 Our only implementation of UNIQUE constraints is btree indexes, which
 require more than an = operator, so this seems irrelevant.

IIRC, the point was that if we know a field has to be unique, there's no
sense in doing that part of the analysis on it; you'd only care about
correlation.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [HACKERS] source documentation tool doxygen

2006-01-16 Thread Neil Conway
On Mon, 2006-01-16 at 07:57 -0600, Andrew Dunstan wrote:
 I too have done this. But retrofitting Doxygen style comments to the
 PostgreSQL source code would be a big undertaking. Maintaining it, which
 would be another task for reviewers/committers, would also be a pain unless
 there were some automated checking tool.

I don't think it would be all that painful. There would be no need to
convert the entire source tree to use proper Doxygen-style comments in
one fell swoop: individual files and modules can be converted whenever
anyone gets the inclination to do so. I don't think the maintenance
burden would be very substantial, either.
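
To make the kind of change concrete, an illustrative before/after (the
function is invented, not from the tree): by default Doxygen only picks up
block comments that open with /** or /*!, optionally with JavaDoc-style
tags:

/*
 * fetch_widget -- look up a widget by id   (ordinary comment: Doxygen skips it)
 */
extern struct Widget *fetch_widget(int id);

/**
 * fetch_widget -- look up a widget by id   (Doxygen-style: note the extra '*')
 *
 * @param id    widget identifier to look up
 * @return      the widget, or NULL if there is no such widget
 */
extern struct Widget *fetch_widget(int id);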

-Neil





Re: [HACKERS] source documentation tool doxygen

2006-01-16 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes:
 I don't think it would be all that painful. There would be no need to
 convert the entire source tree to use proper Doxygen-style comments in
 one fell swoop: individual files and modules can be converted whenever
 anyone gets the inclination to do so. I don't think the maintenance
 burden would be very substantial, either.

In the previous go-round on this topic, I seem to recall some concern
about side-effects, to wit reducing the readability of the comments for
ordinary non-doxygen code browsing.  I'd be quite against taking any
noticeable hit in that direction.

A quick look through the doxygen manual doesn't make it sound too
invasive, but I am worried about how well it will coexist with pgindent.
It seems both tools think they can dictate the meaning of the characters
immediately after /* of a comment block.

regards, tom lane



Re: [HACKERS] Surrogate keys (Was: enums)

2006-01-16 Thread Jim C. Nasby
On Sat, Jan 14, 2006 at 07:28:21PM +0900, Michael Glaesemann wrote:
 
 On Jan 13, 2006, at 21:42, Leandro Guimarães Faria Corcete DUTRA wrote:
 
 If you still declare the natural key(s) as UNIQUEs, you have just made
 performance worse.  Now there are two keys to be checked on UPDATEs  
 and
 INSERTs, two indexes to be updated, and probably a SEQUENCE too.
 
 For UPDATEs and INSERTs, the proper primary key also needs to be  
 checked, but keys are used for more than just checking uniqueness:  
 they're also often used in JOINs. Joining against a single integer  
 I'd think it quite a different proposition (I'd think faster in terms  
 of performance) than joining against, say, a text column or a  
 composite key.

a) the optimizer does a really poor job on multi-column index statistics
b) If each parent record will have many children, the space savings from
using a surrogate key can be quite large
c) depending on how you view things, putting actual keys all over the
place is denormalized

Generally, I just use surrogate keys for everything unless performance
dictates something else.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [HACKERS] Large Scale Aggregation (HashAgg Enhancement)

2006-01-16 Thread Simon Riggs
On Mon, 2006-01-16 at 09:42 -0500, Rod Taylor wrote:
 On Mon, 2006-01-16 at 08:32 +, Simon Riggs wrote:
  On Mon, 2006-01-16 at 00:07 -0500, Rod Taylor wrote:
   
  A question: Are the rows in your 3 B row table clumped together based
  upon the 100M row key? (or *mostly* so) We might also be able to
 
 They are randomly distributed. Fully sorting the data is quite painful.

...

 I don't understand how this helps. 

It wouldn't since your rows are randomly distributed. The idea was not
related to improving HashAgg, but to improving Aggregation for the case
of naturally grouped data.

 I think I need something closer to:
 
 HashAgg
   - HashSort (to disk)
 
 HashSort would create a number of files on disk with similar data.
 Grouping all similar keys into a single temporary file which HashAgg can
 deal with individually (100 loops by 1M target keys instead of 1 loop by
 100M target keys). The results would be the same as partitioning by
 keyblock and running a HashAgg on each partition, but it would be
 handled by the Executor rather than by client side code.
 
   I've written something similar using a client and COPY with temporary
   tables. Even with the Export/Import copy I still beat the Sort & Sum
   method PostgreSQL falls back to.

That is exactly how the spill to disk logic works for HashJoin (and
incidentally, identical to an Oracle one-pass hash join since both are
based upon the hybrid hash join algorithm). 

Multi-pass would only be required to handle very skewed hash
distributions, which HJ doesn't do yet.

So yes, this can be done. 

Best Regards, Simon Riggs




Re: [HACKERS] Large Scale Aggregation (HashAgg Enhancement)

2006-01-16 Thread Simon Riggs
On Mon, 2006-01-16 at 12:36 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  On Mon, 2006-01-16 at 00:07 -0500, Rod Taylor wrote:
  A couple of days ago I found myself wanting to aggregate 3 Billion
  tuples down to 100 Million tuples based on an integer key with six
  integer values -- six sum()'s.
 
  There is already hash table overflow (spill to disk) logic in HashJoins,
  so this should be possible by reusing that code for HashAggs. That's on
  my todo list, but I'd welcome any assistance.
 
 Yeah, I proposed something similar awhile back in conjunction with
 fixing the spill logic for hash joins (which was always there, but was
 not adaptive until recently).  I got the join part done but got
 distracted before fixing HashAgg :-(

You've done the main work. :-)

 The tricky part is to preserve the existing guarantee that tuples are
 merged into their aggregate in arrival order.  (This does not matter for
 the standard aggregates but it definitely does for custom aggregates,
 and there will be unhappy villagers appearing on our doorsteps if we
 break it.)  I think this can work correctly under the above sketch but
 it needs to be verified.  It might require different handling of the
 TODO files than what hashjoin does.

For HJ we write each outer tuple to its own file-per-batch in the order
they arrive. Reading them back in preserves the original ordering. So
yes, caution required, but I see no difficulty, just reworking the HJ
code (nodeHashjoin and nodeHash). What else do you see?
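
In sketch form (invented helpers, not the nodeHash code), the property being
relied on is just append-in-arrival-order plus sequential re-read:

#include <stdio.h>

/* Append one spilled tuple to its batch file, length-prefixed. */
static void
spill_tuple(FILE *batchfile, const void *tuple, size_t len)
{
    fwrite(&len, sizeof(len), 1, batchfile);
    fwrite(tuple, len, 1, batchfile);
}

/* Read the next tuple back; returns 0 at end of the batch.  Tuples come
 * back in exactly the order they were appended. */
static size_t
reread_tuple(FILE *batchfile, void *buf, size_t bufsize)
{
    size_t  len;

    if (fread(&len, sizeof(len), 1, batchfile) != 1 || len > bufsize)
        return 0;
    return fread(buf, len, 1, batchfile) == 1 ? len : 0;
}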

Best Regards, Simon Riggs




Re: [HACKERS] PostgreSQL win32 NT4

2006-01-16 Thread Bruce Momjian
[EMAIL PROTECTED] wrote:
 NT4 is officially dead, IMHO no need for PostgreSQL to officially support it,
 let's leave place for companies offering commercial postgresql versions to
 work on it if they have enough customer requests.
 BTW Win 2000 is more or less 6 years old now ...

Agreed.  If they are using NT4 they certainly should be happy using
PostgreSQL 8.1.X.  So, we will have an NT4 solution, it just won't be
our most recent major release.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [HACKERS] PostgreSQL win32 NT4

2006-01-16 Thread Joshua D. Drake

Bruce Momjian wrote:

[EMAIL PROTECTED] wrote:

NT4 is officially dead, IMHO no need for PostgreSQL to officially support it,
let's leave place for companies offering commercial postgresql versions to
work on it if they have enough customer requests.
BTW Win 2000 is more or less 6 years old now ...
I believe Microsoft has an official 10 year support policy, although they
don't backport all features (which makes sense).



Agreed.  If they are using NT4 they certainly should be happy using
PostgreSQL 8.1.X.  So, we will have an NT4 solution, it just won't be
our most recent major release.




--
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: PLphp, PLperl - http://www.commandprompt.com/




Re: [HACKERS] PostgreSQL win32 NT4

2006-01-16 Thread Magnus Hagander
  NT4 is officially dead, IMHO no need for PostgreSQL to officially 
  support it, let's leave place for companies offering commercial 
  postgresql versions to work on it if they have enough 
 customer requests.
  BTW Win 2000 is more or less 6 years old now ...
 I believe Microsoft has an official 10 year support policy. 
 Although they don't backport all features (which makes sense).

They're 10 years for extended support. Only 5 years for mainstream
support, meaning it has already passed.

The difference? You can no longer get a hotfix unless you have a
specific hotfix agreement, that you have to buy separate. You don't
get any no-charge incident support (that would be
support-because-of-a-bug). You don't get to make any feature requests
(though seriously, they hardly ever listen to you unless you're really
big anyway). And you don't get any warranty claims, whatever that
means in reality.

You do, however, get paid support and security hotfixes.

//Magnus



Re: [HACKERS] Large Scale Aggregation (HashAgg Enhancement)

2006-01-16 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 For HJ we write each outer tuple to its own file-per-batch in the order
 they arrive. Reading them back in preserves the original ordering. So
 yes, caution required, but I see no difficulty, just reworking the HJ
 code (nodeHashjoin and nodeHash). What else do you see?

With dynamic adjustment of the hash partitioning, some tuples will go
through multiple temp files before they ultimately get eaten, and
different tuples destined for the same aggregate may take different
paths through the temp files depending on when they arrive.  It's not
immediately obvious that ordering is preserved when that happens.
I think it can be made to work but it may take different management of
the temp files than hashjoin uses.  (Worst case, we could use just a
single temp file for all unprocessed tuples, but this would result in
extra I/O.)

regards, tom lane



Re: [HACKERS] Improving N-Distinct estimation by ANALYZE

2006-01-16 Thread Simon Riggs
On Mon, 2006-01-16 at 12:26 -0600, Jim C. Nasby wrote:
 On Fri, Jan 13, 2006 at 11:37:38PM -0500, Tom Lane wrote:
  Josh Berkus josh@agliodbs.com writes:
   It's also worth mentioning that for datatypes that only have an =
   operator the performance of compute_minimal_stats is O(N^2) when values
   are unique, so increasing sample size is a very bad idea in that case.
  
   Hmmm ... does ANALYZE check for UNIQUE constraints?
  
  Our only implementation of UNIQUE constraints is btree indexes, which
  require more than an = operator, so this seems irrelevant.
 
 IIRC, the point was that if we know a field has to be unique, there's no
 sense in doing that part of the analysis on it; you'd only care about
 correlation.

An interesting point: if we know the cardinality by definition there is
no need to ANALYZE for that column. An argument in favour of the
definitional rather than the discovery approach. As a designer I very
often already know the cardinality by definition, so I would appreciate
the opportunity to specify that to the database.

Tom has not spoken against checking for UNIQUE constraints: he is just
pointing out that there never could be a constraint in the case I was
identifying. For other datatypes we could check for them rather than
analyzing the sample, but the effect would be the same either way.

My original point was that behaviour is O(N^2) when unique; it is still
O(N^2) when nearly unique, albeit O(N^2) - O(N). Reducing stats_target
does not help there - the effect varies according to number of rows in
the sample, not num slots in MCV list.

Best Regards, Simon Riggs






Re: [HACKERS] Improving N-Distinct estimation by ANALYZE

2006-01-16 Thread Manfred Koizar
On Fri, 13 Jan 2006 19:18:29 +, Simon Riggs
[EMAIL PROTECTED] wrote:
I enclose a patch for checking out block sampling.

Can't comment on the merits of block sampling and your implementation
thereof.  Just some nitpicking:

|!  * Row Sampling: As of May 2004, we use the Vitter algorithm to create

Linking the use of the Vitter algorithm to May 2004 is not quite
appropriate.  We introduced two stage sampling at that time.

|   * a random sample of targrows rows (or less, if there are less in the
|!  * sample of blocks). In this case, targblocks is always the same as
|!  * targrows, so we always read one row per block.

This is just wrong, unless you add "on average".  Even then it is a
bit misleading, because in most cases we *read* more tuples than we
use.

|   * Although every row has an equal chance of ending up in the final
|   * sample, this sampling method is not perfect: not every possible
|   * sample has an equal chance of being selected.  For large relations
|   * the number of different blocks represented by the sample tends to be
|!  * too small.  In that case, block sampling should be used.

Is the last sentence a fact or personal opinion?

|   * block.  The previous sampling method put too much credence in the row
|   * density near the start of the table.

FYI, "previous" refers to the date mentioned above:
"previous" == before May 2004 == before two-stage sampling.

|+ /* Assume that we will have rows at least 64 bytes wide 
|+  * Currently we're very unlikely to overflow availMem here 
|+  */
|+ if ((allocrows * sizeof(HeapTuple)) > (allowedMem >> 4))

This is a funny way of saying
if (allocrows * (sizeof(HeapTuple) + 60) > allowedMem)

It doesn't match the comment above; and it is something different if
the size of a pointer is not four bytes.

|-  if (bs.m > 0)
|+  if (bs.m > 0 )

Oops.

|+  ereport(DEBUG2,
|+ (errmsg("ANALYZE attr %u sample: n=%u nmultiple=%u f1=%u d=%u", 
|+ stats->tupattnum,samplerows, nmultiple, f1, d)));
^
missing space here and in some more places

I haven't been following the discussion too closely but didn't you say
that a block sampling algorithm somehow compensates for the fact that
the sample is not random?
Servus
 Manfred



Re: [HACKERS] Improving N-Distinct estimation by ANALYZE

2006-01-16 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 Tom has not spoken against checking for UNIQUE constraints: he is just
 pointing out that there never could be a constraint in the case I was
 identifying.

More generally, arguing for or against any system-wide change on the
basis of performance of compute_minimal_stats() is folly, because that
function is not used for any common datatypes.  (Ideally it'd never be
used at all.)

regards, tom lane



[HACKERS] Anyone see a need for BTItem/HashItem?

2006-01-16 Thread Tom Lane
I'm considering getting rid of the BTItem/BTItemData and
HashItem/HashItemData struct definitions and just referencing
IndexTuple(Data) directly in the btree and hash AMs.  It appears that
at one time in the forgotten past, there was some access-method-specific
data in index entries in addition to the common IndexTuple struct, but
that's been gone for a long time and I can't see a reason why either of
these AMs would resurrect it.  So this just seems like extra notational
cruft to me, as well as an extra layer of palloc overhead (see eg
_bt_formitem()).  GIST already got rid of this concept, or never had it.

Does anyone see a reason to keep this layer of struct definitions?

regards, tom lane



Re: [HACKERS] Anyone see a need for BTItem/HashItem?

2006-01-16 Thread David Fetter
On Mon, Jan 16, 2006 at 03:52:01PM -0500, Tom Lane wrote:
 I'm considering getting rid of the BTItem/BTItemData and
 HashItem/HashItemData struct definitions and just referencing
 IndexTuple(Data) directly in the btree and hash AMs.  It appears
 that at one time in the forgotten past, there was some
 access-method-specific data in index entries in addition to the
 common IndexTuple struct, but that's been gone for a long time and I
 can't see a reason why either of these AMs would resurrect it.  So
 this just seems like extra notational cruft to me, as well as an
 extra layer of palloc overhead (see eg _bt_formitem()).  GIST
 already got rid of this concept, or never had it.
 
 Does anyone see a reason to keep this layer of struct definitions?

If you cut it out, what will the heap and index access methods
needed for SQL/MED use?

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 415 235 3778

Remember to vote!



Re: [HACKERS] Anyone see a need for BTItem/HashItem?

2006-01-16 Thread Tom Lane
David Fetter [EMAIL PROTECTED] writes:
 On Mon, Jan 16, 2006 at 03:52:01PM -0500, Tom Lane wrote:
 Does anyone see a reason to keep this layer of struct definitions?

 If you cut it out, what will the heap and index access methods
 needed for SQL/MED use?

What's that have to do with this?

regards, tom lane



Re: [HACKERS] Anyone see a need for BTItem/HashItem?

2006-01-16 Thread Jonah H. Harris
From what I've seen, I don't think we need to keep them around.

On 1/16/06, Tom Lane [EMAIL PROTECTED] wrote:
 I'm considering getting rid of the BTItem/BTItemData and
 HashItem/HashItemData struct definitions and just referencing
 IndexTuple(Data) directly in the btree and hash AMs.  It appears that
 at one time in the forgotten past, there was some access-method-specific
 data in index entries in addition to the common IndexTuple struct, but
 that's been gone for a long time and I can't see a reason why either of
 these AMs would resurrect it.  So this just seems like extra notational
 cruft to me, as well as an extra layer of palloc overhead (see eg
 _bt_formitem()).  GIST already got rid of this concept, or never had it.

 Does anyone see a reason to keep this layer of struct definitions?

 regards, tom lane


Re: [HACKERS] Anyone see a need for BTItem/HashItem?

2006-01-16 Thread David Fetter
On Mon, Jan 16, 2006 at 04:02:07PM -0500, Tom Lane wrote:
 David Fetter [EMAIL PROTECTED] writes:
  On Mon, Jan 16, 2006 at 03:52:01PM -0500, Tom Lane wrote:
  Does anyone see a reason to keep this layer of struct
  definitions?
 
  If you cut it out, what will the heap and index access methods
  needed for SQL/MED use?
 
 What's that have to do with this?

I'm sure you'll correct me if I'm mistaken, but this is a candidate
for the spot where such interfaces--think of Informix's Virtual
(Table|Index) Interface--would go.

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 415 235 3778

Remember to vote!



[HACKERS] equivalence class not working?

2006-01-16 Thread uwcssa
not sure if this is the right place to post...

I am using postgres 8.1. In indxpath.c, it says "Note: if Postgres tried to
optimize queries by forming equivalence classes over equi-joined attributes
(i.e., if it recognized that a qualification such as where a.b=c.d and
a.b=5 could make use of an index on c.d), then we could use that
equivalence class info here with joininfo_list to do more complete tests
for the usability of a partial index. ... XXX as of 7.1, equivalence class
info *is* available."

Now I have the following two queries on TPC-H, where there is an index
built on o_totalprice.

explain select * from lineitem, orders where o_totalprice=l_extendedprice and l_extendedprice>2000;
explain select * from lineitem, orders where o_totalprice=l_extendedprice and l_extendedprice>2000 and o_totalprice>2000;

The second query uses the index while the first does not. It seems to me
that both queries are the same (the o_totalprice>2000 in the second query
can be inferred). Is there something that needs to be tuned, or ...?


thanks a lot!


Re: [HACKERS] Anyone see a need for BTItem/HashItem?

2006-01-16 Thread Tom Lane
David Fetter [EMAIL PROTECTED] writes:
 On Mon, Jan 16, 2006 at 04:02:07PM -0500, Tom Lane wrote:
 David Fetter [EMAIL PROTECTED] writes:
 If you cut it out, what will the heap and index access methods
 needed for SQL/MED use?
 
 What's that have to do with this?

 I'm sure you'll correct me if I'm mistaken, but this is a candidate
 for the spot where such interfaces--think of Informix's Virtual
 (Table|Index) Interface--would go.

Can't imagine putting anything related to external-database access
inside either the btree or hash AMs; it'd only make sense to handle
it at higher levels.  It's barely conceivable that external access
would make sense as a specialized AM in its own right, but I don't
see managing external links exclusively within the indexes.

IOW, if we did need extra stuff in IndexTuples for external access,
we'd want to put it inside IndexTuple, not in a place where it could
only be seen by these AMs.

regards, tom lane



Re: [HACKERS] equivalence class not working?

2006-01-16 Thread Tom Lane
uwcssa [EMAIL PROTECTED] writes:
 I am using postgres 8.1.  In indxpath.c, it says "Note: if Postgres tried
 to optimize queries by forming equivalence classes over equi-joined
 attributes (i.e., if it recognized that a qualification such as where
 a.b=c.d and a.b=5 could make use of an index on c.d), then we could use
 that equivalence class info here with joininfo_list to do more complete
 tests for the usability of a partial index. ... XXX as of 7.1,
 equivalence class info *is* available."

Are you deliberately ignoring the rest of the comment?

regards, tom lane



Re: [HACKERS] ScanKey representation for RowCompare index conditions

2006-01-16 Thread Martijn van Oosterhout
On Mon, Jan 16, 2006 at 12:07:44PM -0500, Tom Lane wrote:
 Since you didn't understand what I was saying, I suspect that plan A is
 too confusing ...

Umm, yeah. Now you've explained it I think it should be excluded on the
basis that it'll be a source of bugs. For all the places that matter a
row-condition needs to be treated as a whole and storing parts in a
top level ScanKey array just seems like asking for trouble.

Given no plan C, I guess plan B is the way to go...?
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
 tool for doing 5% of the work and then sitting around waiting for someone
 else to do the other 95% so you can sue them.




Re: [HACKERS] Anyone see a need for BTItem/HashItem?

2006-01-16 Thread David Fetter
On Mon, Jan 16, 2006 at 04:21:50PM -0500, Tom Lane wrote:
 David Fetter [EMAIL PROTECTED] writes:
  On Mon, Jan 16, 2006 at 04:02:07PM -0500, Tom Lane wrote:
  David Fetter [EMAIL PROTECTED] writes:
  If you cut it out, what will the heap and index access
  methods needed for SQL/MED use?
  
  What's that have to do with this?
 
  I'm sure you'll correct me if I'm mistaken, but this is a
  candidate for the spot where such interfaces--think of Informix's
  Virtual (Table|Index) Interface--would go.
 
 Can't imagine putting anything related to external-database access
 inside either the btree or hash AMs; it'd only make sense to handle
 it at higher levels.  It's barely conceivable that external access
 would make sense as a specialized AM in its own right, but I don't
 see managing external links exclusively within the indexes.
 
 IOW, if we did need extra stuff in IndexTuples for external access,
 we'd want to put it inside IndexTuple, not in a place where it could
 only be seen by these AMs.

Thanks for the explanation :)

Cheers,
D
-- 
David Fetter [EMAIL PROTECTED] http://fetter.org/
phone: +1 415 235 3778

Remember to vote!



Re: [HACKERS] source documentation tool doxygen

2006-01-16 Thread Michael Glaesemann


On Jan 17, 2006, at 3:51 , Tom Lane wrote:


A quick look through the doxygen manual doesn't make it sound too
invasive, but I am worried about how well it will coexist with pgindent.
It seems both tools think they can dictate the meaning of the characters
immediately after /* of a comment block.


I haven't looked at it yet, but might there not be a way to have a  
preprocessing step where the current comment format is converted to  
something doxygen-friendly? pg_indent2doxygen or something? Then the  
current comment style could be maintained and doxygen could get what  
it likes.


Michael Glaesemann
grzm myrealbox com






Re: [HACKERS] ScanKey representation for RowCompare index conditions

2006-01-16 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 On Mon, Jan 16, 2006 at 12:07:44PM -0500, Tom Lane wrote:
 Since you didn't understand what I was saying, I suspect that plan A is
 too confusing ...

 Umm, yeah. Now you've explained it I think it should be excluded on the
 basis that it'll be a source of bugs. For all the places that matter a
 row-condition needs to be treated as a whole and storing parts in a
 top level ScanKey array just seems like asking for trouble.

Actually, I'd just been coming back to plan A, after realizing that it's
not possible to do this with zero change in _bt_checkkeys() after all.
The amount of information passed down to an ordinary sk_func isn't
sufficient to do what the row-compare subroutine would need to do:
it needs access to the whole indextuple, not only the first column's
value, plus access to the direction and continuescan variables.  We
could add extra parameters to what gets passed, but that's certainly
no cheaper than adding a test on some sk_flags bits.  And having the
row comparison logic inside bt_checkkeys is probably more understandable
than having a separate subroutine with strange calling conventions.

I think given adequate documentation in skey.h, either representation
is about the same as far as the source-of-bugs argument goes.  The main
risk you get from a Plan-B approach is that it's easier to fail to
notice bugs of omission, where a piece of code looking at a ScanKey
needs to do something special with a row-compare key and omits to do so.
At least that's what I'm guessing at the moment --- I haven't actually
tried to code up any of the changes yet.  In any case, since we are only
intending to support this for searches of btree indexes, there are not
that many pieces of code that will be concerned with it.  I think it's
just _bt_first, _bt_preprocess_keys, _bt_checkkeys, and
ExecIndexBuildScanKeys.  Everyplace else will be dealing only in
simple ScanKeys with no row comparisons.

regards, tom lane



Re: [HACKERS] source documentation tool doxygen

2006-01-16 Thread Joachim Wieland
On Tue, Jan 17, 2006 at 06:51:15AM +0900, Michael Glaesemann wrote:
 I haven't looked at it yet, but might there not be a way to have a
 preprocessing step where the current comment format is converted to
 something doxygen-friendly? pg_indent2doxygen or something? Then the
 current comment style could be maintained and doxygen could get what
 it likes.

It's not about getting in all the comments, it's mostly about comments
describing functions / macros / typedefs / structures / global variables
and the like.

Comments in function bodies will just stay, I guess.

However we could try to find some de facto rules about comments to get most
of the matches: for example, a comment starting in column 0 of the line is
probably one you're looking for, whereas one that starts with whitespace
probably is not.

Of course this would still need manual review for every file.


Joachim

-- 
Joachim Wieland  [EMAIL PROTECTED]
C/ Usandizaga 12 1°B   ICQ: 37225940
20002 Donostia / San Sebastian (Spain) GPG key available



Re: [HACKERS] source documentation tool doxygen

2006-01-16 Thread Bruce Momjian

Wow, looks great.  Is that URL stable?  Can we link to it from the
PostgreSQL developers page?

http://www.postgresql.org/developer/coding

---

Joachim Wieland wrote:
 I've created a browsable source tree documentation, it's done with the
 doxygen tool.
 
 http://www.mcknight.de/pgsql-doxygen/cvshead/html/
 
 There was a discussion about this some time ago, Jonathan Gardner proposed
 it here:
 
 http://archives.postgresql.org/pgsql-hackers/2004-03/msg00748.php
 
 quite a few people found it useful but it somehow got stuck. The reason was
 apparently that you'd have to tweak your comments in order to profit from
 it as much as possible.
 
 However I still think it's a nice-to-have among the online documentation,
 and it gives people who are quite new to the code an easy way to check
 what is defined where, and how... What do you think?
 
 doxygen can also produce a pdf but I haven't succeeded in doing that so far,
 pdflatex keeps bailing out. Has anybody else succeeded building this yet?
 
 
 
 Joachim
 
 -- 
 Joachim Wieland  [EMAIL PROTECTED]
 C/ Usandizaga 12 1°B   ICQ: 37225940
 20002 Donostia / San Sebastian (Spain) GPG key available
 
 ---(end of broadcast)---
 TIP 3: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faq
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [pgsql-www] [HACKERS] source documentation tool doxygen

2006-01-16 Thread Marc G. Fournier


the only question I have ... is there any way of improving that right 
index?  Love the 'detail pages', but somehow making the right index more 
'tree like' might make navigation a bit easier ...




On Mon, 16 Jan 2006, Bruce Momjian wrote:



Wow, looks great.  Is that URL stable?  Can we link to it from the
PostgreSQL developers page?

http://www.postgresql.org/developer/coding

---

Joachim Wieland wrote:

I've created a browsable source tree documentation, it's done with the
doxygen tool.

http://www.mcknight.de/pgsql-doxygen/cvshead/html/

There was a discussion about this some time ago, Jonathan Gardner proposed
it here:

http://archives.postgresql.org/pgsql-hackers/2004-03/msg00748.php

quite a few people found it useful but it somehow got stuck. The reason was
apparently that you'd have to tweak your comments in order to profit from
it as much as possible.

However I still think it's a nice-to-have among the online documentation
and it gives people quite new to the code the possibility to easily check
what is where defined and how... What do you think?

doxygen can also produce a pdf but I haven't succeeded in doing that so far,
pdflatex keeps bailing out. Has anybody else succeeded building this yet?



Joachim

--
Joachim Wieland  [EMAIL PROTECTED]
C/ Usandizaga 12 1°B   ICQ: 37225940
20002 Donostia / San Sebastian (Spain) GPG key available

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq



--
 Bruce Momjian|  http://candle.pha.pa.us
 pgman@candle.pha.pa.us   |  (610) 359-1001
 +  If your life is a hard drive, |  13 Roberts Road
 +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings




Marc G. Fournier   Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]   Yahoo!: yscrappy  ICQ: 7615664

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] equivalence class not working?

2006-01-16 Thread uwcssa
Fine. The rest of the documentation says: "For now, the test only uses
restriction clauses (those in restrictinfo_list). --Nels, Dec '92".
However, I understand that as being overridden by the followup, which is:
"XXX as of 7.1, equivalence class info *is* available. Consider improving
this code as foreseen by Nels."

Therefore, equivalence classes should be detected and used for index
selection... or could anyone tell me whether, after 7.1, PostgreSQL has
determined not to use equi-join clauses for index selection?

On 1/16/06, Tom Lane [EMAIL PROTECTED] wrote:
 uwcssa [EMAIL PROTECTED] writes:
  I am using postgres 8.1. In indxpath.c, it says "Note: if Postgres
  tried to optimize queries by forming equivalence classes over
  equi-joined attributes (i.e., if it recognized that a qualification
  such as where a.b = c.d and a.b = 5 could make use of an index on c.d),
  then we could use that equivalence class info here with joininfo_list
  to do more complete tests for the usability of a partial index. ...
  XXX as of 7.1, equivalence class info *is* available."

 Are you deliberately ignoring the rest of the comment?

 regards, tom lane


Re: [pgsql-www] [HACKERS] source documentation tool doxygen

2006-01-16 Thread Michael Glaesemann


On Jan 17, 2006, at 8:40 , Marc G. Fournier wrote:



the only question I have ... is there any way of improving that  
right index?  Love the 'detail pages', but somehow making the right  
index more 'tree like' might make navigation a bit easier ...


Along those lines, I wonder if a CSS couldn't be worked up to  
integrate the look with the rest of the site.


Michael Glaesemann
grzm myrealbox com




---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Improving N-Distinct estimation by ANALYZE

2006-01-16 Thread Simon Riggs
On Mon, 2006-01-16 at 21:24 +0100, Manfred Koizar wrote:
 On Fri, 13 Jan 2006 19:18:29 +, Simon Riggs
 [EMAIL PROTECTED] wrote:
 I enclose a patch for checking out block sampling.
 
 Can't comment on the merits of block sampling and your implementation
 thereof.  

But your thoughts are welcome...

 |!  * Row Sampling: As of May 2004, we use the Vitter algorithm to create
 
 Linking the use of the Vitter algorithm to May 2004 is not quite
 appropriate.  We introduced two stage sampling at that time.

I'll redo some of the comments as you suggest and review coding points.

 |   * a random sample of targrows rows (or less, if there are less in the
 |!  * sample of blocks). In this case, targblocks is always the same as
 |!  * targrows, so we always read one row per block.
 
 This is just wrong, unless you add "on average".  Even then it is a
 bit misleading, because in most cases we *read* more tuples than we
 use.

The current code reads a number of blocks == targrows, hence one per
block but I'll change the comment.

 |   * Although every row has an equal chance of ending up in the final
 |   * sample, this sampling method is not perfect: not every possible
 |   * sample has an equal chance of being selected.  For large relations
 |   * the number of different blocks represented by the sample tends to be
 |!  * too small.  In that case, block sampling should be used.
 
 Is the last sentence a fact or personal opinion?

That is my contention, based upon research. The existing code clearly
identifies that case as requiring a solution. We use Chaudhuri et al's
1998 result for the sufficiency of sample size, yet overlook his
estimators and his ideas for random block selection.

My first point is that we need proportional sample size, not fixed size.
[I show that for large tables we currently use a sample that is 2-5
times too small, even based on Chaudhuri's work.]

ANALYZE is very I/O bound: random row sampling would need to scan the
whole table to take a 2-3% sample in most cases. Block sampling is a
realistic way of getting a larger sample size without loss of
performance, and also a way of getting a reasonable sample when
accessing very few blocks.

We would keep random row sampling for smaller relations where the
performance effects of sample collection are not noticeable.
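
As a toy illustration of the block-selection side only (a standalone sketch,
not the patch itself): pick a set of distinct random block numbers using
Knuth's selection-sampling, then read every row in each chosen block rather
than one row per block:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/*
 * Choose "targblocks" distinct random block numbers out of "nblocks"
 * (Knuth's Algorithm S).  In a real ANALYZE we would then sample every
 * row inside each chosen block; here we just print the block numbers.
 */
static void
sample_blocks(int nblocks, int targblocks)
{
    int remaining = nblocks;
    int needed = targblocks;

    for (int blk = 0; blk < nblocks && needed > 0; blk++)
    {
        /* select this block with probability needed/remaining */
        if ((double) rand() / ((double) RAND_MAX + 1.0) <
            (double) needed / remaining)
        {
            printf("sample all rows in block %d\n", blk);
            needed--;
        }
        remaining--;
    }
}

int
main(void)
{
    srand((unsigned) time(NULL));
    /* e.g. a 2% block sample of a 1000-block relation */
    sample_blocks(1000, 20);
    return 0;
}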

 I haven't been following the discussion too closely but didn't you say
 that a block sampling algorithm somehow compensates for the fact that
 the sample is not random?

I haven't said that block sampling compensates for the sample being
non-random. There is a danger of bias in the sample and I made that
clear. I've tried to mitigate that by the use of random blocks, reusing
your code and taking note of the earlier problems you solved. However,
block sampling allows us to spot cases where rows are naturally grouped
together, which row sampling doesn't spot until very high sample sizes.
I explored in detail a common case where this causes the current method
to fall down very badly. 

Brutlag & Richardson [2000] give a good rundown on block sampling, and
I'm looking to implement some of their block estimators to complement the
current patch. (Credit to Andrew Dunstan for locating the paper...)
http://www.stat.washington.edu/www/research/reports/1999/tr355.ps

[I did briefly explore the possibility of hybrid row/block sampling,
using row sampling as well some block sampling to identify grouping,
with an attempt to avoid swamping the sample with biased rows; but I
don't have a strong basis for that, so I've ignored that for now.]

This patch allows us to explore the possible bias that block sampling
gives, allowing us to make a later decision to include it, or not.

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [pgsql-www] [HACKERS] source documentation tool doxygen

2006-01-16 Thread Robert Treat
This was my plan all along, just been waiting for someone to make it work with 
the postgresql code and then send instructions to the postgresql web team on 
how to set it up. 

Robert Treat

On Monday 16 January 2006 18:32, Bruce Momjian wrote:
 Wow, looks great.  Is that URL stable?  Can we link to it from the
 PostgreSQL developers page?

  http://www.postgresql.org/developer/coding

 ---

 Joachim Wieland wrote:
  I've created a browsable source tree documentation, it's done with the
  doxygen tool.
 
  http://www.mcknight.de/pgsql-doxygen/cvshead/html/
 
  There was a discussion about this some time ago, Jonathan Gardner
  proposed it here:
 
  http://archives.postgresql.org/pgsql-hackers/2004-03/msg00748.php
 
  quite a few people found it useful but it somehow got stuck. The reason
  was apparently that you'd have to tweak your comments in order to profit
  from it as much as possible.
 
  However I still think it's a nice-to-have among the online documentation
  and it gives people quite new to the code the possibility to easily check
  what is where defined and how... What do you think?
 
  doxygen can also produce a pdf but I haven't succeeded in doing that so
  far, pdflatex keeps bailing out. Has anybody else succeeded building this
  yet?
 
 
 
  Joachim
 
  --
  Joachim Wieland  [EMAIL PROTECTED]
  C/ Usandizaga 12 1°B   ICQ: 37225940
  20002 Donostia / San Sebastian (Spain) GPG key available
 
  ---(end of broadcast)---
  TIP 3: Have you checked our extensive FAQ?
 
 http://www.postgresql.org/docs/faq

-- 
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Large Scale Aggregation (HashAgg Enhancement)

2006-01-16 Thread Simon Riggs
On Mon, 2006-01-16 at 14:43 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  For HJ we write each outer tuple to its own file-per-batch in the order
  they arrive. Reading them back in preserves the original ordering. So
  yes, caution required, but I see no difficulty, just reworking the HJ
  code (nodeHashjoin and nodeHash). What else do you see?
 
 With dynamic adjustment of the hash partitioning, some tuples will go
 through multiple temp files before they ultimately get eaten, and
 different tuples destined for the same aggregate may take different
 paths through the temp files depending on when they arrive.  It's not
 immediately obvious that ordering is preserved when that happens.
 I think it can be made to work but it may take different management of
 the temp files than hashjoin uses.  (Worst case, we could use just a
 single temp file for all unprocessed tuples, but this would result in
 extra I/O.)

Sure, the hash table is dynamic, but we read all the inner rows to create
the hash table (nodeHash) before we get the outer rows (nodeHJ).
Why would we continue to dynamically build the hash table after the
start of the outer scan? (I see that we do this, as you say, but I am
surprised.)

Best Regards, Simon Riggs






---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Large Scale Aggregation (HashAgg Enhancement)

2006-01-16 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 Sure hash table is dynamic, but we read all inner rows to create the
 hash table (nodeHash) before we get the outer rows (nodeHJ).

But our idea of the number of batches needed can change during that
process, resulting in some inner tuples being initially assigned to the
wrong temp file.  This would also be true for hashagg.

 Why would we continue to dynamically build the hash table after the
 start of the outer scan?

The number of tuples written to a temp file might exceed what we want to
hold in memory; we won't detect this until the batch is read back in,
and in that case we have to split the batch at that time.  For hashagg
this point would apply to the aggregate states not the input tuples, but
it's still a live problem (especially if the aggregate states aren't
fixed-size values ... consider a concat aggregate for instance).
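
To illustrate the effect (this is just the arithmetic of a simplified
scheme, not the actual nodeHash.c logic): if the batch number is taken as
the hash value masked by (nbatch - 1), then doubling nbatch means some
tuples already written out under the old batch count now belong to a
different batch and must be moved when their batch is reloaded:

#include <stdio.h>
#include <stdint.h>

/* Toy batch assignment: batchno = hash & (nbatch - 1), nbatch a power of two. */
static int
batch_for(uint32_t hash, int nbatch)
{
    return (int) (hash & (uint32_t) (nbatch - 1));
}

int
main(void)
{
    uint32_t hashes[] = {0x0005, 0x000D, 0x0013, 0x001B, 0x0021, 0x0036};
    int      old_nbatch = 4;
    int      new_nbatch = 8;

    for (int i = 0; i < 6; i++)
    {
        int oldb = batch_for(hashes[i], old_nbatch);
        int newb = batch_for(hashes[i], new_nbatch);

        printf("hash %#06x: written to batch %d, belongs in batch %d%s\n",
               (unsigned) hashes[i], oldb, newb,
               oldb == newb ? "" : "  <-- must be moved on reload");
    }
    return 0;
}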

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] source documentation tool doxygen

2006-01-16 Thread Jonathan Gardner
On Monday 16 January 2006 10:51, Tom Lane wrote:
 Neil Conway [EMAIL PROTECTED] writes:
  I don't think it would be all that painful. There would be no need to
  convert the entire source tree to use proper Doxygen-style comments in
  one fell swoop: individual files and modules can be converted whenever
  anyone gets the inclination to do so. I don't think the maintenance
  burden would be very substantial, either.
 
 In the previous go-round on this topic, I seem to recall some concern
 about side-effects, to wit reducing the readability of the comments for
 ordinary non-doxygen code browsing.  I'd be quite against taking any
 noticeable hit in that direction.
 
 A quick look through the doxygen manual doesn't make it sound too
 invasive, but I am worried about how well it will coexist with pgindent.
 It seems both tools think they can dictate the meaning of the characters
 immediately after /* of a comment block.
 

In my experimentation, I found that there are really three paths for using 
Doxygen on code.

(1) Do nothing to the code. Doxygen in this state works pretty well. You can 
browse the code and you can find references to things. Some of the comments 
are picked up correctly as-is. It does a really good job of formatting the 
code for HTML.

(2) Do minimally invasive changes to make Doxygen recognize comments better. 
Example: Move comments from behind to in-front, or vice-versa. Put an extra 
'*' after '/*' to denote special documentation. This makes Doxygen especially 
useful, because most things already have proper documentation and bringing 
that out to the person who browses Doxygen's output is very good. All of a 
sudden, we have a manual that reads like a book describing how PostgreSQL is 
put together PLUS we still have the marked-up code.

(3) Do maximally invasive changes to the comments. This includes things like 
using Doxygen-specific markup, putting in LaTeX for formulas, and putting in 
cross-references. This is a huge amount of work (initial + maintenance) and 
doesn't add much benefit beyond what (2) has. It makes the book read more 
like a book, but it was already pretty readable.
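
To make (2) concrete, a minimally marked-up comment would look something
like this (purely illustrative code, not taken from the tree); the only
change from house style is the extra '*' after the comment opener, plus
the optional '/**<' form for struct members:

#include <stdio.h>

/** A toy variable-length datum (illustrative only). */
struct toy_datum
{
    unsigned int header;    /**< length word; upper bits reserved */
    const char  *data;      /**< payload */
};

/**
 * toy_datum_length -- return the length stored in a datum's header word.
 */
static unsigned int
toy_datum_length(const struct toy_datum *d)
{
    return d->header & 0x3FFFFFFF;
}

int
main(void)
{
    struct toy_datum d = {42, "hello"};

    printf("length = %u\n", toy_datum_length(&d));
    return 0;
}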

It's not a question of WHETHER to use Doxygen. You can use it whether or not 
anyone else wants to. It's really a question of how far we bend over backwards 
to accommodate Doxygen. I'd say we just stick to moving comments around and 
putting in the extra '*'s. Let's not encourage people to put in Doxygen 
markup that isn't obvious. If someone contributes code with Doxygen mark-up, 
we won't turn them down but we won't promise we'll keep their markup as 
pretty as it was when they submitted it.

-- 
Jonathan Gardner
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 6: explain analyze is your friend


[HACKERS] FW: PGBuildfarm member firefly Branch HEAD Failed at Stage Check

2006-01-16 Thread Larry Rosenman
PG Build Farm wrote:
 The PGBuildfarm member firefly had the following event on branch HEAD:
 
 Failed at Stage: Check
 
 The snapshot timestamp for the build that triggered this notification
 is: 2006-01-17 03:27:00 
 
 The specs of this machine are:
 OS:  UnixWare / 7.1.4
 Arch: i386
 Comp: cc (SCO UDK) / 4.2
 
 For more information, see
 http://www.pgbuildfarm.org/cgi-bin/show_history.pl?nm=firefly&br=HEAD 

This is a Stats failure.  I thought the latest change Tom put in would fix
it.

I guess not.

How can I help?

LER


-- 
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 512-248-2683 E-Mail: ler@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3683 US


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Large Scale Aggregation (HashAgg Enhancement)

2006-01-16 Thread Greg Stark

Tom Lane [EMAIL PROTECTED] writes:

  Why would we continue to dynamically build the hash table after the
  start of the outer scan?
 
 The number of tuples written to a temp file might exceed what we want to
 hold in memory; we won't detect this until the batch is read back in,
 and in that case we have to split the batch at that time.  For hashagg
 this point would apply to the aggregate states not the input tuples, but
 it's still a live problem (especially if the aggregate states aren't
 fixed-size values ... consider a concat aggregate for instance).

For a hash aggregate would it be possible to rescan the original table instead
of spilling to temporary files? Then when you run out of working memory you
simply throw out half the hash table and ignore subsequent tuples that fall in
those hash buckets. Then you rescan for the discarded hash bucket regions.

This avoids having to do any disk writes at the expense possibly of additional
reads. I think in terms of i/o it would be much faster in most cases.
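
As a toy model of the idea (grossly simplified: fixed key-range partitions
instead of discarding hash-bucket ranges on the fly, and an in-memory array
standing in for the rescannable input):

#include <stdio.h>

#define NKEYS    8      /* toy "group by" cardinality */
#define PASSKEYS 4      /* how many groups fit in memory per pass */

/* A tiny stand-in for the input: (key, value) pairs we can rescan at will. */
static const int keys[]   = {1, 5, 2, 7, 1, 3, 5, 6, 2, 7, 4, 0};
static const int values[] = {10, 1, 3, 7, 10, 2, 1, 9, 3, 7, 4, 5};
#define NTUPLES (sizeof(keys) / sizeof(keys[0]))

int
main(void)
{
    long sums[PASSKEYS];

    /* Each pass keeps state only for the keys in its range and simply
     * ignores the rest; the discarded keys are picked up by a later rescan. */
    for (int pass = 0; pass * PASSKEYS < NKEYS; pass++)
    {
        int lo = pass * PASSKEYS;
        int hi = lo + PASSKEYS;

        for (int i = 0; i < PASSKEYS; i++)
            sums[i] = 0;

        for (size_t t = 0; t < NTUPLES; t++)    /* "rescan" the input */
            if (keys[t] >= lo && keys[t] < hi)
                sums[keys[t] - lo] += values[t];

        for (int i = 0; i < PASSKEYS; i++)
            printf("key %d: sum = %ld\n", lo + i, sums[i]);
    }
    return 0;
}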

The downsides are: a) volatile aggregates or aggregates with side-effects
would be confused by being executed twice. I'm not clear that volatile
aggregate functions make any sense anyways though. b) I'm unclear whether
rescanning the table could potentially find tuples in a different state than
previous scans. If so then the idea doesn't work at all. But I don't think
that's possible is it?

The main problem is c) it may lose in terms of i/o for cases where the
cardinality is low (ie, it's overflowing despite having low cardinality
because the table is really really big too). But most cases will be spilling
because the cardinality is high. So the table may be big but the spill files
are nearly as big anyways and having to write and then read them means double
the i/o.

The upside of not having to write out temporary files is big. I find queries
that require temporary sort files get hit with a *huge* performance penalty.
Often an order of magnitude. Part of that could probably be mitigated by
having the sort files on a separate spindle but I think it's always going to
hurt especially if there are multiple operations spilling to disk
simultaneously.

-- 
greg


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] source documentation tool doxygen

2006-01-16 Thread Tom Lane
Bruce Momjian pgman@candle.pha.pa.us writes:
 Wow, looks great.  Is that URL stable?  Can we link to it from the
 PostgreSQL developers page?

The thing seems to have only the very vaguest grasp on whether it is
parsing C or C++ ... or should I say that it is convinced it is parsing
C++ despite all evidence to the contrary?  I'd be happier with the
pretty pictures if they had some connection to reality.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Large Scale Aggregation (HashAgg Enhancement)

2006-01-16 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 For a hash aggregate would it be possible to rescan the original table
 instead of spilling to temporary files?

Sure, but the possible performance gain is finite and the possible
performance loss is not.  The original table could be an extremely
expensive join.  We'd like to think that the planner gets the input size
estimate approximately right and so the amount of extra I/O caused by
hash table resizing should normally be minimal.  The cases where it is
not right are *especially* not likely to be a trivial table scan as you
are supposing.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Large Scale Aggregation (HashAgg Enhancement)

2006-01-16 Thread Simon Riggs
On Mon, 2006-01-16 at 20:02 -0500, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  Sure hash table is dynamic, but we read all inner rows to create the
  hash table (nodeHash) before we get the outer rows (nodeHJ).
 
 But our idea of the number of batches needed can change during that
 process, resulting in some inner tuples being initially assigned to the
 wrong temp file.  This would also be true for hashagg.

So we correct that before we start reading the outer table.

  Why would we continue to dynamically build the hash table after the
  start of the outer scan?
 
 The number of tuples written to a temp file might exceed what we want to
 hold in memory; we won't detect this until the batch is read back in,
 and in that case we have to split the batch at that time.  For hashagg
 this point would apply to the aggregate states not the input tuples, but
 it's still a live problem (especially if the aggregate states aren't
 fixed-size values ... consider a concat aggregate for instance).

OK, I see what you mean. Sounds like we should have a new definition for
Aggregates, "Sort Insensitive", that allows them to work when the input
ordering does not affect the result, since that case can be optimised
much better when using HashAgg. Since we know that applies to the common
cases of SUM, AVG etc., this will certainly help people.

For sort-sensitive aggregates sounds like we either:
1. Write to a single file, while we remember the start offset of the
first row of each batch.
2. Write to multiple files, adding a globally incrementing sequenceid.
Batches are then resorted on the sequenceid before processing.
3. We give up, delete the existing batches and restart the scan from the
beginning of the outer table.

Sounds like (1) is best, since the overflow just becomes a SortedAgg.
But all of them sound ugly.
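
For (1), the bookkeeping is basically just remembering offsets into one
scratch file and seeking back to them later; a rough standalone sketch
(made-up record format, nothing to do with the real tuplestore code):

#include <stdio.h>

#define NBATCHES 3

int
main(void)
{
    FILE *tmp = tmpfile();
    long  batch_start[NBATCHES];

    if (tmp == NULL)
        return 1;

    /* Write each batch's records contiguously, remembering where it starts. */
    for (int b = 0; b < NBATCHES; b++)
    {
        batch_start[b] = ftell(tmp);
        for (int rec = 0; rec < 4; rec++)
        {
            int payload = b * 100 + rec;    /* records stay in arrival order */
            fwrite(&payload, sizeof(payload), 1, tmp);
        }
    }

    /* Later, each batch can be replayed in original order by seeking to its
     * recorded start offset -- one file, no re-sort needed. */
    for (int b = NBATCHES - 1; b >= 0; b--)
    {
        fseek(tmp, batch_start[b], SEEK_SET);
        printf("batch %d:", b);
        for (int rec = 0; rec < 4; rec++)
        {
            int payload;

            if (fread(&payload, sizeof(payload), 1, tmp) == 1)
                printf(" %d", payload);
        }
        printf("\n");
    }

    fclose(tmp);
    return 0;
}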

Best Regards, Simon Riggs


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [GENERAL] [PATCH] Better way to check for getaddrinfo function.

2006-01-16 Thread R, Rajesh (STSD)

That was very much situation-specific.

But the bottom line is that the default test does not include netdb.h in the test code.

So pg uses getaddrinfo.c, and that getaddrinfo.c does not work for me:
IPv6 client authentication fails.


I have modified the patch.


$ diff -r configure.in configure.in.new
918a919
> AC_MSG_CHECKING([for getaddrinfo])
920c921,926
< AC_REPLACE_FUNCS([getaddrinfo])
---
> AC_TRY_LINK([#include <netdb.h> #include <assert.h>],
> [char (*f)();f=getaddrinfo;],
> ac_cv_func_getaddrinfo=yes, ac_cv_func_getaddrinfo=no)
> if test x$ac_cv_func_getaddrinfo = xyes; then
> AC_DEFINE(HAVE_GETADDRINFO,1,[Define if you have the getaddrinfo function])
> fi
923a930
> AC_MSG_RESULT([$ac_cv_func_getaddrinfo])



Rajesh R

--

This space intentionally left non-blank. 


-Original Message-

From: Tom Lane [mailto:[EMAIL PROTECTED]] 

Sent: Monday, January 16, 2006 11:28 PM

To: R, Rajesh (STSD)

Cc: pgsql-hackers@postgresql.org; pgsql-general@postgresql.org

Subject: Re: [GENERAL] [PATCH] Better way to check for getaddrinfo function. 


R, Rajesh (STSD) [EMAIL PROTECTED] writes:

 Just thought that the following patch might improve checking for 

 getaddrinfo function (in configure.in)


Since AC_TRY_RUN tests cannot work in cross-compilation scenarios, you need an *extremely* good reason to put one in. "I thought this might improve things" doesn't qualify. Exactly what problem are you trying to solve, and why is a run-time test necessary? Why doesn't the existing coding work for you?

   regards, tom lane






---(end of broadcast)---
TIP 6: explain analyze is your friend