Re: [PATCHES] Error correction for n_dead_tuples

2007-09-26 Thread Bruce Momjian

This patch is no longer needed.  We can revisit this during 8.4 to see
if it is still needed now that we have HOT.

---

ITAGAKI Takahiro wrote:
 Here is a patch discussed in
 http://archives.postgresql.org/pgsql-hackers/2007-02/msg00010.php
 
 Concurrent vacuum will save the n_dead_tuples value at the beginning.
 The stats collector will then subtract that value from n_dead_tuples
 instead of setting it to zero. The statistical accuracy of n_dead_tuples
 will be better, especially just after a vacuum finishes.
 
 The behavior in VACUUM FULL is not changed because concurrent updates
 are not allowed during VACUUM FULL.
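
 The accounting change can be sketched as follows (illustrative Python,
 not the stats-collector code; the function names are invented):

 ```python
 def report_vacuum_end_reset(n_dead_tuples, saved_at_start):
     """Old behavior: zero the counter, discarding tuples that died
     concurrently while the vacuum was running."""
     return 0

 def report_vacuum_end_subtract(n_dead_tuples, saved_at_start):
     """Patched behavior: subtract the value saved at vacuum start,
     keeping concurrent deaths; clamp at zero for safety."""
     return max(n_dead_tuples - saved_at_start, 0)

 # 100 dead tuples at vacuum start, 30 more die during the vacuum:
 assert report_vacuum_end_reset(130, 100) == 0       # 30 deaths lost
 assert report_vacuum_end_subtract(130, 100) == 30   # accuracy kept
 ```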
 
 Comments welcome.
 
 Regards,
 ---
 ITAGAKI Takahiro
 NTT Open Source Software Center

[ Attachment, skipping... ]

 
 ---(end of broadcast)---
 TIP 6: explain analyze is your friend

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] Recalculating OldestXmin in a long-running vacuum

2007-09-26 Thread Bruce Momjian

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Heikki Linnakangas wrote:
 Tom Lane wrote:
  Heikki Linnakangas [EMAIL PROTECTED] writes:
  Maybe we should keep this issue open until we resolve the vacuum WAL 
  flush issue? I can then rerun the same tests to see if this patch is a 
  win after that.
  
  Sounds like a plan, if you are willing to do that.
 
 Sure, just rerunning the same tests isn't much work.
 
 Bruce Momjian wrote:
   Would you like to add a TODO item?
 
 I don't know how we track things like this. Maybe add to the end of the 
 patch queue, with link to this discussion so that we remember that it 
 needs more testing?
 
 -- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PATCHES] Load Distributed Checkpoints, final patch

2007-09-26 Thread Bruce Momjian

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Tom Lane wrote:
 Heikki Linnakangas [EMAIL PROTECTED] writes:
  Here's the latest revision of Itagaki-san's Load Distributed Checkpoints patch:
 
 Applied with some minor revisions to make some of the internal APIs a
 bit cleaner; mostly, it seemed like a good idea to replace all those
 bool parameters with a flag-bits approach, so that you could have
 something like CHECKPOINT_FORCE | CHECKPOINT_WAIT instead of
 false, true, true, false ...
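
 The readability gain of flag bits over positional booleans can be
 sketched outside C as well (illustrative Python; only CHECKPOINT_FORCE
 and CHECKPOINT_WAIT come from the message above, the helper is invented):

 ```python
 from enum import IntFlag

 class CheckpointFlag(IntFlag):
     CHECKPOINT_FORCE = 0x1
     CHECKPOINT_WAIT  = 0x2

 def describe_request(flags):
     # The call site now reads CHECKPOINT_FORCE | CHECKPOINT_WAIT
     # instead of an opaque run of booleans like (false, true, true, false).
     return {
         "force": bool(flags & CheckpointFlag.CHECKPOINT_FORCE),
         "wait":  bool(flags & CheckpointFlag.CHECKPOINT_WAIT),
     }

 req = describe_request(CheckpointFlag.CHECKPOINT_FORCE | CheckpointFlag.CHECKPOINT_WAIT)
 assert req == {"force": True, "wait": True}
 ```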
 
 For the moment I removed all the debugging elog's in the patch.
 We still have Greg Smith's checkpoint logging patch to look at
 (which I suppose needs adjustment now), and that seems like the
 appropriate venue to consider what to put in.
 
 Also, the question of redesigning the bgwriter's LRU scan is
 still open.  I believe that's on Greg's plate, too.
 
 One other closely connected item that might be worth looking at is the
 code for creating new future xlog segments (PreallocXlogFiles).  Greg
 was griping upthread about xlog segment creation being a real
 performance drag.  I realized that as we currently have it set up, the
 checkpoint code is next to useless for high-WAL-volume installations,
 because it only considers making *one* future XLOG segment.  Once you've
 built up enough XLOG segments, the system isn't too bad about recycling
 them, but there will be a nasty startup transient where foreground
 processes have to stop and make the things.  I wonder whether it would
 help if we (a) have the bgwriter call PreallocXlogFiles during its
 normal loop, and (b) back the slop in PreallocXlogFiles way off, so that
 it will make a future segment as soon as we start using the last
 existing segment, instead of only when we're nearly done.  This would at
 least make it more likely that the bgwriter does the work instead of a
 foreground process.  I'm hesitant to go much further than that, because
 I don't want to bloat the minimum disk footprint for low-volume
 installations, but the minimum footprint is really 2 xlog files anyway...
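
 The proposed back-off, item (b), amounts to a tiny policy change
 (illustrative Python sketch; names are invented and this is not the
 actual PreallocXlogFiles logic):

 ```python
 def segments_to_preallocate(current_seg, highest_existing_seg):
     """Sketch of the proposed policy: create one future xlog segment as
     soon as the last existing segment is in use, instead of only when
     it is nearly full, so the bgwriter does the work rather than a
     foreground process."""
     if current_seg >= highest_existing_seg:
         return 1
     return 0

 assert segments_to_preallocate(current_seg=42, highest_existing_seg=42) == 1
 assert segments_to_preallocate(current_seg=40, highest_existing_seg=42) == 0
 ```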
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?
 
http://archives.postgresql.org

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PATCHES] allow CSV quote in NULL

2007-09-26 Thread Bruce Momjian

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Stephen Frost wrote:
 Greetings,
 
   Please find attached a minor patch to remove the constraints that a
   user can't include the delimiter or quote characters in a 'NULL AS'
   string when importing CSV files.
 
   This allows a user to explicitly request that NULL conversion happen
   on fields which are quoted.  As the quote character is being allowed
   to be in the 'NULL AS' string now, there's no reason to exclude the
   delimiter character from being seen in that string as well, though
   unless quoted using the CSV quote character it won't ever be matched.
 
   An example of the usage:
 
   sfrost*=# \copy billing_data from ~/BillingSamplePricerFile.csv
 with csv header quote as '"' null as '""'
 
   This is no contrived example, it's an issue I ran into earlier today
   when I got a file which had (for reasons unknown to me and not easily
   changed upstream): 
   
   1,V,"WASHDCABC12","",120033...
 
   Both of the ending columns shown are integer fields, the "" here being
   used to indicate a NULL value.
 
   Without the patch, an ERROR occurs:
 
   sfrost=> \copy billing_data from ~/BillingSamplePricerFile.csv
 with csv header quote as '"'
   ERROR:  invalid input syntax for integer: ""
 
   And there's no way to get it to import with COPY CSV mode.  The
   patch adds this ability without affecting existing usage or changing
   the syntax.  Even with the patch an ERROR occurs with the default
   treatment of CSV files:
 
   sfrost=# \copy billing_data from ~/BillingSamplePricerFile.csv
with csv header quote as '"'
   ERROR:  invalid input syntax for integer: ""
 
   Which would be expected.  If the file is modified to remove the ""s
   for NULL columns, it imports just fine with the syntax above.
 
   It'd be really nice to have this included.
 
   Thanks!
 
   Stephen

[ Attachment, skipping... ]

 
 ---(end of broadcast)---
 TIP 3: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faq

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PATCHES] strpos() KMP

2007-09-26 Thread Bruce Momjian

Added to TODO:

 * Implement Boyer-Moore searching in strpos()

   http://archives.postgresql.org/pgsql-patches/2007-08/msg00012.php


---

Pavel Ajtkulov wrote:
 Hello,
 
 this patch allows the strpos() function to use the Knuth-Morris-Pratt
 algorithm (see Cormen et al., Introduction to Algorithms, MIT Press, 2001).
 
 It also works with multibyte wchar.
 
 In the worst case, the current brute-force strpos() takes O(n * m) time,
 where n and m are the lengths of the strings
 (example: searching for 'aaa...aaab' in 'aaa...aaa').
 The KMP algorithm always takes O(n + m) time.
 To check this, create a table with one text attribute and insert several
 thousand records 'aa..aa' (for example, with length = 1000). Then execute
 select count(*) from test where strpos(a, 'aaaaab') > 0; on both the
 current and the modified version.
 
 Also, I advise using select .. where strpos(att, 'word') > 0; instead of
 select .. where attr like '%word%'
 (strpos must be faster than regex).
 
 In general, the worst case arises in artificial expressions. On natural
 language text, KMP's execution time is nearly equal to that of the
 current strpos().
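
 For reference, the O(n + m) behavior claimed above can be sketched as
 follows (illustrative Python, not the submitted C patch; the 1-based,
 0-when-absent return convention is taken from SQL's strpos()):

 ```python
 def kmp_strpos(haystack, needle):
     """Return the 1-based position of needle in haystack (0 if absent),
     in O(n + m) time via the Knuth-Morris-Pratt prefix table."""
     if not needle:
         return 1
     # Build the prefix (failure) table for the needle.
     fail = [0] * len(needle)
     k = 0
     for i in range(1, len(needle)):
         while k > 0 and needle[i] != needle[k]:
             k = fail[k - 1]
         if needle[i] == needle[k]:
             k += 1
         fail[i] = k
     # Scan the haystack; on a mismatch, fall back via the table instead
     # of re-examining already-matched characters.
     k = 0
     for i, ch in enumerate(haystack):
         while k > 0 and ch != needle[k]:
             k = fail[k - 1]
         if ch == needle[k]:
             k += 1
         if k == len(needle):
             return i - len(needle) + 2  # 1-based, like SQL strpos()
     return 0

 assert kmp_strpos("aaaab", "aab") == 3
 assert kmp_strpos("aaaa", "aaab") == 0
 ```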
 
 
 
 Ajtkulov Pavel
 [EMAIL PROTECTED]
 
 P. S. Sorry for prime English.

[ Attachment, skipping... ]

 
 ---(end of broadcast)---
 TIP 5: don't forget to increase your free space map settings

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] [HACKERS] Include Lists for Text Search

2007-09-26 Thread Bruce Momjian

This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---

Simon Riggs wrote:
 On Mon, 2007-09-10 at 10:21 -0400, Tom Lane wrote:
  Oleg Bartunov [EMAIL PROTECTED] writes:
   On Mon, 10 Sep 2007, Simon Riggs wrote:
   Can we include that functionality now?
  
   This could be realized very easily using dict_strict, which returns
   only known words, with a mapping that contains only this dictionary. So,
   feel free to write it and submit.
  
  ... for 8.4.
 
 I've coded a small patch to allow CaseSensitive synonyms.
 
   CREATE TEXT SEARCH DICTIONARY my_diction (
  TEMPLATE = biglist,
  DictFile = words,
  CaseSensitive = true
   );
 
 -- 
   Simon Riggs
   2ndQuadrant  http://www.2ndQuadrant.com

[ Attachment, skipping... ]

 
 ---(end of broadcast)---
 TIP 6: explain analyze is your friend

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] Thread-safe PREPARE in ecpg

2007-09-26 Thread Michael Meskes
On Wed, Sep 26, 2007 at 01:43:34PM +0900, ITAGAKI Takahiro wrote:
 Here is a revised patch against CVS HEAD.
 I fixed a bug in ECPGdeallocate_all().

Applied to CVS HEAD. I also added your example to the regression tests.

Michael
-- 
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: [EMAIL PROTECTED]
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] Hash Index Build Patch

2007-09-26 Thread Alvaro Herrera
Hi Tom,

Tom Raney wrote:

 We used spool functions from the BTree code to sort the index
 tuples. Sorting is done on the hash value of the tuples.  The hash
 value depends on the number of primary bucket pages (henceforth
 just bucket pages) that will be required to fit all the index
 tuples. So, before sorting, the base relation is scanned to get
 the total number of tuples.

Just wondering, wouldn't it be enough to obtain a tuple count estimate
by using reltuples / relpages * RelationGetNumberOfBlocks, like the
planner does?
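
 The planner-style estimate Alvaro refers to is a simple scaling of the
 last-analyzed tuple density to the relation's current size (illustrative
 Python; the function name is invented):

 ```python
 def estimate_tuple_count(reltuples, relpages, current_blocks):
     """Scale pg_class's reltuples/relpages density to the relation's
     current block count, avoiding a full counting scan."""
     if relpages == 0:
         return 0.0
     return reltuples / relpages * current_blocks

 # pg_class recorded 10,000 tuples in 100 pages; the table has since
 # grown to 150 pages:
 assert estimate_tuple_count(10000, 100, 150) == 15000.0
 ```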

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] Optimizer hook

2007-09-26 Thread Tom Lane
I've applied this patch with revision to put the hook where I thought it
made sense.  Attached is a modification of your dummy.c to show use of
the hook.  I didn't test it heavily, but I did check that it seemed to
work with either order of calling geqo() and standard_join_search().

regards, tom lane


#include "postgres.h"

#include "fmgr.h"
#include "optimizer/geqo.h"
#include "optimizer/paths.h"
#include "optimizer/pathnode.h"


PG_MODULE_MAGIC;

void    _PG_init(void);
void    _PG_fini(void);

static RelOptInfo *
my_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
{
    RelOptInfo *dynamic_result, *geqo_result;
    Cost        dynamic_cost, geqo_cost;
    List       *dynamic_list, *geqo_list;
    struct HTAB *dynamic_hash, *geqo_hash;
    int         savelength;

    savelength = list_length(root->join_rel_list);

    root->join_rel_hash = NULL;

    elog(LOG, "Starting a join order search \"dynamic\"...");
    dynamic_result = standard_join_search(root, levels_needed,
                                          initial_rels);
    dynamic_cost = dynamic_result->cheapest_total_path->total_cost;

    dynamic_list = list_copy(root->join_rel_list);
    dynamic_hash = root->join_rel_hash;

    root->join_rel_list = list_truncate(root->join_rel_list,
                                        savelength);
    root->join_rel_hash = NULL;

    elog(LOG, "Starting a join order search \"geqo\"...");
    geqo_result = geqo(root, levels_needed, initial_rels);
    geqo_cost = geqo_result->cheapest_total_path->total_cost;

    geqo_list = list_copy(root->join_rel_list);
    geqo_hash = root->join_rel_hash;

    fprintf(stderr, "GEQO cost: %f  Dynamic programming cost: %f\n",
            geqo_cost, dynamic_cost);

    if (geqo_cost < dynamic_cost)
    {
        root->join_rel_list = geqo_list;
        root->join_rel_hash = geqo_hash;
        return geqo_result;
    }
    else
    {
        root->join_rel_list = dynamic_list;
        root->join_rel_hash = dynamic_hash;
        return dynamic_result;
    }
}


void
_PG_init(void)
{
    join_search_hook = my_join_search;
}

void
_PG_fini(void)
{
    join_search_hook = NULL;
}

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] Hash Index Build Patch

2007-09-26 Thread Tom Raney

Alvaro Herrera wrote:
 Hi Tom,
 
 Tom Raney wrote:
  We used spool functions from the BTree code to sort the index
  tuples. Sorting is done on the hash value of the tuples.  The hash
  value depends on the number of primary bucket pages (henceforth
  just bucket pages) that will be required to fit all the index
  tuples. So, before sorting, the base relation is scanned to get
  the total number of tuples.
 
 Just wondering, wouldn't it be enough to obtain a tuple count estimate
 by using reltuples / relpages * RelationGetNumberOfBlocks, like the
 planner does?

Hello Alvaro,

We thought of that, and the verdict is still out on whether it is more 
costly to scan the entire relation to get the accurate count or to use the 
estimate and hope for the best, with the possibility of splits occurring 
during the build.   If we use the estimate and it is completely wrong 
(with the actual tuple count being much higher), the sort will provide no 
benefit and the build will behave as the original code did.


But, to be honest, I don't know exactly when the catalog is updated and 
how accurate the estimate is.  If you have any information there (or 
anyone else) please let me know.  It would be great to eliminate that 
extra pass!


Sincerely,
Tom Raney



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PATCHES] Hash Index Build Patch

2007-09-26 Thread Tom Lane
Tom Raney [EMAIL PROTECTED] writes:
 Alvaro Herrera wrote:
 Just wondering, wouldn't it be enough to obtain a tuple count estimate
 by using reltuples / relpages * RelationGetNumberOfBlocks, like the
 planner does?

 We thought of that, and the verdict is still out on whether it is more 
 costly to scan the entire relation to get the accurate count or to use the 
 estimate and hope for the best, with the possibility of splits occurring 
 during the build.   If we use the estimate and it is completely wrong 
 (with the actual tuple count being much higher), the sort will provide no 
 benefit and the build will behave as the original code did.

I think this argument is *far* too weak to justify an extra pass over
the relation.  The planner-style calculation is quite unlikely to give a
major underestimate of the rowcount.  It might overestimate, eg if the
relation is bloated by dead tuples, but an error in that direction won't
kill you.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] Minor recovery changes

2007-09-26 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 As discussed on -hackers and -admin... a few minor changes, rolled into
 a single patch.

I've brought this up to speed with CVS HEAD and applied it.

I was not, however, sure what your intention was with the files in
test_warm_standby.tar.gz.  That's not really intended to go into the
contrib module, is it?

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] [HACKERS] Document and/or remove unreachable code in tuptoaster.c from varvarlena patch

2007-09-26 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 [ revised version of tuptoaster-fixup.patch ]

I've applied most of this.  I didn't like the hardwired assumption about
minimum size to compress, and anyway you had put it in a place where it
broke the intended behavior for SET STORAGE PLAIN.  I replaced it with
just a test to save palloc/pfree overhead in toast_compress_datum, which
has at least got knowledge of which PGLZ strategy is being used.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org