Re: [HACKERS] New XML section for documentation

2006-08-25 Thread Nikolay Samokhvalov

On 8/26/06, Peter Eisentraut <[EMAIL PROTECTED]> wrote:

Bruce Momjian wrote:
> Validation
> --
> /contrib/xml2 has a function called xml_valid() that can be used in
> a CHECK constraint to enforce that a field contains valid XML.  It
> does not support validation against a specific XML schema.

Then this is not validation but only checking for well-formedness.  The
xml2 README says so, in fact.


Exactly. contrib/xml2 mixes up the terms here; xml_valid() should be a
different function, one that takes two kinds of data - an XML value and
the corresponding XML schema - and validates the XML data against it.
Actually, the latest version of the SQL/XML standard includes such a
function (XMLVALIDATE).

If you decide to mention contrib/xml2 in the docs, I would suggest a
patch for this module. The patch renames that function to xml_check()
and adds xml_array() (an issue from the current TODO). Or is it too
late for 8.2?

Also, I would add a little introduction to XML terms (from XML
standards) to this documentation section.

--
Best regards,
Nikolay



Re: [Open Item] Re: [HACKERS] Autovacuum on by default?

2006-08-25 Thread Peter Eisentraut
Jim C. Nasby wrote:
> I thought we had agreed it would be a good idea to turn autovac_delay
> on?

We had not, because there was no experience available about where to put 
the default numbers.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] New XML section for documentation

2006-08-25 Thread Peter Eisentraut
Bruce Momjian wrote:
> XML Document Support
> 
> XML support is not one capability, but a variety of features
> supported by a database.

database system

> Storage
> ---
> PostgreSQL stores XML documents as ordinary text documents.

It is "possible" to do that, but this sounds like it's done 
automatically or implicitly.  Maybe:

"PostgreSQL does not have a specialized XML data type.  The recommended 
way is to store XML documents as text."

> Import/Export
> -
> Because XML documents are stored as normal text documents, they can
> be imported/exported with little complexity.

Import/export refers to exporting schema data with XML decorations.  Of 
course you can export column data trivially, but that's not what this 
is about.

> Validation
> --
> /contrib/xml2 has a function called xml_valid() that can be used in
> a CHECK constraint to enforce that a field contains valid XML.  It
> does not support validation against a specific XML schema.

Then this is not validation but only checking for well-formedness.  The 
xml2 README says so, in fact.

> Indexing
> 

I think the expression index capability combined with contrib/xml2 is 
more relevant here than the full-text search capability.

> Transforming
> 
> /contrib/xml2 supports XSL transformations.

That's XSLT.

> XML to SQL Mapping
> ---
> This involves converting XML data to and from relational structures.
> PostgreSQL has no internal support for such mapping, and relies on
> external tools to do such conversions.

Are there instances of such tools?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] [DOCS] New XML section for documentation

2006-08-25 Thread Magnus Hagander
> Indexing
> 
> Because XML documents are stored as text, the full-text indexing tool
> /contrib/tsearch2 can be used to index XML documents.  Of 
> course, the searches are text searches, with no XML 
> awareness, but tsearch2 can be used with other XML 
> capabilities to dramatically reduce the amount of data 
> processed at the XML level.


You can also use a functional index and /contrib/xml2 to do limited
XPath indexing. (Can't make it "subtree-aware" for example, unless you
are willing to change your queries, but you can index specific xpath
nodes). 
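
For example, a minimal sketch of that approach (assuming contrib/xml2 is
installed; the table and column names below are only illustrative):

    -- Expression index over one specific XPath node of an XML text column
    CREATE INDEX docs_title_idx ON docs (xpath_string(body, '/book/title'));

    -- A query must repeat the same expression to be able to use the index
    SELECT * FROM docs
    WHERE xpath_string(body, '/book/title') = 'PostgreSQL';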


//Magnus



Re: [HACKERS] Coding style for emacs

2006-08-25 Thread Bruce Momjian

I am still waiting for an actual patch to the developer's FAQ for this.

---

Gregory Stark wrote:
> 
> Andrew Dunstan <[EMAIL PROTECTED]> writes:
> 
> > I use more or less what is in the developers' FAQ (not surprising, since I
> > contributed it). It works just fine for me. See
> > http://www.postgresql.org/docs/faqs.FAQ_DEV.html#item1.9
> > 
> > We should probably bring the docs in line with that, unless someone with 
> > more
> > emacs-fu than me has something to add.
> 
> Oh, I've been meaning to email about this. There's a file in the source tree
> but it's buggy. The code in your FAQ seems ok, but I prefer the way I've done
> it:
> 
> (add-hook 'c-mode-hook
> (function
>  (lambda nil 
>(if (string-match "pgsql" buffer-file-name)
>(progn
>  (c-set-style "bsd")
>  (setq c-basic-offset 4) 
>  (setq tab-width 4)
>  (c-set-offset 'case-label '+)
>  (setq indent-tabs-mode t)
>  )
>    ))))
> 
> 
> 
> 
> -- 
>   Gregory Stark
>   EnterpriseDB  http://www.enterprisedb.com
> 
> 

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] CSStorm occurred again by postgreSQL8.2

2006-08-25 Thread Bruce Momjian

Is there anything to do for 8.2 here?

---

ITAGAKI Takahiro wrote:
> This is an additional information.
> 
> I wrote:
> > If we want to resolve the problem fundamentally, we might have to
> > improve SubTrans using a better buffer management algorithm or so.
> 
> The above is maybe wrong. I checked each lwlock of pg_subtrans's buffers.
> All lwlocks are uniformly acquired and I could not see any differences
> among buffers. So the cause seems not to be a buffer management algorithm,
> but just a lack of SLRU buffer pages.
> 
> NUM_SUBTRANS_BUFFERS is defined as 32 in HEAD. If we increase it,
> we can avoid the old transaction problem for a certain time.
> However, it doesn't help much on high-load -- for example, on a workload
> with 2000 tps, we will use up 1000 pg_subtrans pages in 15 minutes.
> I suppose that is not enough for a mixed online and batch/maintenance workload.
> 
> Also, the simple scanning approach in SLRU will likely cause another
> performance issue if we increase the number of buffers significantly.
> SLRU uses a sequential scan, so it will not work well with many buffers.
> 
> 
> I hope for some care at an upper layer -- snapshots, hint bits, or
> something similar -- as is being discussed in the recent thread.
> 
> Regards,
> ---
> ITAGAKI Takahiro
> NTT Open Source Software Center
> 
> 
> 

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] pgcrypto deprecated functions?

2006-08-25 Thread Bruce Momjian
Michael Fuhr wrote:
> In README.pgcrypto, Section 2.3 "Deprecated functions" says that
> digest_exists(), hmac_exists(), and cipher_exists() are planned to
> be removed in PostgreSQL 8.2.  Those functions still exist -- should
> they be removed or does that section need updating?

Yes, I see this text:

The `digest_exists()`, `hmac_exists()` and `cipher_exists()` functions
are deprecated.  The plan is to remove them in PostgreSQL 8.2.

Would someone address this?

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] VALUES clause memory optimization

2006-08-25 Thread Bruce Momjian

Has this been addressed?

---

Joe Conway wrote:
> Tom Lane wrote:
> > The reason we could safely list_free inside transformInsertRow is that
> > we know all its callers have just built the passed-in list and so there
> > are no other pointers to it.  That doesn't apply in the general case of
> > grammar output.
> 
> What about for the specific case of an InsertStmt? It seems that we 
> could at least get away with freeing the raw-expression list in that case.
> 
> In terms of freeing an entire arbitrary node, could we create a 
> backend/nodes/freefuncs.c file that does a recursive freeObject() 
> similar to the way copyObject() does in backend/nodes/copyfuncs.c?
> 
> > My advice is to get that low-hanging fruit
> > in transformInsertRow and leave the other ideas for 8.3.
> 
> OK. This should be safe also, correct?
> 
> Thanks,
> 
> Joe
> 
> 8<
> diff -c -r1.341 analyze.c
> *** src/backend/parser/analyze.c  2 Aug 2006 01:59:46 -   1.341
> --- src/backend/parser/analyze.c  2 Aug 2006 05:13:20 -
> ***
> *** 2191,2196 
> --- 2196,2202 
>   for (i = 0; i < sublist_length; i++)
>   {
>   coltypes[i] = select_common_type(coltype_lists[i], "VALUES");
> + list_free(coltype_lists[i]);
>   }
> 
>   newExprsLists = NIL;
> 
> 

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Further reduction of bufmgr lock contention

2006-08-25 Thread Bruce Momjian

Is this being kept for 8.3?

---

Tom Lane wrote:
> A couple months ago we were discussing partitioning the buffer mapping
> table so as to reduce contention for the BufMappingLock.  The discussion
> stalled after I complained about the uncertainty of shared memory usage
> arising from having a separate hashtable for each partition (since
> different numbers of buffers might fall into each partition at different
> times).  I've been thinking about it more after seeing an example from
> Tatsuo that seems to be exactly the same problem as Gavin saw.
> 
> We could fix the uncertain-usage objection if we stick with a single
> hashtable and ensure that concurrent access to different partitions of
> it is safe.  I believe this is easily doable, if we make hashtables
> used in this mode allocate the maximum allowed number of buckets
> immediately during hashtable initialization.  (Since shared hashtables
> already have a fixed maximum directory size, they already have an upper
> bound on the number of buckets, so this loses no flexibility.)  Then
> there will be no on-the-fly bucket splits, and that means that accesses
> to different hash buckets operate completely independently.  Therefore,
> we can regard the hashtable as logically partitioned on the basis of any
> classification of entries that will uniquely assign hash buckets to
> classes --- taking the low order bits of entry hash codes will do fine.
> 
> The only changeable state that is shared across all buckets is the entry
> freelist and the "nentries" counter.  We could protect these with a
> spinlock (one spinlock is sufficient since changes to nentries go along
> with addition or removal of freelist entries).
> 
> Usage of a partitioned hash table would then be like
> 
>   compute hashtable lookup key;
>   entryhashcode = calc_hash(lookup key);
>   partitionnumber = entryhashcode % NumPartitions;
>   LWLockAcquire(PartitionLock[partitionnumber], ...);
>   manipulate hashtable;
>   LWLockRelease(PartitionLock[partitionnumber]);
> 
> We could do this without changing the API of hash_search, but then we'd
> be computing the entry hashcode twice, so I'm inclined to provide an
> additional entry point that takes a precalculated hashcode.
> 
> Potential downsides of applying this idea to the buffer mapping table:
> 
> 1. Reassigning a buffer to a new page will (usually) require two cycles
> of LWLockAcquire/Release for the two different partitions involved.
> Since this path also requires at least a read() kernel call (maybe a
> write() too), I don't think there'll be any meaningful slowdown.
> 
> 2. The current logic for reassigning a buffer attempts to make a
> hashtable entry for its new page number (to detect collisions) before
> releasing the old hashtable entry.  This would only be safe if we held
> both partition LWLocks concurrently; which seems bad for concurrency,
> plus avoiding deadlock would complicate the code significantly.  I'm
> inclined to release the old entry and then try to insert the new one,
> holding only one lock at a time.  If the insertion fails (because
> someone was concurrently loading the same page), we'd have to throw the
> buffer onto the freelist rather than allowing it to retain its previous
> valid data.  Which is slightly annoying, but in practice the case
> probably doesn't happen often enough to be worth worrying about.
> 
> 3. Taking the freelist spinlock is new computation that wasn't there
> before.  But, again, it's only needed in code paths that will also be
> doing a kernel call.
> 
> If we do this we should probably also handle the lmgr lock tables the
> same way (partially reverting my partition-the-LockMgrLock patch of a
> couple months ago).  However, downside #3 might be a stronger objection
> for lmgr, since it can create or destroy lock objects without necessarily
> doing any I/O.
> 
> Thoughts, objections?
> 
>   regards, tom lane
> 

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] New XML section for documentation

2006-08-25 Thread Euler Taveira de Oliveira
Bruce Momjian wrote:

> Here is a new XML section for our SGML documentation.  It explains the
> various XML capabilities, if we support them, and how to use them.
> 
> Comments?
> 
+1. Users often ask about this on the mailing lists. Where do you want to
put this? I'd suggest the FAQ. What do you all think?

> Missing Features
> 
>   o  XQuery
>   o  SQL/XML syntax (ISO/IEC 9075-14)
>   o  XML data type optimized for XML storage
> 
Another section in TODO?


-- 
  Euler Taveira de Oliveira
  http://www.timbira.com/




Re: [HACKERS] integration of pgcluster into postgresql

2006-08-25 Thread Jim C. Nasby
On Fri, Aug 25, 2006 at 05:40:09AM -0700, Chahine Hamila wrote:
> Hi there,
> 
> I guess many - if not most - here have tried
> pgcluster. For those who didn't, postgresql is pretty

I think you need to guess again. :)

> much the equivalent of pgcluster configured without
> load balancer or replicator, in read-write standalone
> mode. From a user point of view, that's at most three
> additional configuration files, which can basically be
> set to those default values and distributed as-is in the
> standard distribution (making installs/upgrades transparent to
> non-cluster environments). From a developer point of
> view, the pgcluster code is quite easy to get a handle
> on.
> 
> pgcluster still has quite a few pending issues, but
> it's good enough for many users. Integrating it as
> part of a standard postgresql distribution would
> likely not disrupt standard postgresql functioning,
> while giving it the replication features it lacks as
> is. It's also likely to accelerate its maturation through
> more widespread adoption and, as a result, overcome most
> of its issues.
> 
> If the idea of its integration in the main postgresql
> code is of any interest to the postgresql team, I'm
> willing to invest some effort on it.

First, you need to review all the past discussion about the very
intentional decision not to build any replication into the core
database.

Second, pgcluster is (AFAIK) command-based replication, which has some
very, very serious drawbacks. If PostgreSQL were to include a
replication solution, I'd certainly hope it wouldn't be command-based.

Finally, pgcluster is very out-of-date. The last version uses 8.0.1 and
was released on Mar. 7, 2005. If the author can't find the time to
maintain it, I don't see why that burden should be put on the shoulders
of this community.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [HACKERS] New XML section for documentation

2006-08-25 Thread Bruce Momjian
Euler Taveira de Oliveira wrote:
> Bruce Momjian wrote:
> 
> > Here is a new XML section for our SGML documentation.  It explains the
> > various XML capabilities, if we support them, and how to use them.
> > 
> > Comments?
> > 
> +1. Users often ask about this on the mailing lists. Where do you want to
> put this? I'd suggest the FAQ. What do you all think?

Our main documentation.  Once it is there, people will find it there
rather than in the FAQ.

> > Missing Features
> > 
> > o  XQuery
> > o  SQL/XML syntax (ISO/IEC 9075-14)
> > o  XML data type optimized for XML storage
> > 
> Another section in TODO?

Perhaps, yea.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Jim C. Nasby
On Fri, Aug 25, 2006 at 11:25:43AM -0400, Tom Lane wrote:
> Alvaro Herrera <[EMAIL PROTECTED]> writes:
> > The problem is that what the qualifier is doing is modifying the
> > operation itself, not the properties of the index to be created, like
> > UNIQUE, which modifies the index.
> 
> Right, which was the same point Bruce made earlier.  And really you
> can't respect that difference while putting them into the same place in
> the word order.  So I'm starting to feel like maybe we should leave
> well enough alone.

Since we might eventually have other 'concurrent commands', perhaps

CONCURRENT CREATE INDEX ...

would be best.

BTW, if we started to consider lazy vacuum a concurrent command we could
ditch the use of FULL, which is always confusing if you're talking about
database-wide vacuums. I know it'd take many versions to fully make that
change, but it seems worth it to me to reduce confusion. There's also an
issue of newbies thinking they should use vacuum full regularly because
it's somehow better than lazyvac.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Jim C. Nasby
On Fri, Aug 25, 2006 at 06:57:58PM +0100, Gregory Stark wrote:
> I'll use this opportunity to plug that feature again. I think most people
> should use autocommit off with on_error_rollack on for most of their daily
> use. Being able to double check the results of my ad-hoc updates before
> committing them saved me more headaches than I can count with Oracle.
> Autocommit off only became practical for interactive use with postgres when
> savepoints showed up.

+1
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [HACKERS] default child of partition master

2006-08-25 Thread Jim C. Nasby
On Sun, Aug 20, 2006 at 12:58:31PM -0400, Gene wrote:
> Keep in mind if you have multiple rules for a master table, it won't return
> the number of affected rows as you might expect. This screws up Hibernate
> which I'm using for my application. It checks the return value to make sure
> it was inserted properly. Luckily I only need one rule which puts it into
> the "current" child table (im partitioning on current timestamp). I suppose
> I could get around this by using a stored procedure or something but that
> would not be as portable. I'm looking forward to future versions of PG which
> automate more of the partitioning features :) keep up the good work!

Sounds like a bug in Hibernate. It should be checking for errors
instead.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [Open Item] Re: [HACKERS] Autovacuum on by default?

2006-08-25 Thread Jim C. Nasby
On Fri, Aug 25, 2006 at 12:16:33PM -0400, Alvaro Herrera wrote:
> Tom Lane wrote:
> > "Matthew T. O'Connor"  writes:
> > > Peter Eisentraut wrote:
> > >> - Leave base thresholds alone (pending further analysis that might 
> > >> remove them 
> > >> altogether?)
> > 
> > > While there is talk of removing this all together, I think it was also 
> > > agreed that as long as these values are there, they should be reduced. 
> > > I think the defaults in 8.1 are 1000/500, I think 200/100 was suggested.
> > 
> > ISTM that if we don't want to remove the thresholds immediately,
> > we should make them default to zero for a release or two and see how
> > well it works.
> > 
> > At the moment I can't find the thread that discussed removing them,
> > but IIRC there were some good arguments why the thresholds should always
> > be zero.
> 
> I can't find it either, but I think the bug reported here is related:
> 
> http://archives.postgresql.org/pgsql-general/2006-06/thrd2.php#00951
> 
> On the other hand, I don't think we completely resolved this, so I
> proposed this be added to the "Open Items" list.

Yeah, I think there are reasons we can't go to zero. 200/100 or even 20/10
would probably be a good compromise.

I agree that dropping to 0.08 might be a bit much, but it would be good
if we started recommending that value to folks to see how well it works.

I thought we had agreed it would be a good idea to turn autovac_delay
on? I know there was question as to what a good value would be, but
5-10ms seems pretty reasonable. I think it'd also be good to up the cost
threshold and the dirty_page cost, though I don't have much data to back
that up (I did testing at one customer on a drive array and found 300
and 30 were good values).

If we've got command stats turned on by default now, I'll have a hard
time buying performance as any reason to turn the others off. I think we
should turn them all on and let those who are trying to eke out the last few
percent of performance from a system turn them off.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [DOCS] [HACKERS] New XML section for documentation

2006-08-25 Thread Bruce Momjian
David Fetter wrote:
> On Fri, Aug 25, 2006 at 07:46:57PM -0400, Bruce Momjian wrote:
> > Here is a new XML section for our SGML documentation.  It explains

> > the various XML capabilities, if we support them, and how to use
> > them.
> > 
> > Comments?
> 
> This looks hauntingly similar to Peter's presentation at the

I used the XML/SQL and validation part from his talk, but the rest was
from earlier email discussions.

> conference. :)  I'd add a link to http://wiscorp.com/SQLStandards.html to the

This seems to be the best URL, but it seems too detailed:


http://wiscorp.com/H2-2005-197-SC32N1293-WG3_Presentation_for_SC32_20050418.pdf

> reference section.
> 
> Speaking of other parts of the SQL:2003 standard, how about one
> section each that mentions them?  There's
> 
> Part 4: SQL/PSM (Persistent Stored Modules)
> Part 9: SQL/MED (Management of External Data) (my favorite)
> Part 10: SQL/OLB (Object Language Binding)
> Part 11: SQL/Schemata
> Part 13: SQL/JRT (Java Routines and Types)

I don't know anything about them.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] New XML section for documentation

2006-08-25 Thread David Fetter
On Fri, Aug 25, 2006 at 07:46:57PM -0400, Bruce Momjian wrote:
> Here is a new XML section for our SGML documentation.  It explains
> the various XML capabilities, if we support them, and how to use
> them.
> 
> Comments?

This looks hauntingly similar to Peter's presentation at the
conference. :)  I'd add a link to http://wiscorp.com/SQLStandards.html to the
reference section.

Speaking of other parts of the SQL:2003 standard, how about one
section each that mentions them?  There's

Part 4: SQL/PSM (Persistent Stored Modules)
Part 9: SQL/MED (Management of External Data) (my favorite)
Part 10: SQL/OLB (Object Language Binding)
Part 11: SQL/Schemata
Part 13: SQL/JRT (Java Routines and Types)

Cheers,
D
-- 
David Fetter <[EMAIL PROTECTED]> http://fetter.org/
phone: +1 415 235 3778AIM: dfetter666
  Skype: davidfetter

Remember to vote!



[HACKERS] New XML section for documentation

2006-08-25 Thread Bruce Momjian
Here is a new XML section for our SGML documentation.  It explains the
various XML capabilities, if we support them, and how to use them.

Comments?

---


XML Document Support
--------------------
XML support is not one capability, but a variety of features supported
by a database.  These capabilities include storage, import/export,
validation, indexing, efficiency of modification, searching,
transforming, and XML to SQL mapping.  PostgreSQL supports some but
not all of these XML capabilities.  Future releases of PostgreSQL will
continue to improve XML support.

Storage
---
PostgreSQL stores XML documents as ordinary text documents.  It does not
split XML documents apart into their component parts and store each
element separately.  You can use middle-ware solutions to do that, but
once done, the data becomes relational and has to be processed
accordingly.

Import/Export
-
Because XML documents are stored as normal text documents, they can be
imported/exported with little complexity.  A simple TEXT field can hold
up to 1 gigabyte of text, and large objects are available for larger
documents.

Validation
--
/contrib/xml2 has a function called xml_valid() that can be used in
a CHECK constraint to enforce that a field contains valid XML.  It
does not support validation against a specific XML schema.  A
server-side language with XML capabilities could be used to do
schema-specific XML checks.
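
For example (a minimal sketch, assuming contrib/xml2 is installed; the
table and column names are only illustrative):

    -- Reject rows whose "body" column is not well-formed XML
    CREATE TABLE docs (
        id   serial PRIMARY KEY,
        body text CHECK (xml_valid(body))
    );

    -- This INSERT would fail the CHECK constraint:
    --   INSERT INTO docs (body) VALUES ('<unclosed>');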

Indexing
--------
Because XML documents are stored as text, the full-text indexing tool
/contrib/tsearch2 can be used to index XML documents.  Of course, the
searches are text searches, with no XML awareness, but tsearch2 can be
used with other XML capabilities to dramatically reduce the amount of
data processed at the XML level.
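
For example (a minimal sketch, assuming contrib/tsearch2 is installed and
the hypothetical "docs" table from above; details vary with the tsearch2
version):

    -- Maintain a tsvector column for the XML text and index it
    ALTER TABLE docs ADD COLUMN body_fti tsvector;
    UPDATE docs SET body_fti = to_tsvector('default', body);
    CREATE INDEX docs_fti_idx ON docs USING gist (body_fti);

    -- A plain text search over the XML, with no XML awareness
    SELECT id FROM docs
    WHERE body_fti @@ to_tsquery('default', 'postgresql');

In practice a trigger would keep the tsvector column current on INSERT
and UPDATE.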

Modification
------------
If an UPDATE does not modify an XML field, the XML data is shared
between the old and new rows.  However, if the UPDATE modifies an XML
field, a full modified copy of the XML field must be created internally.

Searching
-
XPath searches are implemented using /contrib/xml2.  It processes XML
text documents and returns results based on the requested query.
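
For example (a minimal sketch, again assuming contrib/xml2 and the
hypothetical "docs" table; the XPath expressions are only illustrative):

    -- Extract one value and filter on another using XPath
    SELECT id, xpath_string(body, '/book/title') AS title
    FROM docs
    WHERE xpath_bool(body, '/book/author[. = ''Smith'']');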

Transforming
------------
/contrib/xml2 supports XSL transformations.

XML to SQL Mapping
---
This involves converting XML data to and from relational structures. 
PostgreSQL has no internal support for such mapping, and relies on
external tools to do such conversions.

Missing Features
----------------
o  XQuery
o  SQL/XML syntax (ISO/IEC 9075-14)
o  XML data type optimized for XML storage

See also http://www.rpbourret.com/xml/XMLAndDatabases.htm

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] [GENERAL] invalid byte sequence ?

2006-08-25 Thread Alvaro Herrera
Tom Lane wrote:
> Martijn van Oosterhout  writes:
> > And as a counter-example: pg_dump should absolutly not use the client
> > locale, it should always dump as the same encoding as the server...
> 
> Sure, but pg_dump should set that explicitly.  I'm prepared to believe
> that looking at the locale is sane for all normal clients.

What are "normal clients"?  I would think that programs in PHP or Perl
have their own idea of the correct encoding (JDBC already has one).

> It might be worth providing a way to set the client_encoding through a
> PQconnectdb connection-string keyword, just in case the override-via-
> PGCLIENTENCODING dodge doesn't suit someone.  The priority order
> would presumably be connection string, then PGCLIENTENCODING, then
> locale.

This sounds like a good idea anyway...

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] [GENERAL] invalid byte sequence ?

2006-08-25 Thread Tom Lane
Martijn van Oosterhout  writes:
> And as a counter-example: pg_dump should absolutely not use the client
> locale, it should always dump as the same encoding as the server...

Sure, but pg_dump should set that explicitly.  I'm prepared to believe
that looking at the locale is sane for all normal clients.

It might be worth providing a way to set the client_encoding through a
PQconnectdb connection-string keyword, just in case the override-via-
PGCLIENTENCODING dodge doesn't suit someone.  The priority order
would presumably be connection string, then PGCLIENTENCODING, then
locale.

regards, tom lane



Re: [HACKERS] [GENERAL] invalid byte sequence ?

2006-08-25 Thread Martijn van Oosterhout
On Fri, Aug 25, 2006 at 08:13:39PM +0200, Peter Eisentraut wrote:
> > I agree with Martijn that putting these into libpq's API 
> > seems like useless clutter.
> 
> Where else to put it?  We need it in libpq anyway if we want this 
> behavior in all client applications (by default).

Is that so? I thought we were only talking about psql. Even then, I'm
wondering if we should alter the current behaviour at all if stdout is
not a tty (i.e. run as a pipe).

And as a counter-example: pg_dump should absolutely not use the client
locale, it should always dump as the same encoding as the server...

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.




Re: [HACKERS] [GENERAL] invalid byte sequence ?

2006-08-25 Thread Tom Lane
Peter Eisentraut <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> I agree with Martijn that putting these into libpq's API 
>> seems like useless clutter.

> Where else to put it?  We need it in libpq anyway if we want this 
> behavior in all client applications (by default).

Having the code in libpq doesn't necessarily mean exposing it to the
outside world.  I can't see a reason for these to be in the API at all.

Possibly we could avoid the duplication-of-source-code issue by putting
the code in libpgport, or someplace, whence both initdb and libpq could
get at it?

regards, tom lane



Re: [HACKERS] [GENERAL] invalid byte sequence ?

2006-08-25 Thread Peter Eisentraut
Tom Lane wrote:
> Um, but initdb doesn't use libpq, so it's going to need its own copy
> anyway.

initdb certainly links against libpq.

> I agree with Martijn that putting these into libpq's API 
> seems like useless clutter.

Where else to put it?  We need it in libpq anyway if we want this 
behavior in all client applications (by default).

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] [PATCHES] psql 'none' as a HISTFILE special case

2006-08-25 Thread Martin Atukunda

On 8/25/06, Tom Lane <[EMAIL PROTECTED]> wrote:

"Martin Atukunda" <[EMAIL PROTECTED]> writes:
> On 8/25/06, Tom Lane <[EMAIL PROTECTED]> wrote:
>> There's probably no way to get Apple's libedit to not try the fchmod,
>> so what do we want to do here?  Maybe special-case the string
>> "/dev/null"?

> If this is OK, I can come up with a patch that special-cases /dev/null as a
> HISTFILE if libedit is found.

I was thinking of basically a one-liner addition to write_history
to skip the whole thing if strcmp(fname, DEVNULL) == 0.  Should be
reasonably inoffensive on anyone's machine.


I guess you meant saveHistory instead of write_history here. :)

something like the attached diff

- Martin -


special_case_DEVNULL.diff
Description: Binary data



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Tom Lane
"Zeugswetter Andreas DCP SD" <[EMAIL PROTECTED]> writes:
>> That was what the patch originally used, but it was changed 
>> because it made difficult for psql to auto-complete that.

> That is imho not enough of a reason to diverge.

My recollection is that the principal argument against ONLINE was
that it didn't convey the function of the option to anyone who
didn't already know Oracle's usage of the term.

Also, psql's problem is not with auto-completion, it's with
detecting whether the command is allowed inside a transaction
block.  That's not a function we can just blow off.

regards, tom lane



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Gregory Stark
Alvaro Herrera <[EMAIL PROTECTED]> writes:

> That was what the patch originally used, but it was changed because it
> made it difficult for psql to auto-complete that.

Psql has to be able to parse it not for auto-completion but because it needs
to know that it's not a transactional command. The regular CREATE INDEX can be
run from within a transaction but online index builds use two transactions on
their own so psql has to know not to insert a BEGIN and savepoint around it.

I'll use this opportunity to plug that feature again. I think most people
should use autocommit off with on_error_rollback on for most of their daily
use. Being able to double check the results of my ad-hoc updates before
committing them saved me more headaches than I can count with Oracle.
Autocommit off only became practical for interactive use with postgres when
savepoints showed up.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Zeugswetter Andreas DCP SD

> > precedent syntax (Oracle, Informix) uses the keyword ONLINE 
> at the end:
> >  CREATE INDEX blabla_x0 ON blabla (a,b) ONLINE;
> 
> That was what the patch originally used, but it was changed 
> because it made it difficult for psql to auto-complete that.

That is imho not enough of a reason to diverge.

Andreas



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Tom Lane
"Zeugswetter Andreas DCP SD" <[EMAIL PROTECTED]> writes:
> precedent syntax (Oracle, Informix) uses the keyword ONLINE at the end:
>  CREATE INDEX blabla_x0 ON blabla (a,b) ONLINE;

We rejected that one already ...

regards, tom lane



Re: [HACKERS] [PATCHES] psql 'none' as a HISTFILE special case

2006-08-25 Thread Tom Lane
"Martin Atukunda" <[EMAIL PROTECTED]> writes:
> On 8/25/06, Tom Lane <[EMAIL PROTECTED]> wrote:
>> There's probably no way to get Apple's libedit to not try the fchmod,
>> so what do we want to do here?  Maybe special-case the string
>> "/dev/null"?

> If this is OK, I can come up with a patch that special-cases /dev/null as a
> HISTFILE if libedit is found.

I was thinking of basically a one-liner addition to write_history
to skip the whole thing if strcmp(fname, DEVNULL) == 0.  Should be
reasonably inoffensive on anyone's machine.

regards, tom lane



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Alvaro Herrera
Zeugswetter Andreas DCP SD wrote:
> 
> > > What bothers me about what we have now is that we have optional 
> > > keywords before and after INDEX, rather than only between 
> > CREATE and INDEX.
> > 
> > Yeah, putting them both into that space seems consistent to 
> > me, and it will fix the problem of making an omitted index 
> > name look like a valid command.
> > 
> > I'm not sure I should be opening this can of worms, but do we 
> > want to use a different keyword than CONCURRENTLY to make it 
> > read better there?
> 
> precedent syntax (Oracle, Informix) uses the keyword ONLINE at the end:
>  CREATE INDEX blabla_x0 ON blabla (a,b) ONLINE;

That was what the patch originally used, but it was changed because it
made it difficult for psql to auto-complete that.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Performance testing of COPY (SELECT) TO

2006-08-25 Thread Zoltan Boszormenyi

Hi,

Böszörményi Zoltán wrote:
>> Böszörményi Zoltán <[EMAIL PROTECTED]> writes:
>>> With PostgreSQL 8.1.4, I used this:
>>>
>>> begin;
>>> select ... into temp myquery1;
>>> copy myquery1 to stdout csv delimiter '|';
>>> rollback;
>>
>> The performance of this would doubtless vary a lot with the temp_buffers
>> setting.  Did you try different values?
>
> Yes, I did, but now checked back with 8.2CVS.
> The previously quoted result was achieved with
> temp_buffers = 1000 on both 8.1.4 and 8.2CVS.
> On 8.2CVS with temp_buffers = 4096, the 10 client case kills
> the machine with swapping, but the 3 client runtime with
> COPY(SELECT) went down to 2:41. The SELECT INTO TEMP
> case went down to 3:36.
>
>> It'd also be interesting to time the same way (with a temp table) in
>> devel.  I don't remember whether we did any performance work on the
>> COPY CSV data path in this cycle, or whether that was all present in
>> 8.1.  In any case it'd be worth proving that the COPY SELECT patch isn't
>> degrading performance of the copy-a-relation case.
>
> I will report back with that, say on Monday.


It seems my previous mail hasn't reached
the hackers list, so I answer here.

In the export, there is a largish table
that has both many columns and many rows.

With COPY(SELECT) patch applied:

time psql -c "copy (select * from table) to 'file'" dbx
COPY 886046

real0m13.253s
user0m0.000s
sys 0m0.000s

time psql -c "copy table to 'file'" dbx
COPY 886046

real0m13.234s
user0m0.000s
sys 0m0.000s

time psql -c "copy table to stdout" dbx >file

real0m15.155s
user0m0.540s
sys 0m0.450s

time psql -c "copy (select * from table) to stdout" dbx >file

real0m15.079s
user0m0.540s
sys 0m0.590s

Surprisingly, it's slower without the COPY(SELECT) patch;
this is the lowest of five runs, i.e. with warm caches:

time psql -c "copy table to 'file'" dbx

real0m20.464s
user0m0.000s
sys 0m0.010s

time psql -c "copy table to stdout" dbx >file

real0m25.753s
user0m0.570s
sys 0m0.460s

With the original settings, temp_buffers = 1000 on 8.2CVS,
the export runtime with one client looks like this:
first run 1:44, second run 1:12, third run 1:04.
It seems it's a bit faster both on startup and on
subsequent runs.

Best regards,
Zoltán Böszörményi





Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Zeugswetter Andreas DCP SD

> > What bothers me about what we have now is that we have optional 
> > keywords before and after INDEX, rather than only between 
> CREATE and INDEX.
> 
> Yeah, putting them both into that space seems consistent to 
> me, and it will fix the problem of making an omitted index 
> name look like a valid command.
> 
> I'm not sure I should be opening this can of worms, but do we 
> want to use a different keyword than CONCURRENTLY to make it 
> read better there?

precedent syntax (Oracle, Informix) uses the keyword ONLINE at the end:
 CREATE INDEX blabla_x0 ON blabla (a,b) ONLINE;

I'd stick with that.

Andreas



Re: [Pgsqlrpms-hackers] [HACKERS] Safer auto-initdb for RPM init

2006-08-25 Thread Bort, Paul
> 
> Am Freitag, 25. August 2006 16:31 schrieb Reinhard Max:
> > But shouldn't mountpoints always have 000 permissions to prevent
> > writing into the directory as long as nothing is mounted to it?
> 
> That's an interesting point, but in practice nobody does 
> that.  And we're 
> trying to defend exactly against the case where someone has 
> set up a mount 
> point manually.
> 

It had never occurred to me, but I'm definitely going to start doing it
now. So it will be in practice, at least around here.

Regards,
Paul Bort



Re: [HACKERS] [PATCHES] psql 'none' as a HISTFILE special case

2006-08-25 Thread Martin Atukunda

On 8/25/06, Tom Lane <[EMAIL PROTECTED]> wrote:

> When I set HISTFILE to /dev/null I get the following:
> could not save history to file "/dev/null": Operation not permitted

Hm.  ktrace shows this happening:

 23279 psql CALL  open(0x302d70,0x601,0x1b6)
 23279 psql NAMI  "/dev/null"
 23279 psql RET   open 3
 23279 psql CALL  fchmod(0x3,0x180)
 23279 psql RET   fchmod -1 errno 1 Operation not permitted
 23279 psql CALL  close(0x3)
 23279 psql RET   close 0
 23279 psql CALL  write(0x2,0xb180,0x44)
 23279 psql GIO   fd 2 wrote 68 bytes
   "could not save history to file "/dev/null": Operation not permitted
   "
 23279 psql RET   write 68/0x44
 23279 psql CALL  exit(0)

There's probably no way to get Apple's libedit to not try the fchmod,
so what do we want to do here?  Maybe special-case the string
"/dev/null"?


If this is OK, I can come up with a patch that special-cases /dev/null as a
HISTFILE if libedit is found.

- Martin -



Re: [HACKERS] Replication

2006-08-25 Thread Jeff Davis
On Fri, 2006-08-25 at 11:23 +0200, Markus Schiltknecht wrote:
> Jeff Davis wrote:
> > Which doesn't work very well in the case of two groups of servers set up
> > in two physical locations. I can see two possibilities:
> > (1) You require a quorum to be effective, in which case your cluster of
> > databases is only as reliable as the location which holds more servers.
> > (2) You have another central authority that determines which databases
> > are up, and which are down. Then your cluster is only as reliable as
> > that central authority.
> 
> Right, the ideal here would be two sync clusters at both locations, 
> connected via async replication :-)
> 
> > Even if you have a large number of nodes at different locations, then
> > you end up with strange decisions to make if the network connections are
> > intermittent or very slow. A temporary slowdown of many nodes could
> > cause them to be degraded until some kind of human intervention brought
> > them back. Until that time you might not be able to determine which
> > nodes make up an authoritative group. 
> 
> Side note: in such a case, I think a GCS will just choose only one node 
> to be the 'authoritative group', because most systems cannot afford to 
> have long waits for such decisions. For database replication I also 
> think it's better to have at least one node running than none.
> 
>  > This kind of degradation could
> > happen in the case of a DDoS attack, or perhaps a worm moving around the
> > internet.
> 
> Well, sync replication in general needs a good, low latency and secure 
> interconnect. The internet does not seem to be a good fit here.
> 
> > In practice everyone can find a solution that works for them. However,
> > synchronous replication is not perfect, and there are many failure
> > scenarios which need to be resolved in a way that fits your business. I
> > think synchronous replication is inherently less available than
> > asynchronous.
> 
> This surely depends on the environment. With a dedicated (i.e. low 
> latency and secure) interconnect sync replication is surely more 
> available because your arguments above don't apply. And because sync 
> replication guarantees you won't lose committed transactions.
> 
> If however you want or have to replicate over the internet it depends. 
> Your arguments above also apply to async replication. Only that because 
> of the conflict resolution, async replication systems can continue to 
> operate on all the disconnected nodes and merge their work later on once 
> the network is up again. But then again, async still has the danger of 
> losing transactions.
> 
> So I probably agree: if you are on an unreliable network and if you have 
> conflict resolution set up correctly, then async replication is more 
> available, but less secure.
> 
> As I said above, sync replication needs a reliable interconnect, better 
> even have two interconnects, because it's a SPOF for a clustered 
> database system.
> 

Ok, I agree with your statements. Async is convenient in many ways, but
has less durable transactions (at least for transactions committed
recently). Sync has some limitations, and is harder to get right (at
least if you want good availability as well) but provides more durable
transactions and consistency between systems.

Regards,
Jeff Davis




Re: [HACKERS] [PATCHES] psql 'none' as a HISTFILE special case

2006-08-25 Thread Tom Lane
"Martin Atukunda" <[EMAIL PROTECTED]> writes:
> On 8/25/06, Peter Eisentraut <[EMAIL PROTECTED]> wrote:
>> Please elaborate on "doesn't work".

> without any .psqlrc file I get the following error when quitting a psql 
> session:
> could not save history to file "/Users/matlads/.psql_history": Invalid 
> argument

That is fixed in CVS HEAD.  The current coding looks like:

/*
 * return value of write_history is not standardized across GNU
 * readline and libedit.  Therefore, check for errno becoming set
 * to see if the write failed.
 */
errno = 0;
(void) write_history(fname);
if (errno == 0)
return true;

psql_error("could not save history to file \"%s\": %s\n",
   fname, strerror(errno));


> When I set HISTFILE to /dev/null I get the following:
> could not save history to file "/dev/null": Operation not permitted

Hm.  ktrace shows this happening:

 23279 psql CALL  open(0x302d70,0x601,0x1b6)
 23279 psql NAMI  "/dev/null"
 23279 psql RET   open 3
 23279 psql CALL  fchmod(0x3,0x180)
 23279 psql RET   fchmod -1 errno 1 Operation not permitted
 23279 psql CALL  close(0x3)
 23279 psql RET   close 0
 23279 psql CALL  write(0x2,0xb180,0x44)
 23279 psql GIO   fd 2 wrote 68 bytes
   "could not save history to file "/dev/null": Operation not permitted
   "
 23279 psql RET   write 68/0x44
 23279 psql CALL  exit(0)

There's probably no way to get Apple's libedit to not try the fchmod,
so what do we want to do here?  Maybe special-case the string
"/dev/null"?

regards, tom lane



[Open Item] Re: [HACKERS] Autovacuum on by default?

2006-08-25 Thread Alvaro Herrera
Tom Lane wrote:
> "Matthew T. O'Connor"  writes:
> > Peter Eisentraut wrote:
> >> - Leave base thresholds alone (pending further analysis that might remove 
> >> them 
> >> altogether?)
> 
> > While there is talk of removing this all together, I think it was also 
> > agreed that as long as these values are there, they should be reduced. 
> > I think the defaults in 8.1 are 1000/500, I think 200/100 was suggested.
> 
> ISTM that if we don't want to remove the thresholds immediately,
> we should make them default to zero for a release or two and see how
> well it works.
> 
> At the moment I can't find the thread that discussed removing them,
> but IIRC there were some good arguments why the thresholds should always
> be zero.

I can't find it either, but I think the bug reported here is related:

http://archives.postgresql.org/pgsql-general/2006-06/thrd2.php#00951

On the other hand, I don't think we completely resolved this, so I
proposed this be added to the "Open Items" list.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] [GENERAL] invalid byte sequence ?

2006-08-25 Thread Tom Lane
Peter Eisentraut <[EMAIL PROTECTED]> writes:
> Am Freitag, 25. August 2006 17:30 schrieb Martijn van Oosterhout:
>> Umm, why export all these functions. For starters, does this even need
>> to be in libpq?

> Where else would you put it?
> ...
> initdb has different requirements.  Let me know if you have a different way 
> to 
> refactor it that satisfies initdb.

Um, but initdb doesn't use libpq, so it's going to need its own copy
anyway.  I agree with Martijn that putting these into libpq's API
seems like useless clutter.

regards, tom lane



Re: [HACKERS] Autovacuum on by default?

2006-08-25 Thread Matthew T. O'Connor

Tom Lane wrote:
> "Matthew T. O'Connor"  writes:
>> While there is talk of removing this all together, I think it was also
>> agreed that as long as these values are there, they should be reduced.
>> I think the defaults in 8.1 are 1000/500, I think 200/100 was suggested.
>
> ISTM that if we don't want to remove the thresholds immediately,
> we should make them default to zero for a release or two and see how
> well it works.
>
> At the moment I can't find the thread that discussed removing them,
> but IIRC there were some good arguments why the thresholds should always
> be zero.

I don't have any significant objection to removing them; it just seemed
to me that we are late in the release cycle and that might be more than we
want to do at the moment.  If others think it's OK, then it's OK with me.





Re: [HACKERS] [PATCHES] psql 'none' as a HISTFILE special case

2006-08-25 Thread Martin Atukunda

On 8/25/06, Peter Eisentraut <[EMAIL PROTECTED]> wrote:

Am Freitag, 25. August 2006 17:03 schrieb Martin Atukunda:
> hmm, setting HISTFILE to /dev/null doesn't work on my MacOSX here.

Please elaborate on "doesn't work".



without any .psqlrc file I get the following error when quitting a psql session:

could not save history to file "/Users/matlads/.psql_history": Invalid argument

When I set HISTFILE to /dev/null I get the following:

could not save history to file "/dev/null": Operation not permitted

- Martin -



Re: [HACKERS] Autovacuum on by default?

2006-08-25 Thread Tom Lane
"Matthew T. O'Connor"  writes:
> Peter Eisentraut wrote:
>> - Leave base thresholds alone (pending further analysis that might remove 
>> them 
>> altogether?)

> While there is talk of removing this all together, I think it was also 
> agreed that as long as these values are there, they should be reduced. 
> I think the defaults in 8.1 are 1000/500, I think 200/100 was suggested.

ISTM that if we don't want to remove the thresholds immediately,
we should make them default to zero for a release or two and see how
well it works.

At the moment I can't find the thread that discussed removing them,
but IIRC there were some good arguments why the thresholds should always
be zero.

regards, tom lane



Re: [HACKERS] [GENERAL] invalid byte sequence ?

2006-08-25 Thread Martijn van Oosterhout
On Fri, Aug 25, 2006 at 05:38:20PM +0200, Peter Eisentraut wrote:
> > In fact, the only thing you need is PQsetClientEncodingFromLocale(),
> > anything else is just sugar. Why would the user care about what the OS
> > calls it? We have a "pg_enc" enum, so lets use it.
> 
> initdb has different requirements.  Let me know if you have a different way 
> to 
> refactor it that satisfies initdb.

Well, check_encodings_match(pg_enc,ctype) is simply a short way of
saying: if(find_matching_encoding(ctype) != pg_enc ) { error }.
And get_encoding_from_locale() is not used outside of those functions.

So the only thing initdb actually needs is an implementation of
find_matching_encoding(ctype), which returns a value of "enum pg_enc".
check_encodings_match() stays in initdb, and get_encoding_from_locale()
becomes internal to libpq.

How does that sound?

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.




Re: [HACKERS] [GENERAL] invalid byte sequence ?

2006-08-25 Thread Peter Eisentraut
Am Freitag, 25. August 2006 17:30 schrieb Martijn van Oosterhout:
> Umm, why export all these functions. For starters, does this even need
> to be in libpq?

Where else would you put it?

> In fact, the only thing you need is PQsetClientEncodingFromLocale(),
> anything else is just sugar. Why would the user care about what the OS
> calls it? We have a "pg_enc" enum, so lets use it.

initdb has different requirements.  Let me know if you have a different way to 
refactor it that satisfies initdb.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Andrew Dunstan

Gregory Stark wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
>> The original thinking was to use CONCURRENT, and CREATE CONCURRENT INDEX
>> sounded like a different type of index, not a different way to build the
>> index.  I don't think CONCURRENTLY has that problem, so CREATE
>> CONCURRENTLY INDEX sounds good.  To read in English, it would be read as
>> CREATE CONCURRENTLY, INDEX ii.
>
> That doesn't sound like English at all to me.
>
> Fwiw, I think the best option was what Tom did. The gotcha I tripped on seems
> pretty minor to me.


It's a form of construction my father (who was a noted orator) loved to 
use, maybe a little too much. It is arguably slightly archaic, but 
nevertheless quite grammatical ;-) I agree that these days it is more 
idiomatic to defer the adverb until after the object of the verb in most 
cases.


cheers

andrew

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Autovacuum on by default?

2006-08-25 Thread Peter Eisentraut
On Friday, 25 August 2006 17:32, Matthew T. O'Connor wrote:
> While there is talk of removing this all together, I think it was also
> agreed that as long as these values are there, they should be reduced.
> I think the defaults in 8.1 are 1000/500, I think 200/100 was suggested.

I'm thinking leaving them higher would offset the jump in the scale factor a 
bit.  But if the idea is to get rid of the base threshold, then that's not 
really a forward-looking strategy, I suppose.

I don't mind if we go down with the scale factor a little less and move the 
threshold down more, but at least tying the scale factor to the fill factor 
doesn't make all those numbers too random.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Autovacuum on by default?

2006-08-25 Thread Matthew T. O'Connor

Peter Eisentraut wrote:

> Summarizing this thread, I see support for the following:
>
> - autovacuum set to on by default in 8.2.

Yes.

> - stats_row_level also defaults to on.

Yes.

> (Perhaps stats_block_level should also default to on so it's not
> inconsistent, seeing that everything else is on, too.)

I haven't followed this closely, but are all the other stats commands
other than block_level now on by default?  In general I would vote to
keep it off if not needed just for performance reasons, though I haven't
measured the effect of block_level turned on.  Anyone measured this?

> - Delayed vacuum and delayed autovacuum will stay disabled.

Unfortunately.

> - Scale factor set to 0.08 (vacuum) and 0.04 (analyze) (?)  (formerly 0.4
> and 0.2)

That seems like a big jump.  BTW, I know .08 and .04 were suggested, but I
didn't see confirmation that it was a good idea.  I know my initial
values were grossly over-conservative, but I am concerned about bogging
down the server with lots of vacuums, especially since we don't have the
delay settings on by default, nor do we have a maintenance window yet.

> - Leave base thresholds alone (pending further analysis that might remove
> them altogether?)

While there is talk of removing them altogether, I think it was also
agreed that as long as these values are there, they should be reduced.
I think the defaults in 8.1 are 1000/500; 200/100 was suggested.





---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] [GENERAL] invalid byte sequence ?

2006-08-25 Thread Martijn van Oosterhout
On Fri, Aug 25, 2006 at 05:07:03PM +0200, Peter Eisentraut wrote:
> I got started on this and just wanted to post an intermediate patch.  I have 
> taken the logic from initdb and placed it into libpq and refined the API a 
> bit.  At this point, there should be no behaviorial change.  It remains to 
> make libpq use this stuff if PGCLIENTENCODING is not set.  Unless someone 
> beats me, I'll figure that out later.

Umm, why export all these functions? For starters, does this even need
to be in libpq? I wouldn't have thought so the first time round,
especially not three functions. The only thing you need is to take a
locale name and return the charset you can pass to PQsetClientEncoding.

In fact, the only thing you need is PQsetClientEncodingFromLocale(),
anything else is just sugar. Why would the user care about what the OS
calls it? We have a "pg_enc" enum, so let's use it.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to 
> litigate.




Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes:
> The problem is that what the qualifier is doing is modifying the
> operation itself, not the properties of the index to be created, like
> UNIQUE, which modifies the index.

Right, which was the same point Bruce made earlier.  And really you
can't respect that difference while putting them into the same place in
the word order.  So I'm starting to feel like maybe we should leave
well enough alone.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Alvaro Herrera
Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > What bothers me about what we have now is that we have optional keywords
> > before and after INDEX, rather than only between CREATE and INDEX.
> 
> Yeah, putting them both into that space seems consistent to me, and
> it will fix the problem of making an omitted index name look like
> a valid command.
> 
> I'm not sure I should be opening this can of worms, but do we want to
> use a different keyword than CONCURRENTLY to make it read better there?

The problem is that what the qualifier is doing is modifying the
operation itself, not the properties of the index to be created, like
UNIQUE, which modifies the index.  So the qualifier should be something
that modifies the CREATE, that is, an adverb (AFAIK).

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] [PATCHES] psql 'none' as a HISTFILE special case

2006-08-25 Thread Peter Eisentraut
On Friday, 25 August 2006 17:03, Martin Atukunda wrote:
> hmm, setting HISTFILE to /dev/null doesn't work on my MacOSX here.

Please elaborate on "doesn't work".

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> What bothers me about what we have now is that we have optional keywords
> before and after INDEX, rather than only between CREATE and INDEX.

Yeah, putting them both into that space seems consistent to me, and
it will fix the problem of making an omitted index name look like
a valid command.

I'm not sure I should be opening this can of worms, but do we want to
use a different keyword than CONCURRENTLY to make it read better there?

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Safer auto-initdb for RPM init script

2006-08-25 Thread Peter Eisentraut
On Friday, 25 August 2006 16:54, Tom Lane wrote:
> No, it doesn't.  (The Red Hat RPMs in fact did that ... for about
> a week ... until I was told in no uncertain terms that we don't
> start unnecessary daemons by default.)

Well, there seem to be philosophical differences between the various operating 
systems -- "We won't install unnecessary packages." vs. "We won't start 
unnecessary daemons in unnecessarily installed packages." -- in which case 
your solution doesn't sound all that bad.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [Pgsqlrpms-hackers] [HACKERS] Safer auto-initdb for RPM initscript

2006-08-25 Thread Peter Eisentraut
On Friday, 25 August 2006 17:17, Enver ALTIN wrote:
> Am I the only one who believes that PostgreSQL project is not supposed
> to fix (or include workarounds for) some other systems that actually
> don't work very well?

Yes.

> If NFS is causing trouble, let it be.

NFS is not the trouble.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [Pgsqlrpms-hackers] [HACKERS] Safer auto-initdb for RPM initscript

2006-08-25 Thread Enver ALTIN
Hello,

On Fri, Aug 25, 2006 at 10:30:56AM -0400, Andrew Dunstan wrote:
> It is extremely careful. The point is that the NFS mount will hide the 
> existing datadir from initdb.

Am I the only one who believes that the PostgreSQL project is not supposed
to fix (or include workarounds for) some other systems that actually
don't work very well?

If NFS is causing trouble, let it be.

Thanks,
-- 
Enver




Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Bruce Momjian
Gregory Stark wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> 
> > The original thinking was to use CONCURRENT, and CREATE CONCURRENT INDEX
> > sounded like a different type of index, not a different way to build the
> > index.  I don't think CONCURRENTLY has that problem, so CREATE
> > CONCURRENTLY INDEX sounds good.  To read in English, it would be read as
> > CREATE CONCURRENTLY, INDEX ii.
> 
> That doesn't sound like English at all to me.
> 
> Fwiw, I think the best option was what Tom did. The gotcha I tripped on seems
> pretty minor to me.

What bothers me about what we have now is that we have optional keywords
before and after INDEX, rather than only between CREATE and INDEX.
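
To make the word-order question concrete, here is a rough sketch of the
variants discussed in this thread (illustrative only; at this point only the
first form matches the committed grammar):

    CREATE UNIQUE INDEX CONCURRENTLY ii ON t (c);  -- committed grammar: CONCURRENTLY after INDEX
    CREATE INDEX CONCURRENTLY ON t (c);            -- the gotcha: builds an ordinary index named "concurrently"
    CREATE CONCURRENTLY UNIQUE INDEX ii ON t (c);  -- proposal: keep all options between CREATE and INDEX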

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [GENERAL] invalid byte sequence ?

2006-08-25 Thread Peter Eisentraut
On Thursday, 24 August 2006 00:52, Tom Lane wrote:
> A possible solution therefore is to have psql or libpq drive the
> client_encoding off the client's locale environment instead of letting
> it default to equal the server_encoding.

I got started on this and just wanted to post an intermediate patch.  I have 
taken the logic from initdb and placed it into libpq and refined the API a 
bit.  At this point, there should be no behavioral change.  It remains to 
make libpq use this stuff if PGCLIENTENCODING is not set.  Unless someone 
beats me to it, I'll figure that out later.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


codeset-refactor.patch.gz
Description: GNU Zip compressed data

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Gregory Stark
Bruce Momjian <[EMAIL PROTECTED]> writes:

> The original thinking was to use CONCURRENT, and CREATE CONCURRENT INDEX
> sounded like a different type of index, not a different way to build the
> index.  I don't think CONCURRENTLY has that problem, so CREATE
> CONCURRENTLY INDEX sounds good.  To read in English, it would be read as
> CREATE CONCURRENTLY, INDEX ii.

That doesn't sound like English at all to me.

Fwiw, I think the best option was what Tom did. The gotcha I tripped on seems
pretty minor to me.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [PATCHES] psql 'none' as a HISTFILE special case

2006-08-25 Thread Martin Atukunda

On 8/21/06, Bruce Momjian <[EMAIL PROTECTED]> wrote:

Tom Lane wrote:
> Neil Conway <[EMAIL PROTECTED]> writes:
> > On Mon, 2006-08-21 at 19:27 +0300, Martin Atukunda wrote:
> >> If readline is used by psql, a history file is automatically used.
> >> This patch adds the special file name 'none', which if set as the
> >> HISTFILE parameter, will cause psql not to use the history file.
>
> > I think it would be cleaner to use a separate \set variable to control
> > whether a history file is written, rather than needlessly overloading
> > the meaning of HISTFILE.
>
> Why is this useful at all?  There's already the -n (don't use readline)
> switch.

Seems he wants readline without history, perhaps for security.  Doesn't
setting HISTFILE to /dev/null work?


Hmm, setting HISTFILE to /dev/null doesn't work on my Mac OS X here, so
I whipped up this patch.

- Martin -

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> The original thinking was to use CONCURRENT, and CREATE CONCURRENT INDEX
> sounded like a different type of index, not a different way to build the
> index.  I don't think CONCURRENTLY has that problem, so CREATE
> CONCURRENTLY INDEX sounds good.  To read in English, it would be read as
> CREATE CONCURRENTLY, INDEX ii.

OK, we've got two votes for that, so I'll make it so.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Bruce Momjian
Tom Lane wrote:
> Andrew Dunstan <[EMAIL PROTECTED]> writes:
> > I see we have:
> >  CREATE index_opt_unique INDEX CONCURRENTLY index_name ...
> > which explains how this error occurs.
> 
> Maybe to you, but I'm still caffeine-deprived and don't exactly see what
> it was that Greg mistyped.  AFAICS he'd have to type CONCURRENTLY twice
> to get into a scenario where the proposed warning would fire.
> 
> > But might it not be better to have this instead?
> >   CREATE CONCURRENTLY index_opt_unique INDEX index_name ...
> 
> When I was fooling with gram.y I was thinking that actually
> 
>   CREATE [UNIQUE] INDEX indexname [CONCURRENTLY] ...
> 
> would be the most grammatical thing.  But I can live with putting

The original thinking was to use CONCURRENT, and CREATE CONCURRENT INDEX
sounded like a different type of index, not a different way to build the
index.  I don't think CONCURRENTLY has that problem, so CREATE
CONCURRENTLY INDEX sounds good.  To read in English, it would be read as
CREATE CONCURRENTLY, INDEX ii.

> it right after CREATE, too.  Or there was the proposal to put it
> first:
> 
>   [CONCURRENTLY] CREATE [UNIQUE] INDEX indexname ...

I think this suggested the command was CONCURRENTLY, which isn't good.

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Safer auto-initdb for RPM init script

2006-08-25 Thread Tom Lane
Peter Eisentraut <[EMAIL PROTECTED]> writes:
> On Friday, 25 August 2006 16:20, Tom Lane wrote:
>> It eats rather a lot of disk space for a package that might just be
>> getting loaded as part of a system install, with no likelihood of
>> actually being used.

> Wouldn't the system install start the init script at the end of the 
> installation process anyway?

No, it doesn't.  (The Red Hat RPMs in fact did that ... for about
a week ... until I was told in no uncertain terms that we don't
start unnecessary daemons by default.)

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Safer auto-initdb for RPM init script

2006-08-25 Thread Gregory Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> Comments?  Anyone see a better way?

Well, the truly bullet-proof mechanism would be to check every data file on
every open. You could have a header with some kind of unique tag generated at
initdb time, and the backend could ensure it matches the same tag in the
control file whenever you open a data file.

That might be too expensive, though I don't see data files getting opened all
that frequently. You could do the same thing for free by putting the tag in
the file names, though.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [Pgsqlrpms-hackers] [HACKERS] Safer auto-initdb for RPM init script

2006-08-25 Thread Tom Lane
Reinhard Max  writes:
> Another flaw of the flag-file method is, that PGDATA might have been 
> changed by the sysadmin between installing the RPM and calling the 
> init script for the first time.

What problem do you see there?  With either of these methods, a manual
change in PGDATA would require a manual initdb before the postmaster
would start.  That seems like a good conservative thing to me.

(Actually, with the flag-file method you could get the initscript
to run initdb for you by hand-creating the flag file, but it seems
unlikely people would do that in practice.)

> But shouldn't mountpoints always have 000 permissions to prevent 
> writing into the directory as long as nothing is mounted to it?

Not sure that that helps much given that the initscript runs as root.
And in any case the point here is to protect against human error,
not to assume that the installation is managed according to very
best practices.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [Pgsqlrpms-hackers] [HACKERS] Safer auto-initdb for RPM init

2006-08-25 Thread Peter Eisentraut
On Friday, 25 August 2006 16:31, Reinhard Max wrote:
> But shouldn't mountpoints always have 000 permissions to prevent
> writing into the directory as long as nothing is mounted to it?

That's an interesting point, but in practice nobody does that.  And we're 
trying to defend exactly against the case where someone has set up a mount 
point manually.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Safer auto-initdb for RPM init script

2006-08-25 Thread Peter Eisentraut
On Friday, 25 August 2006 16:20, Tom Lane wrote:
> It eats rather a lot of disk space for a package that might just be
> getting loaded as part of a system install, with no likelihood of
> actually being used.

Wouldn't the system install start the init script at the end of the 
installation process anyway?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [Pgsqlrpms-hackers] [HACKERS] Safer auto-initdb for RPM init

2006-08-25 Thread Reinhard Max
On Fri, 25 Aug 2006 at 10:20, Tom Lane wrote:

> If this were a bulletproof solution then I'd consider it anyway, but 
> AFAICS it's got the very same vulnerabilities as the flag-file 
> method, ie, if you RPM install or upgrade while your mountable data 
> directory is offline, you can still get screwed.

Another flaw of the flag-file method is that PGDATA might have been
changed by the sysadmin between installing the RPM and calling the
init script for the first time.

But shouldn't mountpoints always have 000 permissions to prevent 
writing into the directory as long as nothing is mounted to it?

cu
Reinhard

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [Pgsqlrpms-hackers] [HACKERS] Safer auto-initdb for RPM initscript

2006-08-25 Thread Andrew Dunstan

Sander Steffann wrote:

> Hi,
>
>> If this were a bulletproof solution then I'd consider it anyway, but
>> AFAICS it's got the very same vulnerabilities as the flag-file method,
>> ie, if you RPM install or upgrade while your mountable data directory
>> is offline, you can still get screwed.
>
> Isn't the most bulletproof solution to make initdb more careful about
> overwriting an existing data directory?

It is extremely careful. The point is that the NFS mount will hide the
existing datadir from initdb.


cheers

andrew

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [Pgsqlrpms-hackers] [HACKERS] Safer auto-initdb for RPM initscript

2006-08-25 Thread Sander Steffann
Hi,
 
> If this were a bulletproof solution then I'd consider it anyway, but
> AFAICS it's got the very same vulnerabilities as the flag-file method,
> ie, if you RPM install or upgrade while your mountable data directory
> is offline, you can still get screwed.

Isn't the most bulletproof solution to make initdb more careful about
overwriting an existing data directory?

- Sander


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [Pgsqlrpms-hackers] Safer auto-initdb for RPM init script

2006-08-25 Thread Joe Conway

Tom Lane wrote:

> We've seen more than one report of corruption of PG databases that
> seemed to be due to the willingness of the RPM init script to run
> initdb if it thinks the data directory isn't there.  This is pretty
> darn risky on an NFS volume, for instance, which might be offline
> at the instant the script looks.  The failure case is
>
> - script doesn't see data directory
> - script runs initdb and starts postmaster
> - offline volume comes online
> - KABOOM

Been there, done exactly that...

> I can still imagine ways for this to fail, eg if you run an RPM
> install or upgrade while your mountable data directory is offline.
> But it ought to be an order of magnitude safer than things are now.
> (Hm, maybe the %post script should only run during an RPM install,
> not an upgrade.)

That's probably a good plan.

> Comments?  Anyone see a better way?

I can't think of any offhand that aren't too expensive. We ended up
putting a root-owned empty data directory beneath the mount point, but
that can't be automated.

We also decided to turn off the init script execution entirely. The DBAs
were more comfortable with a manual database startup for a production
machine anyway (this is the way they typically handle Oracle databases
also). They get paged if the server ever goes down unplanned, and in
that event they like to check things out before bringing the db back up.
For planned outages, database startup is simply part of the plan.


Joe

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Safer auto-initdb for RPM init script

2006-08-25 Thread Tom Lane
"Magnus Hagander" <[EMAIL PROTECTED]> writes:
>> I don't really want to remove the auto-initdb feature from the
>> script, because it's important not to drive away newbies by making
>> Postgres hard to start for the first time.  But I think we'd better
>> think about ways to make it more bulletproof.

> Why does initdb have to happen on startup? Wouldn't it be much more
> logical to do it at install time?

It eats rather a lot of disk space for a package that might just be
getting loaded as part of a system install, with no likelihood of
actually being used.  In CVS tip a just-initdb'd data directory seems
to be a shade under 30MB, which I guess isn't a huge amount these days
but it compares unfavorably with the installed footprint of the code
itself (postgresql-server RPM looks to be about 4MB).

If this were a bulletproof solution then I'd consider it anyway, but
AFAICS it's got the very same vulnerabilities as the flag-file method,
ie, if you RPM install or upgrade while your mountable data directory
is offline, you can still get screwed.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Performance testing of COPY (SELECT) TO

2006-08-25 Thread Böszörményi Zoltán
> It'd also be interesting to time the same way (with a temp table) in
> devel.  I don't remember whether we did any performance work on the
> COPY CSV data path in this cycle, or whether that was all present in
> 8.1.  In any case it'd be worth proving that the COPY SELECT patch isn't
> degrading performance of the copy-a-relation case.

In the export, there is a largish table that has both many columns and many rows.

With COPY(SELECT) patch applied:

time psql -c "copy (select * from table) to 'file'" dbx
COPY 886046

real    0m13.253s
user    0m0.000s
sys     0m0.000s

time psql -c "copy table to 'file'" dbx
COPY 886046

real    0m13.234s
user    0m0.000s
sys     0m0.000s

time psql -c "copy table to stdout" dbx >file

real    0m15.155s
user    0m0.540s
sys     0m0.450s

time psql -c "copy (select * from table) to stdout" dbx >file

real    0m15.079s
user    0m0.540s
sys     0m0.590s

Surprisingly, without the COPY (SELECT) patch it's slower; these are the
lowest times from five runs, i.e. with warm caches:

time psql -c "copy table to 'file'" dbx

real    0m20.464s
user    0m0.000s
sys     0m0.010s

time psql -c "copy table to stdout" dbx >file

real    0m25.753s
user    0m0.570s
sys     0m0.460s

With the original settings, temp_buffers = 1000 on 8.2CVS,
the one client case looks like this: first run 1:44, second run 1:12,
third run 1:04. It seems it's a bit faster both on startup and on
subsequent runs.

Best regards,
Zoltán Böszörményi


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Safer auto-initdb for RPM init script

2006-08-25 Thread Magnus Hagander
> I don't really want to remove the auto-initdb feature from the
> script, because it's important not to drive away newbies by making
> Postgres hard to start for the first time.  But I think we'd better
> think about ways to make it more bulletproof.

Why does initdb have to happen on startup? Wouldn't it be much more
logical to do it at install time? (like we do in the win32 installer,
for example, I'm sure there are other examples as well)


//Magnus


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Stefan Kaltenbrunner
Tom Lane wrote:
> Andrew Dunstan <[EMAIL PROTECTED]> writes:
>> I see we have:
>>  CREATE index_opt_unique INDEX CONCURRENTLY index_name ...
>> which explains how this error occurs.
> 
> Maybe to you, but I'm still caffeine-deprived and don't exactly see what
> it was that Greg mistyped.  AFAICS he'd have to type CONCURRENTLY twice
> to get into a scenario where the proposed warning would fire.

I guess Greg is talking about something like this (i.e. just forgetting to
name the index):


devel=# create index concurrently on foo ( a);
CREATE INDEX
devel=# \d foo
  Table "public.foo"
 Column |  Type   | Modifiers
+-+---
 a  | integer |
 b  | text|
 c  | integer |
 d  | text|
Indexes:
"concurrently" btree (a)


Stefan

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Andrew Dunstan

Tom Lane wrote:

> Andrew Dunstan <[EMAIL PROTECTED]> writes:
>> I see we have:
>>  CREATE index_opt_unique INDEX CONCURRENTLY index_name ...
>> which explains how this error occurs.
>
> Maybe to you, but I'm still caffeine-deprived and don't exactly see what
> it was that Greg mistyped.  AFAICS he'd have to type CONCURRENTLY twice
> to get into a scenario where the proposed warning would fire.

AIUI, he left off the index name, so the first rule was matched rather
than the second, with "concurrently" being the index name.


cheers

andrew

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Tom Lane
Andrew Dunstan <[EMAIL PROTECTED]> writes:
> I see we have:
>  CREATE index_opt_unique INDEX CONCURRENTLY index_name ...
> which explains how this error occurs.

Maybe to you, but I'm still caffeine-deprived and don't exactly see what
it was that Greg mistyped.  AFAICS he'd have to type CONCURRENTLY twice
to get into a scenario where the proposed warning would fire.

> But might it not be better to have this instead?
>   CREATE CONCURRENTLY index_opt_unique INDEX index_name ...

When I was fooling with gram.y I was thinking that actually

CREATE [UNIQUE] INDEX indexname [CONCURRENTLY] ...

would be the most grammatical thing.  But I can live with putting
it right after CREATE, too.  Or there was the proposal to put it
first:

[CONCURRENTLY] CREATE [UNIQUE] INDEX indexname ...

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Performance testing of COPY (SELECT) TO

2006-08-25 Thread Böszörményi Zoltán
> Böszörményi Zoltán <[EMAIL PROTECTED]> writes:
>> With PostgreSQL 8.1.4, I used this:
>
>> begin;
>> select ... into temp myquery1;
>> copy myquery1 to stdout csv delimiter '|';
>> rollback;
>
> The performance of this would doubtless vary a lot with the temp_buffers
> setting.  Did you try different values?

Yes, I did, but now checked back with 8.2CVS.
The previously quoted result was achieved with
temp_buffers = 1000 on both 8.1.4 and 8.2CVS.
On 8.2CVS with temp_buffers = 4096, the 10 client case kills
the machine with swapping, but the 3 client runtime with
COPY(SELECT) went down to 2:41. The SELECT INTO TEMP
case went down to 3:36.

> It'd also be interesting to time the same way (with a temp table) in
> devel.  I don't remember whether we did any performance work on the
> COPY CSV data path in this cycle, or whether that was all present in
> 8.1.  In any case it'd be worth proving that the COPY SELECT patch isn't
> degrading performance of the copy-a-relation case.

I will report back with that, say on Monday.

In the meantime, I documented the COPY (SELECT) case
and modified parser/analyze.c and tcop/utility.c so neither of them
calls anything from under another directory. I think it's cleaner now.
Also, I tried to implement more closely what printtup() does.
Please, review.

Best regards,
Zoltán Böszörményi


pgsql-copyselect-7.patch.gz
Description: GNU Zip compressed data

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Safer auto-initdb for RPM init script

2006-08-25 Thread Peter Eisentraut
On Friday, 25 August 2006 15:19, Tom Lane wrote:
> I don't really want to remove the auto-initdb feature from the script,
> because it's important not to drive away newbies by making Postgres
> hard to start for the first time.  But I think we'd better think about
> ways to make it more bulletproof.

Why not run initdb in the %post and not in the init script at all?  That 
should be newbie-friendly as well.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Andrew Dunstan

Gregory Stark wrote:

Do we want something like this? I just made this error myself so unless I'm
special (pauses for jokes) I imagine others would be prone to it as well.

I would normally be pretty leery of code like this but it seems unlikely
anyone would actually want an index named "concurrently" and the consequences
if you get it wrong in a production environment are pretty dire. We might even
consider making it an outright error.


--- gram.y  25 Aug 2006 10:14:17 +0100  2.558
+++ gram.y  25 Aug 2006 14:04:54 +0100  
@@ -56,6 +56,7 @@
 #include "commands/defrem.h"
 #include "nodes/makefuncs.h"
 #include "parser/gramparse.h"
+#include "parser/scansup.h"
 #include "storage/lmgr.h"
 #include "utils/date.h"
 #include "utils/datetime.h"
@@ -3653,6 +3654,12 @@
opt_definition OptTableSpace where_clause
{
IndexStmt *n = makeNode(IndexStmt);
+
+                   if (!strcmp(downcase_truncate_identifier($4,20,false), "concurrently"))
+                       ereport(WARNING,
+                               (errcode(ERRCODE_SYNTAX_ERROR),
+                                errmsg("performing non-concurrent index build of index named \"concurrently\"")));
+
n->unique = $2;
n->concurrent = false;
n->idxname = $4;
  



I see we have:

CREATE index_opt_unique INDEX CONCURRENTLY index_name ...


which explains how this error occurs. But might it not be better to have 
this instead?


 CREATE CONCURRENTLY index_opt_unique INDEX index_name ...

Then ISTM no ambiguity could arise (and it's also closer to grammatical 
English, if that matters).


Just a thought

cheers

andrew

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[HACKERS] Safer auto-initdb for RPM init script

2006-08-25 Thread Tom Lane
We've seen more than one report of corruption of PG databases that
seemed to be due to the willingness of the RPM init script to run
initdb if it thinks the data directory isn't there.  This is pretty
darn risky on an NFS volume, for instance, which might be offline
at the instant the script looks.  The failure case is

- script doesn't see data directory
- script runs initdb and starts postmaster
- offline volume comes online
- KABOOM

The initdb creates a "local" database that's physically on the root
volume underneath the mountpoint directory for the intended volume.
After the mountable volume comes online, these files are shadowed
by the original database files.  The problem is that by this point the
postmaster has a copy of pg_control in memory from the freshly-initdb'd
database, and that pg_control has a WAL end address and XID counter far
less than is correct for the real database.  Havoc ensues, very probably
resulting in a hopelessly corrupt database.

I don't really want to remove the auto-initdb feature from the script,
because it's important not to drive away newbies by making Postgres
hard to start for the first time.  But I think we'd better think about
ways to make it more bulletproof.

The first thought that comes to mind is to have the RPM install create
the data directory if not present and create a flag file in it showing
that it's safe to initdb.  Then the script is allowed to initdb only
if it finds the directory and the flag file but not PG_VERSION.
Something like (untested off-the-cuff coding)

%post server

if [ ! -d $PGDATA ]; then
mkdir $PGDATA
touch $PGDATA/NO_DATABASE_YET
fi

and in initscript

	if [ -d $PGDATA -a -f $PGDATA/NO_DATABASE_YET -a ! -f $PGDATA/PG_VERSION ] ; then
		rm -f $PGDATA/NO_DATABASE_YET && initdb ...
	fi

If the data directory is not mounted then the -d test would fail,
unless the directory is itself the mount point, in which case it
would be there but not contain the NO_DATABASE_YET file.

I can still imagine ways for this to fail, eg if you run an RPM
install or upgrade while your mountable data directory is offline.
But it ought to be an order of magnitude safer than things are now.
(Hm, maybe the %post script should only run during an RPM install,
not an upgrade.)

Comments?  Anyone see a better way?

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Gregory Stark

Do we want something like this? I just made this error myself so unless I'm
special (pauses for jokes) I imagine others would be prone to it as well.

I would normally be pretty leery of code like this but it seems unlikely
anyone would actually want an index named "concurrently" and the consequences
if you get it wrong in a production environment are pretty dire. We might even
consider making it an outright error.


--- gram.y  25 Aug 2006 10:14:17 +0100  2.558
+++ gram.y  25 Aug 2006 14:04:54 +0100  
@@ -56,6 +56,7 @@
 #include "commands/defrem.h"
 #include "nodes/makefuncs.h"
 #include "parser/gramparse.h"
+#include "parser/scansup.h"
 #include "storage/lmgr.h"
 #include "utils/date.h"
 #include "utils/datetime.h"
@@ -3653,6 +3654,12 @@
opt_definition OptTableSpace where_clause
{
IndexStmt *n = makeNode(IndexStmt);
+
+                   if (!strcmp(downcase_truncate_identifier($4,20,false), "concurrently"))
+                       ereport(WARNING,
+                               (errcode(ERRCODE_SYNTAX_ERROR),
+                                errmsg("performing non-concurrent index build of index named \"concurrently\"")));
+
n->unique = $2;
n->concurrent = false;
n->idxname = $4;
-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Performance testing of COPY (SELECT) TO

2006-08-25 Thread Tom Lane
Böszörményi Zoltán <[EMAIL PROTECTED]> writes:
> With PostgreSQL 8.1.4, I used this:

> begin;
> select ... into temp myquery1;
> copy myquery1 to stdout csv delimiter '|';
> rollback;

The performance of this would doubtless vary a lot with the temp_buffers
setting.  Did you try different values?
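
For anyone repeating the comparison, a minimal sketch of varying it per
session with the 8.1 workaround (temp_buffers can only be changed before the
session first touches a temporary table):

    BEGIN;
    SET temp_buffers = 4096;   -- number of buffers, 8 kB each by default
    SELECT ... INTO TEMP myquery1;
    COPY myquery1 TO STDOUT CSV DELIMITER '|';
    ROLLBACK;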

It'd also be interesting to time the same way (with a temp table) in
devel.  I don't remember whether we did any performance work on the
COPY CSV data path in this cycle, or whether that was all present in
8.1.  In any case it'd be worth proving that the COPY SELECT patch isn't
degrading performance of the copy-a-relation case.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes:
> Because of the way the AM API works changing how the initial heap scan works
> is a bit of a pain. It would require either having some global state or
> passing the concurrent flag through the AM methods or alternatively having a
> whole new AM method.

Yeah, I dealt with that by adding a 'concurrent build' flag to IndexInfo.
A bit grotty but it beats changing a lot of function signatures.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[HACKERS] integration of pgcluster into postgresql

2006-08-25 Thread Chahine Hamila
Hi there,

I guess many - if not most - here have tried
pgcluster. For those who haven't, postgresql is
pretty much the equivalent of pgcluster configured
without load balancer or replicator, in read-write
standalone mode. From a user point of view, that's
at most three additional configuration files, which
can basically be set to their default values and
distributed as is in the standard distribution
(making installs/upgrades transparent to
non-cluster environments). From a developer point
of view, the pgcluster code is quite easy to get a
hold on.

pgcluster still has quite a few pending issues, but
it's good enough for many users. Integrating it as
part of a standard postgresql distribution would
likely not disrupt standard postgresql functioning,
while giving it the replication features it
currently lacks. It's also likely to mature faster
through more widespread adoption and, as a result,
overcome most of its issues.

If the idea of its integration into the main
postgresql code is of any interest to the
postgresql team, I'm willing to invest some effort
in it.


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] RV: Problem at install / uninstall postgresql

2006-08-25 Thread Peter Eisentraut
On Wednesday, 23 August 2006 19:04, Miguel Pedreño Saura wrote:
> Setting up postgresql-7.4 (7.4.12-3) ...
> Error: cluster configuration already exists

Report this issue to the Ubuntu packager.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] [PATCHES] Intervals within information_schema

2006-08-25 Thread Greg Sabino Mullane



>> Does anyone know what is supposed to go into the interval_type field?

> INTERVAL_TYPE IN
> ( 'YEAR', 'MONTH', 'DAY', 'HOUR', 'MINUTE', 'SECOND',
>   'YEAR TO MONTH', 'DAY TO HOUR', 'DAY TO MINUTE',
>   'DAY TO SECOND', 'HOUR TO MINUTE',
>   'HOUR TO SECOND', 'MINUTE TO SECOND' )
>
> Also, depending on which one of these is specified, the precision field
> varies in meaning.  I haven't done a complete analysis, but it might
> not be as simple as your patch suggests.

Yes, I saw some docs that hinted as much. I'm still not clear on exactly
what interval_type is supposed to represent, or at least how it would ever
apply to a Postgres attribute. It almost seems as though the spec is calling
for interval_type of 'HOURS' and interval_precision of '2' for an example
interval of '2 hours'? I'd normally say that we should leave it as NULL in
such a case, but someone added a "FIXME" at some point, which led me to
ask exactly what should go in there. Not a big deal, I suppose - it looks
as though few, if any, other products are using those fields either.
Thanks for the reply.

--
Greg Sabino Mullane [EMAIL PROTECTED]
End Point Corporation
PGP Key: 0x14964AC8 200608250614
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] RV: Problem at install / uninstall postgresql

2006-08-25 Thread Miguel Pedreño Saura

Hello!

    I'm installing postgresql on Linux Ubuntu 6.06 and I get the following
output during the installation process:

Setting up postgresql-7.4 (7.4.12-3) ...
Error: cluster configuration already exists

    I can't find information on the Internet about how to solve the problem,
and I can't install or uninstall anything.

[EMAIL PROTECTED]:/tmp# aptitude install postgresql
Reading package lists... Done
Building dependency tree... Done
Reading extended state information
Initializing package states... Done
Building tag database... Done
No packages will be installed, upgraded, or removed.
0 packages upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Need to get 0B of archives. After unpacking 0B will be used.
Writing extended state information... Done
Setting up postgresql-7.4 (7.4.12-3) ...
Error: cluster configuration already exists
dpkg: error processing postgresql-7.4 (--configure):
 subprocess post-installation script returned error exit status 1
dpkg: dependency problems prevent configuration of postgresql:
 postgresql depends on postgresql-7.4; however:
  Package postgresql-7.4 is not configured yet.
dpkg: error processing postgresql (--configure):
…

 postgresql-pltcl-7.4 depends on postgresql-7.4; however:
  Package postgresql-7.4 is not configured yet.
dpkg: error processing postgresql-pltcl-7.4 (--configure):
 dependency problems - leaving unconfigured
Errors were encountered while processing:
 postgresql-7.4
 postgresql
 postgresql-plperl-7.4
 postgresql-plpython-7.4
 postgresql-pltcl-7.4
E: Sub-process /usr/bin/dpkg returned an error code (1)
A package failed to install.  Trying to recover:
Setting up postgresql-7.4 (7.4.12-3) ...
Error: cluster configuration already exists
dpkg: error processing postgresql-7.4 (--configure):
 subprocess post-installation script returned error exit status 1
dpkg: dependency problems prevent configuration of postgresql-pltcl-7.4:
 postgresql-pltcl-7.4 depends on postgresql-7.4; however:
…..
  Package postgresql-7.4 is not configured yet.
dpkg: error processing postgresql-plpython-7.4 (--configure):
 dependency problems - leaving unconfigured
Errors were encountered while processing:
 postgresql-7.4
 postgresql-pltcl-7.4
 postgresql
 postgresql-plperl-7.4
 postgresql-plpython-7.4
[EMAIL PROTECTED]:/tmp#

Thanks, Miguel








Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread stark
Tom Lane <[EMAIL PROTECTED]> writes:

> Greg Stark <[EMAIL PROTECTED]> writes:
>> Hmmm. Or is that true. The problem may be somewhat easier since at least you
>> can be sure every tuple in the heap is in the index. So if you see a
>> DELETE_IN_PROGRESS either it *was* a constraint violation prior to the delete
>> and failing is reasonable or it's an update in which case maybe it's possible
>> to detect that they're part of the same chain?
>
> Unless we are willing to lock every single tuple while we insert it,
> this seems unfixable to me.  Without a lock, the tuple could become
> DELETE_IN_PROGRESS immediately after we look at it.

I think there's some confusion here. This above paragraph was taken from some
thoughts about Hannu's suggestion of having a separate ALTER INDEX SET UNIQUE
command. That command might have an advantage over CREATE INDEX CONCURRENTLY
because it knows the index is already complete; it doesn't have to worry about
potential conflicts with tuples that it will only find later in the scan.

Effectively this is equivalent to making CREATE UNIQUE INDEX CONCURRENTLY
three phases. The first two phases would be a regular CREATE INDEX
CONCURRENTLY and the third phase would be what ALTER INDEX SET UNIQUE does
which is scan the index and verify that it's unique.

ALTER INDEX SET UNIQUE would have to perform a similar two-transaction dance
though. It would have to set the index unique, wait until everyone has seen
the new constraint. Then verify that the property is indeed unique, possibly
rolling back the constraint creation if it's not.

That would make the whole process of creating a unique index quite long. On
the plus side it would be a useful command in itself. Doing an index scan
might be pretty slow but if the table is mostly clean of dead and recently
dead tuples it won't have to visit the heap much and should still be much
quicker than building a new index. And it would itself be a concurrent
command.
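
At the SQL level, the three-phase flow sketched above would look roughly like
this (ALTER INDEX ... SET UNIQUE is hypothetical; it is only the command being
proposed in this thread, not something that exists):

    -- phases 1 and 2: an ordinary concurrent build, without the constraint
    CREATE INDEX CONCURRENTLY ii ON t (c);
    -- phase 3 (hypothetical): declare uniqueness, wait out open transactions,
    -- then scan the index to verify it, rolling the constraint back on failure
    ALTER INDEX ii SET UNIQUE;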

> Actually it's worse than that.  We could examine a tuple, see that
> it's good, include it in the uniqueness check.  Then someone updates
> the tuple and puts the new version near the end of the table.  By
> the time we reach that version, it could be committed good.  There
> is absolutely no way that we could notice an issue without applying
> extremely expensive tests to *every* apparently-good tuple.

I think ALTER INDEX SET UNIQUE would not have this problem. It would only have
to look at tuples using its own snapshot and see if there's a violation. If
there isn't a violation as of its own snapshot then it can be sure later
transactions will preserve this property since the index was always complete
and it waited after creating the constraint.

> [ thinks for a bit... ]  At least, it seems hopeless if we use
> SnapshotNow.  Does it help if we use a real snapshot?  I'm thinking
> pass 1 inserts exactly those tuples that are good according to a
> snap taken at its beginning, and then pass 2 considers only tuples
> that are good according to a snap taken at *its* beginning.  But
> having consumed no caffeine yet this morning, I'm not sure I can
> spot any flaws that might exist in this idea.

What about tuples that are inserted and committed in the window between the
two phases. Ie, they're RECENTLY_DEAD but not in phase2's snapshot.

Or do you mean we use SatisfiesVacuum to determine what to insert but
SatisfiesSnapshot to determine whether to check uniqueness?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Autovacuum on by default?

2006-08-25 Thread Peter Eisentraut
Summarizing this thread, I see support for the following:

- autovacuum set to on by default in 8.2.

- stats_row_level also defaults to on.

(Perhaps stats_block_level should also default to on so it's not inconsistent, 
seeing that everything else is on, too.)

- Delayed vacuum and delayed autovacuum will stay disabled.

- Scale factor set to 0.08 (vacuum) and 0.04 (analyze) (?)  (formerly 0.4 and 
0.2)

- Leave base thresholds alone (pending further analysis that might remove them 
altogether?)
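
Taken together, that would amount to something like the following
postgresql.conf excerpt (a sketch of the values floated in this thread, not
settled defaults; 8.1 parameter names assumed). For reference, autovacuum
vacuums a table roughly once its dead tuples exceed
autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples, and
analogously for analyze.

    autovacuum = on
    stats_start_collector = on               # already the default; required for row-level stats
    stats_row_level = on
    autovacuum_vacuum_scale_factor = 0.08    # formerly 0.4
    autovacuum_analyze_scale_factor = 0.04   # formerly 0.2
    autovacuum_vacuum_threshold = 1000       # base thresholds left alone for now
    autovacuum_analyze_threshold = 500
    vacuum_cost_delay = 0                    # delayed vacuum stays disabled
    autovacuum_vacuum_cost_delay = -1        # -1 means fall back to vacuum_cost_delay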

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Replication

2006-08-25 Thread Markus Schiltknecht

Jeff Davis wrote:

> Which doesn't work very well in the case of two groups of servers set up
> in two physical locations. I can see two possibilities:
> (1) You require a quorum to be effective, in which case your cluster of
> databases is only as reliable as the location which holds more servers.
> (2) You have another central authority that determines which databases
> are up, and which are down. Then your cluster is only as reliable as
> that central authority.

Right, the ideal here would be two sync clusters at both locations,
connected via async replication :-)

> Even if you have a large number of nodes at different locations, then
> you end up with strange decisions to make if the network connections are
> intermittent or very slow. A temporary slowdown of many nodes could
> cause them to be degraded until some kind of human intervention brought
> them back. Until that time you might not be able to determine which
> nodes make up an authoritative group.

Side note: in such a case, I think a GCS will just choose only one node
to be the 'authoritative group', because most systems cannot afford to
have long waits for such decisions. For database replication I also
think it's better to have at least one node running than none.

> This kind of degradation could
> happen in the case of a DDoS attack, or perhaps a worm moving around the
> internet.

Well, sync replication in general needs a good, low-latency and secure
interconnect. The internet does not seem to be a good fit here.

> In practice everyone can find a solution that works for them. However,
> synchronous replication is not perfect, and there are many failure
> scenarios which need to be resolved in a way that fits your business. I
> think synchronous replication is inherently less available than
> asynchronous.

This surely depends on the environment. With a dedicated (i.e. low-latency
and secure) interconnect, sync replication is surely more available,
because your arguments above don't apply, and because sync replication
guarantees you won't lose committed transactions.

If however you want or have to replicate over the internet, it depends.
Your arguments above also apply to async replication. Only that, because
of the conflict resolution, async replication systems can continue to
operate on all the disconnected nodes and merge their work later on once
the network is up again. But then again, async still has the danger of
losing transactions.

So I probably agree: if you are on an unreliable network and you have
conflict resolution correctly set up, then async replication is more
available, but less secure.

As I said above, sync replication needs a reliable interconnect, better
even have two interconnects, because it's a SPOF for a clustered
database system.


Regards

Markus

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] tsvector/tsearch equality and/or portability issue

2006-08-25 Thread Teodor Sigaev

Oops. Fixed.

Tom Lane wrote:

Teodor Sigaev <[EMAIL PROTECTED]> writes:

Fixed in 8.1 and HEAD. Thank you


This appears to have created a regression test failure:

*** ./expected/tsearch2.out Sun Jun 18 12:55:28 2006
--- ./results/tsearch2.out  Thu Aug 24 14:30:02 2006
***
*** 2496,2503 
   f|
   f| '345':1 'qwerti':2 'copyright':3   
   f| 'qq':7 'bar':2,8 'foo':1,3,6 'copyright':9 
-  f| 'a':1A,2,3C 'b':5A,6B,7C,8B
   f| 'a':1A,2,3B 'b':5A,6A,7C,8 
   f| '7w' 'ch' 'd7' 'eo' 'gw' 'i4' 'lq' 'o6' 'qt' 'y0'  
   f| 'ar' 'ei' 'kq' 'ma' 'qa' 'qh' 'qq' 'qz' 'rx' 'st'  
   f| 'gs' 'i6' 'i9' 'j2' 'l0' 'oq' 'qx' 'sc' 'xe' 'yu'  
--- 2496,2503 
   f| 
   f| '345':1 'qwerti':2 'copyright':3

   f| 'qq':7 'bar':2,8 'foo':1,3,6 'copyright':9
   f| 'a':1A,2,3B 'b':5A,6A,7C,8
+  f| 'a':1A,2,3C 'b':5A,6B,7C,8B
   f| '7w' 'ch' 'd7' 

Re: [HACKERS] Tricky bugs in concurrent index build

2006-08-25 Thread Gregory Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> At the moment it may be moot, because I've realized that validate_index
> doesn't work anyway.  It is scanning the index and then assuming that
> any tuple inserted into the index subsequent to that scan will still be
> INSERT_IN_PROGRESS when the heapscan reaches it.  

EGADS

Boy I feel stupid now. In fairness I think what happened is that the original
plan was, like your new plan, based on snapshots. And I only switched to using
HeapSatisfiesVacuum after several iterations. I guess there were some
assumptions in the original thinking that I never revisited.

Because of the way the AM API works changing how the initial heap scan works
is a bit of a pain. It would require either having some global state or
passing the concurrent flag through the AM methods or alternatively having a
whole new AM method.

I'll have to read (and reread) your description again in the morning.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] autovacuum cannot start when connection is full

2006-08-25 Thread ITAGAKI Takahiro

"Guillaume Smet" <[EMAIL PROTECTED]> wrote:

> There is a mention in autovacuum doc:
> http://www.postgresql.org/docs/8.1/interactive/maintenance.html#AUTOVACUUM
> 
> Last sentence of the first paragraph is:
> "Also, it's important to allow a slot for the autovacuum process when
> choosing the value of superuser_reserved_connections."

Thank you!

It might be good to add the same mention to the description of
superuser_reserved_connections.
| Determines the number of connection "slots" that are reserved for connections
| by PostgreSQL superusers, *including autovacuum*.
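
A minimal sketch of what that advice means in postgresql.conf terms
(illustrative values only):

    max_connections = 100
    superuser_reserved_connections = 3   # keep at least one of these slots free for autovacuum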

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[HACKERS] Performance testing of COPY (SELECT) TO

2006-08-25 Thread Böszörményi Zoltán
Hi,

we have a large export here, I made an in-house benchmark
between Informix, plain PostgreSQL-8.1.4 and
8.2devel+COPY(SELECT) using the same data and query.
Find the results below for the two PostgreSQL versions.
With PostgreSQL 8.1.4, I used this:

begin;
select ... into temp myquery1;
copy myquery1 to stdout csv delimiter '|';
rollback;

With 8.2devel, I simply used
copy (select ...) to stdout csv delimiter '|';

# of clients:       1*      3**     10**
PostgreSQL 8.1.4    1:33    10:58   55:46
PostgreSQL 8.2      1:19     4:55   18:28

*  - average of 4 runs, the first was with cold caches after reboot
** - 1 run, average of clients' runtimes

Performance between 8.1.4 and 8.2devel is interesting:
1 client: 15%
3 clients: 55.2%
10 clients: 66.9%

The same machine was used for testing.

Best regards,
Zoltán Böszörményi


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[HACKERS] 8.2 new features

2006-08-25 Thread Albe Laurenz
Maybe the following item would look good in a 'new features' list
targeted at the less technically oriented audience:

- LDAP support for authentication and centralized connection management.

Yours,
Laurenz Albe

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match