Re: [PATCHES] Remove FATAL from pg_lzdecompress

2008-03-07 Thread Tom Lane
Zdenek Kotala <[EMAIL PROTECTED]> writes:
> I attach patch which adds boundaries check and memory overwriting 
> protection when compressed data are corrupted.

Applied with revisions --- it appeared to me that it got the corner case
wrong where we find a tag just at the end of the input but there's no
room for the output.  We'd fall out of the loop and then the error
test would think all is well.

> I did not add any extra information into the message. Reasonable 
> solution seems to be use errcontext how was recommended by Alvaro. But I 
> 'm not sure if printtup is good place for it, because pg_detoast is 
> called from many places. However, is can be solved in separate patch.

I'm still unconvinced that that's worth any added complexity or
slowdown.

regards, tom lane

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] CopyReadAttributesCSV optimization

2008-03-07 Thread Andrew Dunstan



Heikki Linnakangas wrote:
Here's a patch to speed up CopyReadAttributesCSV. On the test case 
I've been playing with, loading the TPC-H partsupp table, about 20% 
CopyReadAttributesCSV (inlined into DoCopy, DoCopy itself is 
insignificant):




[snip]


The trick is to split the loop in CopyReadAttributesCSV into two 
parts, inside quotes, and outside quotes, saving some instructions in 
both parts.


Your mileage may vary, but I'm quite happy with this. I haven't tested 
it much yet, but I wouldn't expect it to be a loss in any interesting 
scenario. The code also doesn't look much worse after the patch, 
perhaps even better.


  


This looks sane enough, and worked for me in testing, so I'm going to 
apply it shortly. I'll probably add a comment or two about how the loops 
interact.


cheers

andrew

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] Proposed patch to change TOAST compression strategy

2008-03-07 Thread Tom Lane
Gregory Stark <[EMAIL PROTECTED]> writes:
> "Tom Lane" <[EMAIL PROTECTED]> writes:
>> * Adds an early-failure path to the compressor as suggested by Jan:
>> if it's been unable to find even one compressible substring in the
>> first 1KB (parameterizable), assume we're looking at incompressible
>> input and give up.  (There are various ways this could be done, but
>> this way seems to add the least overhead to the compressor's inner
>> loop.)

> I'm not sure how to test the rest of it, but this bit seems testable. I fear
> this may be too conservative. Even nigh incompressible data will find a few
> backreferences.
> I'll try some tests and see.

On the strength of Teodor's favorable test, I went ahead and applied
this patch as-is.  We can certainly tweak the compressor logic again
if you can show an improvement by changing it further.

regards, tom lane

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [HACKERS] Re: [PATCHES] a tsearch2 (8.2.4) dictionary that only filters out stopwords

2008-03-07 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Added to TODO:

> * Allow text search dictionary to filter out only stop words

>   http://archives.postgresql.org/pgsql-patches/2007-11/msg00081.php

That's a poor description.  I thought the TODO was something more like
"allow dictionaries to change the token that is passed on to later
dictionaries".

regards, tom lane

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] CopyReadLineText optimization

2008-03-07 Thread Andrew Dunstan



Heikki Linnakangas wrote:

Andrew Dunstan wrote:
I'm still a bit worried about applying it unless it gets some 
adaptive behaviour or something so that we don't cause any serious 
performance regressions in some cases.


I'll try to come up with something. At the most conservative end, we 
could fall back to the current method on the first escape, quote or 
backslash character.


Also, could we perhaps benefit from inlining some calls, or is your 
compiler doing that anyway?


gcc does inline all static functions that are only called from one 
site,  and small functions, using some heuristic. I don't think more 
aggressive inlining would help.




Another question that occurred to me - did you try using strpbrk() to 
look for the next interesting character rather than your homegrown 
searcher gadget? If so, how did that perform?


cheers

andrew

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] Cleaner API for appendStringInfoVA

2008-03-07 Thread Bruce Momjian

Patch rejected.  Sorry.  Comment is:

The patch of very dubious portability and I'm not even convinced that
it'd provide a net performance improvement.

---

Marko Kreen wrote:
> Attached patch moves decision how much more room to allocate
> from callers of appendStringInfoVA to inside the function,
> where more info is available.
> 
> On systems with broken vsnprintf() it falls back
> to doubleing the buffer.
> 
> Fixme: the +1 could be something larger?  Aligned?
> 
> -- 
> marko

[ Attachment, skipping... ]

> 
> ---(end of broadcast)---
> TIP 3: Have you checked our extensive FAQ?
> 
>http://www.postgresql.org/docs/faq

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] a tsearch2 (8.2.4) dictionary that only filters out stopwords

2008-03-07 Thread Bruce Momjian

Added to TODO:

* Allow text search dictionary to filter out only stop words

  http://archives.postgresql.org/pgsql-patches/2007-11/msg00081.php


---

Tom Lane wrote:
> Oleg Bartunov <[EMAIL PROTECTED]> writes:
> > Let's consider one example - removing accents.
> > In the past I always recommend people to use regex functions before
> > to_tsvector conversion to remove accents, but recently I was noticed that
> > such trick doesn't work with headline(). So, the only way is to have
> > special dictionary dict_remove_accent before, which  works as a filter.
> 
> > I don't remember why do we left this for future releases, though.
> 
> That would require a system-to-dictionary API change (to be able to
> modify the token under inspection), no?  So it's certainly something
> I'd say is too late for 8.3.
> 
> One thought that came to mind is that the option name should be just
> "Accept" not "AcceptAll".  To me "All" implies that it would accept
> *everything* ... including stopwords.
> 
>   regards, tom lane
> 
> ---(end of broadcast)---
> TIP 4: Have you searched our list archives?
> 
>http://archives.postgresql.org

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] Minimum selectivity estimate for LIKE 'prefix%'

2008-03-07 Thread Peter Eisentraut
Am Donnerstag, 6. März 2008 schrieb Tom Lane:
> To help out Peter's client, who's running 8.1.x, we'd have to backpatch
> at least that far.  We backpatched the last round of LIKE selectivity
> fixes to 8.1, so I don't have too much hesitation about doing the same
> here.

Thanks for the offer.  Personally, I'd be OK with not backpatching and dealing 
with the patch myself.  We've patched that area before, but I'm not sure what 
the original cause of that was, but this here sounds more like a feature 
improvement.

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [BUGS] BUG #3975: tsearch2 index should not bomb out of 1Mb limit

2008-03-07 Thread Teodor Sigaev

To be precise about tsvector:

1) GiST index is lossy for any kind of tserach queries, GIN index for @@ 
operation is not lossy, for @@@ - is lossy.


2) Number of positions per word is limited to 256 number - bigger number of 
positions is not helpful for ranking, but produces a big tsvector. If word has a 
lot of positions in document then it close to be a stopword. We could easy 
increase this limit to 65536 positions


3) Maximum value of position is 2^14, because for position's storage we use 
uint16. In this integer it's needed to reserve 2 bits to store weight of this 
position. It's possible to increase int16 to int32, but it will doubled tsvector 
size, which is unpractical, I suppose. So, part of document used for ranking 
contains first 16384 words - that is about first 50-100 kilobytes.


4) Limit of total size of tsvector is in WordEntry->pos (ts_type.h) field. It 
contains number of bytes between first lexeme in tsvector and needed lexeme.

So, limitation is total length of lexemes plus  theirs positional information.


--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

--
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [BUGS] BUG #3975: tsearch2 index should not bomb out of 1Mb limit

2008-03-07 Thread Bruce Momjian
Tom Lane wrote:
> Bruce Momjian <[EMAIL PROTECTED]> writes:
> > Tom Lane wrote:
> >> I don't think that follows.  A tsearch index is lossy anyway, so there's
> 
> > Uh, the index is lossy but I thought it was lossy in a way that just
> > required additional heap accesses, not lossy in that it doesn't index
> > everything.
> 
> Sure it's lossy.  It doesn't index stopwords, and it doesn't index the
> difference between various forms of a word (when the dictionaries reduce
> them to a common root).

Yes, but you specify the stop words and stemming rules --- it isn't like
it drops words that are out of your control.

> > I am concerned a 1mb limit is too low though.  Exactly why can't we have
> > a higher limit?  Is positional information that significant?
> 
> That's pretty much exactly the point: it's not very significant, and it
> doesn't justify a total inability to index large documents.

Agreed.  I think losing positional information after 1mb is acceptable.

> One thing we could do is index words that are past the limit but not
> store a position, or perhaps have the convention that the maximum
> position value means "somewhere past here".

Sure.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [BUGS] BUG #3975: tsearch2 index should not bomb out of 1Mb limit

2008-03-07 Thread Tom Lane
Bruce Momjian <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> I don't think that follows.  A tsearch index is lossy anyway, so there's

> Uh, the index is lossy but I thought it was lossy in a way that just
> required additional heap accesses, not lossy in that it doesn't index
> everything.

Sure it's lossy.  It doesn't index stopwords, and it doesn't index the
difference between various forms of a word (when the dictionaries reduce
them to a common root).

> I am concerned a 1mb limit is too low though.  Exactly why can't we have
> a higher limit?  Is positional information that significant?

That's pretty much exactly the point: it's not very significant, and it
doesn't justify a total inability to index large documents.

One thing we could do is index words that are past the limit but not
store a position, or perhaps have the convention that the maximum
position value means "somewhere past here".

regards, tom lane

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [BUGS] BUG #3975: tsearch2 index should not bomb out of 1Mb limit

2008-03-07 Thread Bruce Momjian
Edwin Groothuis wrote:
> On Thu, Mar 06, 2008 at 08:19:35PM -0300, Euler Taveira de Oliveira wrote:
> > Edwin Groothuis wrote:
> > 
> > >Is it possible to make it a WARNING instead of an ERROR? Right now I get:
> > >
> > No. All of the other types emit an ERROR if you're trying an out of 
> > range value.
> 
> Does that then mean that, because of the triggers on inserts, we
> will never be able to add this record?

Right now, yes, unless we figure something else out.

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches


Re: [PATCHES] [BUGS] BUG #3975: tsearch2 index should not bomb out of 1Mb limit

2008-03-07 Thread Bruce Momjian
Tom Lane wrote:
> Euler Taveira de Oliveira <[EMAIL PROTECTED]> writes:
> > Edwin Groothuis wrote:
> >> Is it possible to make it a WARNING instead of an ERROR? Right now I get:
> >> 
> > No. All of the other types emit an ERROR if you're trying an out of 
> > range value.
> 
> I don't think that follows.  A tsearch index is lossy anyway, so there's

Uh, the index is lossy but I thought it was lossy in a way that just
required additional heap accesses, not lossy in that it doesn't index
everything.

I think just indexing what we can and throwing away the rest is a MySQL
approach we don't want to take.  We could throw a warning on truncation
but that seems wrong too.

I am concerned a 1mb limit is too low though.  Exactly why can't we have
a higher limit?  Is positional information that significant?

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-patches mailing list (pgsql-patches@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-patches