Re: [HACKERS] pg_detoast_datum_packed and friends

2007-06-02 Thread Gregory Stark

Joe Conway [EMAIL PROTECTED] writes:

 Sorry for my ignorance, but I haven't been able to keep up lately -- 
 what is the difference between pg_detoast_datum_packed and pg_detoast_datum,
 and how do I know when to use each? E.g. I notice that the related macro
 PG_GETARG_TEXT_PP is used in place of PG_GETARG_TEXT_P in many (but not all)
 places in the backend.

We now use only 1 byte for varlena length headers when the datum is up to 126
bytes long. This saves 3-6 bytes, since we also don't have to do the four-byte
alignment that 4-byte headers require.

This gets expanded into a regular 4-byte header by pg_detoast_datum(), so
regular data-type functions never see the packed varlenas with 1-byte headers.
That lets them use the regular VARDATA() and VARSIZE() macros and lets them
assume 4-byte alignment.

It's always safe to just use the old PG_DETOAST_DATUM() even on a datatype
like text.

In heavily used functions on data types such as text, which don't care about
alignment, we can avoid having to allocate memory for a 4-byte-header copy of
the packed varlenas. But we still have to detoast externally stored or
compressed data.

The interface to do so is to use PG_DETOAST_DATUM_PACKED() and then use
VARDATA_ANY() and VARSIZE_ANY_EXHDR() instead of VARDATA and VARSIZE. This
detoasts large data but keeps small data packed and lets you work with either
1-byte or 4-byte headers without knowing which you have.
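
For example, a trivial text function written against the packed interface
looks roughly like this (just a sketch with a made-up function name, not code
from the tree):

#include "postgres.h"
#include "fmgr.h"

PG_FUNCTION_INFO_V1(text_octet_len_example);

Datum
text_octet_len_example(PG_FUNCTION_ARGS)
{
    /* detoasts external/compressed data, but leaves short values packed */
    text       *t = PG_GETARG_TEXT_PP(0);

    /* these macros cope with both 1-byte and 4-byte headers */
    char       *data = VARDATA_ANY(t);
    int32       len = VARSIZE_ANY_EXHDR(t);

    /* work with data/len here, without assuming 4-byte alignment */
    (void) data;

    PG_RETURN_INT32(len);
}

Using PG_GETARG_TEXT_P (or plain PG_DETOAST_DATUM) instead would hand you the
aligned 4-byte-header form, at the cost of a possible palloc-and-copy for
packed values.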

There's a comment in fmgr.h above pg_detoast_datum which says most of this,
and there are more detailed comments in postgres.h.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [HACKERS] [PATCHES] build/install xml2 when configured with libxml

2007-06-02 Thread Andrew Dunstan



Andrew Dunstan wrote:



Nikolay Samokhvalov wrote:


The current CVS configure is really confusing: it has a --with-xslt
option, while there is no XSLT support in the core. At least let's
change the option's description to something like "build with XSLT support
(currently it is used for contrib/xml2 only)"...



contrib is a misnomer at best. When 8.3 branches I intend to propose 
that we abandon it altogether, in line with some previous discussions.


We can change the configure help text if people think it matters that 
much - which seems to me much more potentially useful than changing 
comments.






On further consideration I don't see the necessity for this. We don't 
say this about lib-ossp-uuid although it too is only used for a contrib 
module.


cheers

andrew



[HACKERS] config help neatness

2007-06-02 Thread Andrew Dunstan


Does anyone object if I change these two config help lines:


 --enable-thread-safety-force  force thread-safety in spite of thread 
test failure


 --with-krb-srvnam=NAME  name of the default service principal in 
Kerberos [postgres]



to:


 --enable-thread-safety-force  force thread-safety despite thread test 
failure


 --with-krb-srvnam=NAME  default service principal name in Kerberos 
[postgres]



so that they fit into 80 cols?

cheers

andrew





Re: [HACKERS] [PATCHES] build/install xml2 when configured with libxml

2007-06-02 Thread Andrew Dunstan



Nikolay Samokhvalov wrote:

On 6/2/07, Andrew Dunstan [EMAIL PROTECTED] wrote:

On further consideration I don't see the necessity for this. We don't
say this about lib-ossp-uuid although it too is only used for a contrib
module.


And is that good? For that option I would also add a comment describing
that this --with-... switch relates to contrib only. What we have now is
not quite right: a user could wrongly think that (s)he will get the
capability just by adding --with-..., but (s)he won't.


Sure she will, in contrib. You keep on wanting to treat contrib as not 
part of Postgres. That's a mistake.


cheers

andrew



Re: [HACKERS] [PATCHES] build/install xml2 when configured with libxml

2007-06-02 Thread Nikolay Samokhvalov

On 6/2/07, Andrew Dunstan [EMAIL PROTECTED] wrote:

On further consideration I don't see the necessity for this. We don't
say this about lib-ossp-uuid although it too is only used for a contrib
module.


And is that good? For that option I would also add a comment describing
that this --with-... switch relates to contrib only. What we have now is
not quite right: a user could wrongly think that (s)he will get the
capability just by adding --with-..., but (s)he won't.

--
Best regards,
Nikolay



Re: [HACKERS] To all the pgsql developers..Have a look at the operators proposed by me in my research paper.

2007-06-02 Thread Tasneem Memon

 CC: pgsql-hackers@postgresql.org
 From: [EMAIL PROTECTED]
 Subject: Re: [HACKERS] To all the pgsql developers..Have a look at the operators proposed by me in my research paper.
 Date: Fri, 1 Jun 2007 19:13:54 -0500
 To: [EMAIL PROTECTED]

 On Jun 1, 2007, at 8:24 AM, Tasneem Memon wrote:

   NEAR
   It deals with the NUMBER and DATE datatypes, simulating the human
   behavior and processing the

 Why just number and date?

I have just started working on it for my MS research work. For the moment I
have written algorithms for these two datatypes only, but I intend to
implement these operators for the other datatypes as well. As for the other
datatypes, especially those involving strings, it is very complicated.

   information contained in NEAR in the same way as we humans take it.
   This is a binary operator with the syntax:  op1 NEAR op2
   Here, op1 refers to an attribute, whereas op2 is a fixed value, both of
   the same datatype.
   Suppose we want a list of all the VGAs whose price is somewhere around
   30$; the query will look like:
     SELECT * FROM accessories WHERE prod_name = 'VGA' AND prod_price NEAR 30
   A query for the datatype DATE will look like:
     SELECT * FROM sales WHERE item = 'printer' AND s_date NEAR 10-7-06
   The algorithm for the NEAR operator works as follows:
   1. The margins to op2, i.e. m1 and m2, are added dynamically on both
      sides, considering the value it contains. Keeping this margin big is
      important for a reason discussed later.
   2. The NEAR operator is supposed to obtain the values near to op2, thus
      the target membership degree (md) is initially set to 0.8.
   3. The algorithm compares the op1 (column) values row by row to the
      elements of the set that NEAR defines, i.e. the values from md 1.0 to
      0.8, adding matching tuples to the result set.

 How would one change 0.8 to some other value?

We could make the system ask the user what membership degree s/he wants, but
we don't want to make the system interactive, where a user gives a membership
degree value of his/her choice. These operators are supposed to work just like
the other operators in SQL: you just put them in the query and get a result. I
have chosen 0.8 because in all the case studies I have done for NEAR, 0.8
seems to be the best choice. 0.9 narrows the range, while 0.75 or 0.7 also
fetches values that are irrelevant. However, those values no longer seem
irrelevant when we haven't got any values down to md 0.8, so the operator
fetches them when they are the NEARest.

I would also like to mention that this looks like defining a range the way the
BETWEEN operator does, but it is different: with BETWEEN we define an exact,
strict range. Anything outside that range won't be included, no matter how
relevant that value might be to the user querying the system, and if there are
no values within that range, the result set is empty.
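
(Purely to illustrate the idea, here is a rough C sketch of a membership
function using a simple linear ramp out to a fixed margin. The names, the
ramp shape and the fixed margin are simplifications for the example only;
the actual algorithm in the paper derives the margins dynamically.)

#include <stdio.h>
#include <math.h>

/* illustrative sketch only -- not the implementation from the paper */
static double
near_membership(double value, double target, double margin)
{
    double dist = fabs(value - target);

    if (dist >= margin)
        return 0.0;                 /* outside the fuzzy set */
    return 1.0 - dist / margin;     /* 1.0 at the target, falling off linearly */
}

int
main(void)
{
    double  target = 30.0;          /* as in "prod_price NEAR 30" */
    double  margin = 15.0;          /* assumed fixed here for simplicity */
    double  threshold = 0.8;        /* initial target md; relaxed by 0.2 when the result is empty */
    double  prices[] = {28.0, 33.0, 45.0, 70.0};

    for (int i = 0; i < 4; i++)
    {
        double  md = near_membership(prices[i], target, margin);

        printf("price %.0f -> md %.2f %s\n",
               prices[i], md, md >= threshold ? "(kept)" : "(dropped)");
    }
    return 0;
}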

 
 
   4. It is very much possible that the result set is empty, since no values
      within the range exist in the column. Thus, the algorithm checks for an
      empty result set, and in that case decreases the target md by 0.2 and
      jumps to step 3. This is the reason big margins to the op2 are added.
   5. In case there are no values in op1 that are between m1 and m2 (where
      the membership degree of the values with respect to NEAR becomes 0.1)
      and the result set is empty, the algorithm fetches the two nearest
      values (tuples) to op2, one smaller and one larger than the op2, as
      the result.
   The algorithm will give an empty result only if the table referred to in
   the query is empty.

   2. NOT NEAR
   This operator is also a binary operator, dealing with the datatypes NUMBER
   and DATE. It has the syntax:  op1 NOT NEAR op2
   The op1 refers to an attribute, whereas op2 is a fixed value, both of the
   same data type.
   A query containing the operator looks like:
     SELECT id, name, age, history FROM casualties WHERE cause = 'heart attack' AND age NOT NEAR 55
   Or suppose we need a list of some event that is not clashing with some
   commitment of ours:
     SELECT * FROM events WHERE e_name = 'concert' AND date NOT NEAR 8/28/2007
   The algorithm for NOT NEAR works like this: first of all it adds the
   margins to the op2, i.e. m1 and m2, dynamically on both sides, considering
   the value op2 contains. op1 values outside the scope of the op2 (m1, m2)
   are retrieved and added to the result. If the result set is empty, the
   farthest values within the op2 fuzzy set (those possessing the least
   membership degree) are retrieved. This is done by continuing the search
   from values with md=0.1 till md=0.6, where the md for NOT NEAR reaches 0.4.

 Why isn't this just the exact opposite set of NEAR?
 
 
Because we are talking 

Re: [HACKERS] To all the pgsql developers..Have a look at the operators proposed by me in my research paper.

2007-06-02 Thread Josh Berkus
Tasneem,

  The margins to the op2, i.e. m1 and m2, are added dynamically on  
  both the sides, considering the value it contains. To keep this  
  margin big is important for a certain reason discussed later.
  The NEAR operator is supposed to obtain the values near to the op2,  
  thus the target membership degree(md) is initially set to 0.8.
  The algorithm compares the op1(column) values row by row to the  
  elements of the set that NEAR defined, i.e. the values from md 1.0  
  to 0.8, adding matching tuples to the result set.

Are we talking about a mathematical calculation on the values, or an algorithm 
against the population of the result set?  I'm presuming the latter or you 
could just use a function.  If so, is NEAR an absolute range or based on 
something logarithmic like standard deviation?

Beyond that, I would think that this mechanism would need some kind of extra 
heuristics to be at all performant, otherwise you're querying the entire 
table (or at least the entire index) every time you run a query.  Have you 
given any thought to this?

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco



[HACKERS] Tsearch vs Snowball, or what's a source file?

2007-06-02 Thread Tom Lane
While looking at the tsearch-in-core patch I was distressed to notice
that a good fraction of it is derived files, bearing notices such as

/* This file was generated automatically by the Snowball to ANSI C compiler */

Our normal policy is no derived files in CVS, so I went looking to
see if we couldn't avoid that.  I now see that contrib/tsearch2 has been
doing the same thing for awhile, and it's risen up to bite us before, eg
http://archives.postgresql.org/pgsql-committers/2005-09/msg00137.php

I had not previously known anything about Snowball, but after perusing
their website
http://snowball.tartarus.org/
for a bit, I believe the following is an accurate summary:

1. The original word-stemming algorithms are written in a special
language Snowball.  You can get both the Snowball compiler and the
original .sbl source files off the Snowball site, but these files are
not those.

2. The Snowball people also distribute a pre-compiled version of their
stuff, ie, the results of generating ANSI C code from all the stemming
algorithms.  They call this distribution libstemmer.

3. What we've been distributing in contrib/tsearch2/snowball is a
severely cut-back subset of libstemmer, ie, just the English and Russian
stemmers.  This accounts for the occasional complaints in the mailing
lists from people who were trying to add other stemmers from the
libstemmer distribution (and running into version-skew problems, because
the version we're using is not very up-to-date).

4. The proposed tsearch-in-core patch includes a larger subset of
libstemmer, but it's still not the whole thing, and it still seems to be
a modified copy rather than an exact one.


There isn't any part of this that seems to me to be a good idea.
Arguably we should be relying on the original .sbl files, but that would
make the Snowball compiler a required tool for building distributions,
which is a dependency I for one don't want to add.  In any case there's
probably not a lot of practical difference between relying on the
Snowball project's .sbl files and relying on their libstemmer
distribution.  Either way, we are importing someone else's sources.
(At least they're BSD-license sources...)

What I definitely *don't* like is that we've whacked the fileset around
in ways that make it hard for someone to drop in a newer version of the
upstream sources.  The filenames don't match, the directory layout
doesn't match, and to add insult to injury we've plastered our copyright
on their files.

Following the precedent of the zic timezone files would suggest dropping
an *unmodified* copy of the libstemmer distro into its own subdirectory
of our CVS, and doing whatever we have to do to compile it without any
changes, so that we can drop in updates later without creating problems.
(This is, in fact, what the Snowball people recommend for incorporating
their code into a larger application.)

OTOH, keeping our copy of the zic files up-to-date has proven to be a
significant pain in the neck, and so I'm not sure I care to follow that
precedent exactly.  The Snowball files may not change as often as
politicians invent new timezone laws, but they seem to change regularly
enough --- the libstemmer tarball I just downloaded from their website
seems to have been generated barely a week ago, and no it doesn't match
what's in the patch now.

Is there a reasonable way to treat libstemmer as an external library?

regards, tom lane



Re: [HACKERS] Postmaster startup messages

2007-06-02 Thread Peter Eisentraut
Michael Paesold wrote:
 In case of recovery, I think one should still get the full
 output, no?

Recovery happens just after these messages are printed, so the window 
when they are actually relevant would be very small.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



[HACKERS] Autovacuum launcher doesn't notice death of postmaster immediately

2007-06-02 Thread Peter Eisentraut
I notice that in 8.3, when I kill the postmaster process with SIGKILL or
SIGSEGV, the child processes (the writer and the stats collector) go away
immediately, but the autovacuum launcher hangs around for up to a
minute.  (I suppose this has to do with the periodic wakeups?)  When
you try to restart the postmaster before that, it fails with a complaint
that someone is still attached to the shared memory segment.

These are obviously not normal modes of operation, but I fear that this
could cause some problems with people's control scripts of the
"it crashed, let's try to restart it" sort.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] Tsearch vs Snowball, or what's a source file?

2007-06-02 Thread Josh Berkus
Tom,

 Is there a reasonable way to treat libstemmer as an external library?

Hmmm ... do we want to do that if we're distributing it in core?  That 
would require us to have a --with-tsearch compile switch so that people 
who don't want to find & build libstemmer can build PostgreSQL.  I thought 
the whole point of this feature was to have a version of Tsearch which 
just worked for users.

As annoying as it may be to keep it updated, I think it's probably worth it 
from a user experience standpoint.  However, we should definitely put the 
exact libstemmer C files, as distributed by the project, somewhere so that 
updating the stemmer each time we do a patch release is simply a matter of 
download and rsync.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [HACKERS] Tsearch vs Snowball, or what's a source file?

2007-06-02 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 Is there a reasonable way to treat libstemmer as an external library?

 Hmmm ... do we want to do that if we're distributing it in core?  That 
 would require us to have a --with-tsearch compile switch so that people 
 who don't want to find & build libstemmer can build PostgreSQL.  I thought 
 the whole point of this feature was to have a version of Tsearch which 
 just worked for users.

True.

I just noticed that the upstream master distribution (their compiler
source and .sbl files) weighs in at half the size of the libstemmer
distribution: 68K vs 129K in tar.gz format --- no doubt due to all the
repetitive boilerplate in the generated files.  I'm not sure if the
compiler source has any portability issues, but if not it is interesting
to consider the idea of bundling the master distro instead of
libstemmer.  This would fix at least one issue that we otherwise will
have, which is that the #include-paths they chose to generate libstemmer
with seem a bit unfriendly for our purposes.  The #include commands are
determined by compiler options, so we could fix them if compiling the
.sbl files on the fly.

This makes no difference in terms of the ease of tracking their changes,
of course, but it just feels better to me to be distributing real
source code and not derived files.

regards, tom lane



[HACKERS] tracker project

2007-06-02 Thread Andrew Dunstan


All,

Following some public and not so public discussion a little while back, 
I decided to ask a group of people to help me to create an 
experimental tracker instance for bugs and possibly features, to assist 
our development efforts. The people I chose were some I have worked with 
before, e.g. on the buildfarm, or who had expressed general support for 
the idea, and who I thought could usefully contribute to such a project. 
The idea is to run this for one release cycle, at the end of which time 
we should have enough experience to know if it could help or hinder our efforts.


At the moment we are still discussing both scope and software 
candidates, and exploring a couple of candidates.


There is no intention to be secret, but we also don't want to be 
endlessly debating possible merits, which has been an unfortunate 
characteristic of several discussions over the years. Rather, we want to 
demonstrate what we believe to be the benefits, as clearly and directly 
as possible, by actual use.


We currently have a project on pgfoundry, including a mailing list, at 
http://pgfoundry.org/projects/tracker/ and a wiki at 
http://www.kaltenbrunner.cc/wiki/index.php/Pgtracker:evaluation


Anyone who is interested in contributing is welcome to join in, 
especially if they have a history of involvement in PostgreSQL 
development. 



cheers

andrew






Re: [HACKERS] syslogger line-end processing infelicity

2007-06-02 Thread Andrew Dunstan



Magnus Hagander wrote:

My second thought is that we should quite possibly abandon this
translation altogether - we know that our COPY code is quite happy with
either style of line ending, as long as the file is consistent, and also
many Windows programs will quite happily read files with Unix style line
endings (e.g. Wordpad, although not Notepad).



Agreed. We shouldn't touch the data. Every editor I know on windows
*except* notepad can deal just fine with Unix line endings, and if
you're logging your queries your logfile will be too large to work well
in notepad anyway :-)



  


OK, so do we consider this a bug fix and backpatch it all the way to 
8.0? Nobody's complained so far that I know of, and it's only damaged 
logs, not damaged primary data. I'm inclined just to fix it in HEAD, and 
release note the change in behaviour. It will matter more when we get 
machine-readable logs.


cheers

andrew



[HACKERS] ERROR: index row size

2007-06-02 Thread Rodrigo Sakai
  Hello,

 

  I'm having big trouble with the index size! I have looked for a solution
on the internet, but the solutions that I found don't fit my case.

  I developed a new data type in C and added it to PostgreSQL. Basically, the
data type is (DateADT, DateADT) with some temporal rules that I'm researching.
The data type is OK; the in, out, receive and send functions are OK; some
operations are OK. But the index operators and functions are not working
properly! Actually I can use them, but in some cases an error occurs about
the index row size.

  I'm sure that the in, out, receive and send functions are well implemented.
I think the problem is that the data type is really big and needs a big index.

  The implementation code of the data type is:

 

typedef struct t_periodo
{
    DateADT   tvi;
    DateADT   tvf;
} Periodo;

 

 

  Any ideas on how to solve my problem? Perhaps increasing the BLOCKSIZE could
be one solution. Not a very smart one, but it could solve my problem
temporarily?!

 

  Thanks in advance!



Re: [HACKERS] ERROR: index row size

2007-06-02 Thread Tom Lane
Rodrigo Sakai [EMAIL PROTECTED] writes:
   I developed a new data type using C and add this new type on PostgreSQL.
 Basically, the data type is: (DateADT, DateADT) with some temporal rules
 that I'm researching! The data type is ok; the in, out, receive and send
 functions are ok; some operations are ok. But the index operators and
 functions are not working properly! Actually I can use them, but in some
 cases an error occurs about index row size.

You have a bug in your datatype code.  There's no way an 8-byte datatype
should produce that error.
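
For reference, here is a minimal sketch of how an 8-byte fixed-length
pass-by-reference type is usually wired up (the names are illustrative, not
Rodrigo's actual code). One guess: a mismatch between what the input function
allocates and the INTERNALLENGTH declared in CREATE TYPE is a classic way to
end up with bogus, huge row sizes from a small type.

#include "postgres.h"
#include "fmgr.h"
#include "utils/date.h"

typedef struct Periodo
{
    DateADT     tvi;
    DateADT     tvf;
} Periodo;

PG_FUNCTION_INFO_V1(periodo_in_example);

Datum
periodo_in_example(PG_FUNCTION_ARGS)
{
    char       *str = PG_GETARG_CSTRING(0);
    Periodo    *result = (Periodo *) palloc(sizeof(Periodo));

    /* parse str into result->tvi / result->tvf here */
    (void) str;
    result->tvi = 0;
    result->tvf = 0;

    PG_RETURN_POINTER(result);
}

/*
 * SQL side, sketched:
 *
 *   CREATE TYPE periodo (
 *       INPUT = periodo_in_example,
 *       OUTPUT = periodo_out_example,
 *       INTERNALLENGTH = 8,
 *       ALIGNMENT = int4
 *   );
 */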

regards, tom lane
