Re: [HACKERS] pg_detoast_datum_packed and friends
Joe Conway <[EMAIL PROTECTED]> writes:
> Sorry for my ignorance, but I haven't been able to keep up lately -- what
> is the difference between pg_detoast_datum_packed and pg_detoast_datum,
> and how do I know when to use each? E.g. I notice that the related macro
> PG_GETARG_TEXT_PP is used in place of PG_GETARG_TEXT_P in many (but not
> all) places in the backend.

We now use only 1 byte for varlena length headers when the datum is up to 126 bytes long. This saves 3-6 bytes, since we also don't have to do the four-byte alignment that 4-byte headers require.

pg_detoast_datum() expands a packed varlena into one with a regular 4-byte header, so ordinary data type functions never see the packed varlenas with 1-byte headers. That lets them use the regular VARDATA() and VARSIZE() macros and lets them assume 4-byte alignment. It's always safe to just use the old PG_DETOAST_DATUM(), even on a data type like text.

In heavily used functions on data types such as text, which don't care about alignment, we can avoid having to allocate memory for a 4-byte-header copy of the packed varlena. But we still have to detoast externally stored or compressed data. The interface for that is PG_DETOAST_DATUM_PACKED(), and then VARDATA_ANY() and VARSIZE_ANY_EXHDR() instead of VARDATA and VARSIZE. This detoasts large data but keeps small data packed, and lets you work with either 1-byte or 4-byte headers without knowing which you have.

There's a comment in fmgr.h above pg_detoast_datum which says most of this, and more detailed comments in postgres.h.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly
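To make the two header layouts concrete, here is a self-contained sketch -- simplified, not the real postgres.h macros, and assuming the little-endian encoding where a 1-byte header stores (total_len << 1) | 1 and a 4-byte header stores total_len << 2 -- of how one routine can compute the payload size from either header form, which is the job VARSIZE_ANY_EXHDR does:

```c
#include <stdint.h>
#include <string.h>

/* Simplified little-endian layout (an illustration, not the real
 * definitions): the low bit of the first byte tells the two header
 * forms apart. */
static int
is_short_header(const unsigned char *p)
{
    return (p[0] & 0x01) != 0;
}

/* Payload size excluding the header, for either form
 * (what VARSIZE_ANY_EXHDR computes). */
static size_t
payload_size(const unsigned char *p)
{
    if (is_short_header(p))
        return (size_t) (p[0] >> 1) - 1;    /* total len minus 1-byte header */

    uint32_t hdr;
    memcpy(&hdr, p, 4);                     /* little-endian assumed */
    return (size_t) (hdr >> 2) - 4;         /* total len minus 4-byte header */
}
```

So "hi" packed costs 3 bytes total instead of 6-plus-padding, which is where the 3-6 byte saving above comes from.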
Re: [HACKERS] [PATCHES] build/install xml2 when configured with libxml
Andrew Dunstan wrote:
> Nikolay Samokhvalov wrote:
>> The current CVS' configure is really confusing: it has --with-xslt
>> option, while there is no XSLT support in the core. At least let's
>> change the option's comment to smth like "build with XSLT support (now
>> it is used for contrib/xml2 only)"...
>
> contrib is a misnomer at best. When 8.3 branches I intend to propose
> that we abandon it altogether, in line with some previous discussions.
> We can change the configure help text if people think it matters that
> much - which seems to me much more potentially useful than changing
> comments.

On further consideration I don't see the necessity for this. We don't say this about lib-ossp-uuid, although it too is only used for a contrib module.

cheers

andrew
[HACKERS] config help neatness
Does anyone object if I change these two config help lines:

  --enable-thread-safety-force  force thread-safety in spite of thread test failure
  --with-krb-srvnam=NAME        name of the default service principal in Kerberos [postgres]

to:

  --enable-thread-safety-force  force thread-safety despite thread test failure
  --with-krb-srvnam=NAME        default service principal name in Kerberos [postgres]

so that they fit into 80 cols?

cheers

andrew
Re: [HACKERS] [PATCHES] build/install xml2 when configured with libxml
Nikolay Samokhvalov wrote:
> On 6/2/07, Andrew Dunstan <[EMAIL PROTECTED]> wrote:
>> On further consideration I don't see the necessity for this. We don't
>> say this about lib-ossp-uuid although it too is only used for a contrib
>> module.
>
> And is it good? For that functionality I would also add a comment
> describing that this --with... option relates to contrib only. What we
> have now is not entirely correct: a user could wrongly think that (s)he
> will get the capabilities just by adding --with-..., but (s)he won't.

Sure she will, in contrib. You keep on wanting to treat contrib as not part of Postgres. That's a mistake.

cheers

andrew
Re: [HACKERS] [PATCHES] build/install xml2 when configured with libxml
On 6/2/07, Andrew Dunstan <[EMAIL PROTECTED]> wrote:
> On further consideration I don't see the necessity for this. We don't
> say this about lib-ossp-uuid although it too is only used for a contrib
> module.

And is it good? For that functionality I would also add a comment describing that this --with... option relates to contrib only. What we have now is not entirely correct: a user could wrongly think that (s)he will get the capabilities just by adding --with-..., but (s)he won't.

-- 
Best regards,
Nikolay
Re: [HACKERS] To all the pgsql developers..Have a look at the operators proposed by me in my research paper.
> CC: pgsql-hackers@postgresql.org
> From: [EMAIL PROTECTED]
> Subject: Re: [HACKERS] To all the pgsql developers..Have a look at the operators proposed by me in my research paper.
> Date: Fri, 1 Jun 2007 19:13:54 -0500
> To: [EMAIL PROTECTED]
>
> On Jun 1, 2007, at 8:24 AM, Tasneem Memon wrote:
>> 1. NEAR
>> It deals with the NUMBER and DATE datatypes, simulating the human
>> behavior and processing the information contained in NEAR in the same
>> way as we humans take it.
>
> Why just number and date?

I have just started working on it for my MS research work. For the moment I have written algorithms for these two datatypes only, but I intend to implement these operators for the other datatypes also. As for other datatypes, especially those involving strings, it's very complicated.

To recap the original description: NEAR is a binary operator with the syntax

  op1 NEAR op2

Here, op1 refers to an attribute, whereas op2 is a fixed value, both of the same datatype. Suppose we want a list of all the VGAs whose price should be somewhere around $30; the query will look like:

  SELECT * FROM accessories WHERE prod_name = 'VGA' AND prod_price NEAR 30

A query for the datatype DATE will look like:

  SELECT * FROM sales WHERE item = 'printer' AND s_date NEAR 10-7-06

The algorithm for the NEAR operator works as follows:

1. The margins to op2, i.e. m1 and m2, are added dynamically on both sides, considering the value it contains. Keeping this margin big is important for a reason discussed later.

2. The NEAR operator is supposed to obtain the values near to op2, so the target membership degree (md) is initially set to 0.8.

3. The algorithm compares the op1 (column) values row by row to the elements of the set that NEAR defined, i.e. the values from md 1.0 to 0.8, adding matching tuples to the result set.

> How would one change 0.8 to some other value?
We can make the system ask the user what membership degree s/he wants to get the values at, but we don't want to make the system interactive, where a user gives a membership degree value of his/her choice. These operators are supposed to work just like the other operators in SQL: you just put them in the query and get a result. I have put 0.8 because in all the case studies I have made for NEAR, 0.8 seems to be the best choice. 0.9 narrows the range; 0.75 or 0.7 also fetches values that are irrelevant. However, those values will no longer seem irrelevant when we haven't got any values down to md 0.8, so the operator fetches them when they are the NEARest.

I would like to mention another thing here: this looks like defining a range the way the BETWEEN operator does, but it's different, in that with BETWEEN we define an exact, strict range. Anything outside that range won't be included, no matter that the value might be of interest to the user querying the system, and if there are no values within that range, the result set is empty.

Continuing the algorithm:

4. It is very much possible that the result set is empty, since no values within the range exist in the column. Thus, the algorithm checks for an empty result set, and in that case decreases the target md by 0.2 and jumps to step 3. This is the reason big margins to op2 are added.

5. In case there are no values in op1 between m1 and m2 (where the membership degree of the values with respect to NEAR becomes 0.1) and the result set is empty, the algorithm fetches the two nearest values (tuples) to op2, one smaller and one larger than op2, as the result. The algorithm will give an empty result only if the table referred to in the query is empty.

2. NOT NEAR

This operator is also a binary operator, dealing with the datatypes NUMBER and DATE. It has the syntax:

  op1 NOT NEAR op2

op1 refers to an attribute, whereas op2 is a fixed value, both of the same data type.
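The NEAR selection loop described above can be sketched in C. The margin, the 0.8 starting threshold, and the 0.2 decrement come from the description; the linear membership function is an assumption, since the post does not give the exact formula, and step 5 (falling back to the two nearest tuples) is omitted for brevity:

```c
#include <math.h>

/* Assumed membership function: 1.0 at the target, falling linearly to
 * 0.0 at the margin (the post does not specify the exact shape). */
static double
membership(double value, double target, double margin)
{
    double d = fabs(value - target) / margin;
    return d >= 1.0 ? 0.0 : 1.0 - d;
}

/* Fill `out` with indices of column values whose membership degree
 * meets a threshold that starts at 0.8 and relaxes by 0.2 while the
 * result stays empty (steps 2-4 of the algorithm).  Returns the
 * number of matches. */
static int
near_select(const double *col, int n, double target, double margin,
            int *out)
{
    double md_target = 0.8;
    int nout = 0;

    while (md_target > 0.0)
    {
        nout = 0;
        for (int i = 0; i < n; i++)
            if (membership(col[i], target, margin) >= md_target)
                out[nout++] = i;
        if (nout > 0)
            break;
        md_target -= 0.2;       /* step 4: relax and retry on empty result */
    }
    return nout;
}
```

As Josh Berkus notes later in the thread, a row-by-row scan like this touches the whole table, which is exactly the performance question the approach raises.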
A query containing the operator looks like:

  SELECT id, name, age, history FROM casualties WHERE cause = 'heart attack' AND age NOT NEAR 55

Or suppose we need a list of some event that is not clashing with some commitment of ours:

  SELECT * FROM events WHERE e_name = 'concert' AND date NOT NEAR 8/28/2007

The algorithm for NOT NEAR works like this: First of all it adds the margins to op2, i.e. m1 and m2, dynamically on both sides, considering the value op2 contains. op1 values outside the scope of op2 (m1, m2) are retrieved and added to the result. If the result set is empty, the farthest values within the op2 fuzzy set (those possessing the least membership degree) are retrieved. This is done by continuing the search from values with md = 0.1 up to md = 0.6, where the md for NOT NEAR reaches 0.4.

> Why isn't this just the exact opposite set of NEAR?

Because we are talking
Re: [HACKERS] To all the pgsql developers..Have a look at the operators proposed by me in my research paper.
Tasneem,

> The margins to op2, i.e. m1 and m2, are added dynamically on both
> sides, considering the value it contains. To keep this margin big is
> important for a certain reason discussed later. The NEAR operator is
> supposed to obtain the values near to op2, thus the target membership
> degree (md) is initially set to 0.8. The algorithm compares the op1
> (column) values row by row to the elements of the set that NEAR
> defined, i.e. the values from md 1.0 to 0.8, adding matching tuples to
> the result set.

Are we talking about a mathematical calculation on the values, or an algorithm against the population of the result set? I'm presuming the latter, or you could just use a function. If so, is NEAR an absolute range, or based on something logarithmic like standard deviation?

Beyond that, I would think that this mechanism would need some kind of extra heuristics to be at all performant; otherwise you're querying the entire table (or at least the entire index) every time you run a query. Have you given any thought to this?

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco
[HACKERS] Tsearch vs Snowball, or what's a source file?
While looking at the tsearch-in-core patch I was distressed to notice that a good fraction of it is derived files, bearing notices such as

  /* This file was generated automatically by the Snowball to ANSI C compiler */

Our normal policy is no derived files in CVS, so I went looking to see if we couldn't avoid that. I now see that contrib/tsearch2 has been doing the same thing for awhile, and it's risen up to bite us before, eg
http://archives.postgresql.org/pgsql-committers/2005-09/msg00137.php

I had not previously known anything about Snowball, but after perusing their website http://snowball.tartarus.org/ for a bit, I believe the following is an accurate summary:

1. The original word-stemming algorithms are written in a special language, Snowball. You can get both the Snowball compiler and the original .sbl source files off the Snowball site, but these files are not those.

2. The Snowball people also distribute a pre-compiled version of their stuff, ie, the results of generating ANSI C code from all the stemming algorithms. They call this distribution libstemmer.

3. What we've been distributing in contrib/tsearch2/snowball is a severely cut-back subset of libstemmer, ie, just the English and Russian stemmers. This accounts for the occasional complaints on the mailing lists from people who were trying to add other stemmers from the libstemmer distribution (and running into version-skew problems, because the version we're using is not very up-to-date).

4. The proposed tsearch-in-core patch includes a larger subset of libstemmer, but it's still not the whole thing, and it still seems to be a modified copy rather than an exact one.

There isn't any part of this that seems to me to be a good idea. Arguably we should be relying on the original .sbl files, but that would make the Snowball compiler a required tool for building distributions, which is a dependency I for one don't want to add.
In any case there's probably not a lot of practical difference between relying on the Snowball project's .sbl files and relying on their libstemmer distribution. Either way, we are importing someone else's sources. (At least they're BSD-license sources...)

What I definitely *don't* like is that we've whacked the fileset around in ways that make it hard for someone to drop in a newer version of the upstream sources. The filenames don't match, the directory layout doesn't match, and to add insult to injury we've plastered our copyright on their files.

Following the precedent of the zic timezone files would suggest dropping an *unmodified* copy of the libstemmer distro into its own subdirectory of our CVS, and doing whatever we have to do to compile it without any changes, so that we can drop in updates later without creating problems. (This is, in fact, what the Snowball people recommend for incorporating their code into a larger application.)

OTOH, keeping our copy of the zic files up-to-date has proven to be a significant pain in the neck, and so I'm not sure I care to follow that precedent exactly. The Snowball files may not change as often as politicians invent new timezone laws, but they seem to change regularly enough --- the libstemmer tarball I just downloaded from their website seems to have been generated barely a week ago, and no, it doesn't match what's in the patch now.

Is there a reasonable way to treat libstemmer as an external library?

regards, tom lane
Re: [HACKERS] Postmaster startup messages
Michael Paesold wrote:
> In case of recovery, I think one should still get the full output, no?

Recovery happens just after these messages are printed, so the window when they are actually relevant would be very small.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
[HACKERS] Autovacuum launcher doesn't notice death of postmaster immediately
I notice that in 8.3, when I kill the postmaster process with SIGKILL or SIGSEGV, the child processes (writer and stats collector) go away immediately, but the autovacuum launcher hangs around for up to a minute. (I suppose this has to do with the periodic wakeups?) When you try to restart the postmaster before that, it fails with a complaint that someone is still attached to the shared memory segment.

These are obviously not normal modes of operation, but I fear that this could cause some problems with people's control scripts of the "it crashed, let's try to restart it" sort.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
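One way a child process could notice postmaster death immediately rather than on its next periodic wakeup -- a sketch of a possible mechanism, not a description of the 8.3 code -- is to hold the read end of a pipe whose write end exists only in the postmaster. When the postmaster dies, even from SIGKILL, the kernel closes the write end and the read end hits EOF, so a zero-timeout poll() reports it at once:

```c
#include <poll.h>

/* Returns 1 if the process holding the pipe's write end is gone:
 * poll() on the read end reports EOF/HUP immediately once the last
 * write end has been closed by the kernel on process exit. */
static int
postmaster_is_dead(int read_fd)
{
    struct pollfd pfd;

    pfd.fd = read_fd;
    pfd.events = POLLIN;
    return poll(&pfd, 1, 0) > 0;    /* 0-ms timeout: never blocks */
}
```

A child's main loop could then fold this fd into its existing select/poll wait instead of relying on the wakeup interval.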
Re: [HACKERS] Tsearch vs Snowball, or what's a source file?
Tom,

> Is there a reasonable way to treat libstemmer as an external library?

Hmmm ... do we want to do that if we're distributing it in core? That would require us to have a --with-tsearch compile switch so that people who don't want to build libstemmer can still build PostgreSQL. I thought the whole point of this feature was to have a version of Tsearch which just works for users.

As annoying as it may be to keep it updated, I think it's probably worth it from a user-experience standpoint. However, we should definitely put the exact libstemmer C files, as distributed by the project, somewhere, so that updating libstemmer each time we do a patch release is simply a matter of download and rsync.

-- 
--Josh
Josh Berkus
PostgreSQL @ Sun
San Francisco
Re: [HACKERS] Tsearch vs Snowball, or what's a source file?
Josh Berkus <[EMAIL PROTECTED]> writes:
>> Is there a reasonable way to treat libstemmer as an external library?
>
> Hmmm ... do we want to do that if we're distributing it in core? That
> would require us to have a --with-tsearch compile switch so that people
> who don't want to build libstemmer can build PostgreSQL. I thought the
> whole point of this feature was to have a version of Tsearch which just
> worked for users.

True. I just noticed that the upstream master distribution (their compiler source and .sbl files) weighs in at half the size of the libstemmer distribution: 68K vs 129K in tar.gz format --- no doubt due to all the repetitive boilerplate in the generated files. I'm not sure if the compiler source has any portability issues, but if not, it is interesting to consider the idea of bundling the master distro instead of libstemmer.

This would fix at least one issue that we otherwise will have, which is that the #include paths they chose to generate libstemmer with seem a bit unfriendly for our purposes. The #include commands are determined by compiler options, so we could fix them if compiling the .sbl files on the fly.

This makes no difference in terms of the ease of tracking their changes, of course, but it just feels better to me to be distributing real source code and not derived files.

regards, tom lane
[HACKERS] tracker project
All,

Following some public and not so public discussion a little while back, I decided to ask a group of people to help me create an experimental tracker instance for bugs and possibly features, to assist our development efforts. The people I chose were some I have worked with before, e.g. on the buildfarm, or who had expressed general support for the idea, and who I thought could usefully contribute to such a project.

The idea is to run this for one release cycle, at the end of which time we should have enough experience to know if it would help or hinder our efforts. At the moment we are still discussing both scope and software candidates, and exploring a couple of candidates.

There is no intention to be secret, but we also don't want to be endlessly debating possible merits, which has been an unfortunate characteristic of several discussions over the years. Rather, we want to demonstrate what we believe to be the benefits, as clearly and directly as possible, by actual use.

We currently have a project on pgfoundry, including a mailing list, at http://pgfoundry.org/projects/tracker/ and a wiki at http://www.kaltenbrunner.cc/wiki/index.php/Pgtracker:evaluation

Anyone who is interested in contributing is welcome to join in, especially if they have a history of involvement in PostgreSQL development.

cheers

andrew
Re: [HACKERS] syslogger line-end processing infelicity
Magnus Hagander wrote:
>> My second thought is that we should quite possibly abandon this
>> translation altogether - we know that our COPY code is quite happy with
>> either style of line ending, as long as the file is consistent, and
>> also many Windows programs will quite happily read files with Unix
>> style line endings (e.g. Wordpad, although not Notepad).
>
> Agreed. We shouldn't touch the data. Every editor I know on Windows
> *except* Notepad can deal just fine with Unix line endings, and if
> you're logging your queries your logfile will be too large to work well
> in Notepad anyway :-)

OK, so do we consider this a bug fix and backpatch it all the way to 8.0? Nobody's complained so far that I know of, and it's only damaged logs, not damaged primary data. I'm inclined just to fix it in HEAD, and release-note the change in behaviour. It will matter more when we get machine-readable logs.

cheers

andrew
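Being happy with either style of line ending, the way COPY is, usually comes down to one small habit when consuming lines -- a sketch (not the actual COPY code): split on '\n' and then drop a trailing '\r' if one is present, so Unix and Windows files parse identically:

```c
#include <string.h>

/* Strip a trailing "\n" or "\r\n" in place, so a line read from
 * either a Unix-style or a Windows-style file ends up identical. */
static void
chomp(char *line)
{
    size_t len = strlen(line);

    if (len > 0 && line[len - 1] == '\n')
        line[--len] = '\0';
    if (len > 0 && line[len - 1] == '\r')
        line[--len] = '\0';
}
```

A consumer of machine-readable logs written with untranslated line endings could normalize its input the same way.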
[HACKERS] ERROR: index row size
Hello,

I'm having big trouble with the index size! I have looked for a solution on the internet, but the solutions that I found don't fit my case. I developed a new data type using C and added this new type to PostgreSQL. Basically, the data type is (DateADT, DateADT), with some temporal rules that I'm researching.

The data type is OK; the in, out, receive and send functions are OK; some operations are OK. But the index operators and functions are not working properly. Actually I can use them, but in some cases an error occurs about index row size. I'm sure that the in, out, receive and send functions are well implemented. I think the problem is that the data type is really big and needs a big index. The implementation of the data type is:

  typedef struct t_periodo
  {
      DateADT tvi;
      DateADT tvf;
  } Periodo;

Any ideas to solve my problem? Perhaps increasing the BLOCKSIZE could be one solution -- not a very smart one, but it could solve my problem temporarily?!

Thanks in advance!
Re: [HACKERS] ERROR: index row size
Rodrigo Sakai <[EMAIL PROTECTED]> writes:
> I developed a new data type using C and added this new type to
> PostgreSQL. Basically, the data type is (DateADT, DateADT) with some
> temporal rules that I'm researching! The data type is ok; the in, out,
> receive and send functions are ok; some operations are ok. But the
> index operators and functions are not working properly! Actually I can
> use them, but in some cases an error occurs about index row size.

You have a bug in your datatype code. There's no way an 8-byte datatype should produce that error.

regards, tom lane