Re: [HACKERS] Plan invalidation design
Tom Lane wrote:
> Lukas Kahwe Smith <[EMAIL PROTECTED]> writes:
>> I remember that there was discussion about invalidating plans whose estimated cost turns out to be severely off when executed.
> That's something we might think about after the infrastructure is in place. But the question to answer is why the re-plan won't yield just the same plan as before.

Yeah, also invalidating plans like this only really makes sense once we have the ability to keep multiple plans around for different sets of parameters. Otherwise we could also end up in a situation where after every execution we determine that a re-plan is necessary because the parameters used differ in distribution.

regards, Lukas

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Plan invalidation design
Tom Lane wrote:
> place. But the question to answer is why the re-plan won't yield just the same plan as before.

Oh, and when the estimated cost repeatedly does not match the actual cost, we of course want to generate an email with all relevant information that is sent to this list ;)

regards, Lukas

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] [PATCHES] WIP patch - INSERT-able log statements
Added -hackers to CC:.

On 2/18/07, Greg Smith <[EMAIL PROTECTED]> wrote:
> I've thought a bit about how to implement this TODO already (I have a log file parser and I hate maintaining it)

Any problem using pgFouine?

> Also, I feel that supporting the whole log_line_prefix syntax for this feature is not just overkill, it's a bad idea. Output everything in a standard, complete format instead, and then it becomes easy for the community at large to build tools on top of that to analyze the log database entries instead of having so many ad-hoc approaches. You want a subset, use a view or copy just the fields you want into another table. I would guess this simplifies the patch as well.

I agree with you on this point. We need all the information we can put in the database (database, user and so on) in a structured way. Logging statements is not the only thing to do IMHO. Logging errors in another table is important too. I'm pretty sure there are a lot of people who don't know if there are errors in their statements.

Anyway, if something is developed to do that, I'll extend pgFouine to support it (or I'll write another application to do it) and I'm pretty sure others will too. The most annoying and difficult part of the work on pgFouine/PQA/... is maintaining the parser. If we don't have to do that anymore, we'll have more time to think about new features. In fact, it has long been a plan for pgFouine to be able to store the results of its analysis in a database, so it's good news if it's implemented in core.

-- Guillaume

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
[HACKERS] return varchar from C function
Hi,
just for fun, I wrote a little postgresql contrib, which has a C function called myfun inside it. The function myfun returns a value; currently I return a cstring type value, and it works fine if I run from the psql shell:

    select value from myfun(parameters);

but I can't do an insert like:

    insert (charfield) select value from myfun(parameters);

because I get an error, specifically a casting error. I want myfun to return a char, varchar, or text type. Where can I find more documentation about the differences between cstring, char, text etc...?

Regards, Enrico

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[HACKERS] Short varlena header bit-packing options
I'm currently doing the second of these two proposals:

--- Begin Message ---
Gregory Stark <[EMAIL PROTECTED]> writes:
> In any case it seems a bit backwards to me. Wouldn't it be better to
> preserve bits in the case of short length words where they're precious
> rather than long ones? If we make 0xxx the 1-byte case it means ...

Well, I don't find that real persuasive: you're saying that it's important to have a 1-byte not 2-byte header for datums between 64 and 127 bytes long. Which is by definition less than a 2% savings for those values. I think it's more important to pick bitpatterns that reduce the number of cases heap_deform_tuple has to think about while decoding the length of a field --- every "if" in that inner loop is expensive.

I realized this morning that if we are going to preserve the rule that 4-byte-header and compressed-header cases can be distinguished from the data alone, there is no reason to be very worried about whether the 2-byte cases can represent the maximal length of an in-line datum. If you want to do 16K inline (and your page is big enough for that) you can just fall back to the 4-byte-header case. So there's no real disadvantage if the 2-byte headers can only go up to 4K or so. This gives us some more flexibility in the bitpattern choices.

Another thought that occurred to me is that if we preserve the convention that a length word's value includes itself, then for a 1-byte header the bit pattern 1000 is meaningless --- the count has to be at least 1. So one trick we could play is to take over this value as the signal for "toast pointer follows", with the assumption that the tuple-decoder code knows a-priori how big a toast pointer is. I am not real enamored of this, because it certainly adds one case to the inner heap_deform_tuple loop and it'll give us problems if we ever want more than one kind of toast pointer. But it's a possibility.

Anyway, a couple of encodings that I'm thinking about now involve limiting uncompressed data to 1G (same as now), so that we can play with the first 2 bits instead of just 1:

    00xx    4-byte length word, aligned, uncompressed data (up to 1G)
    01xx    4-byte length word, aligned, compressed data (up to 1G)
    100x    1-byte length word, unaligned, TOAST pointer
    1010    2-byte length word, unaligned, uncompressed data (up to 4K)
    1011    2-byte length word, unaligned, compressed data (up to 4K)
    11xx    1-byte length word, unaligned, uncompressed data (up to 63b)

or

    00xx    4-byte length word, aligned, uncompressed data (up to 1G)
    010x    2-byte length word, unaligned, uncompressed data (up to 8K)
    011x    2-byte length word, unaligned, compressed data (up to 8K)
    1000    1-byte length word, unaligned, TOAST pointer
    1xxx    1-byte length word, unaligned, uncompressed data (up to 127b)
            (xxx not all zero)

This second choice allows longer datums in both the 1-byte and 2-byte header formats, but it hardwires the length of a TOAST pointer and requires four cases to be distinguished in the inner loop; the first choice only requires three cases, because TOAST pointer and 1-byte header can be handled by the same rule "length is low 6 bits of byte". The second choice also loses the ability to store in-line compressed data above 8K, but that's probably an insignificant loss.

There's more than one way to do it ...

regards, tom lane
--- End Message ---

There's been some murmuring on the list about simplifying this by junking the two-byte case. I don't see any particular reason to do this. The code is virtually the same either way. It saves one branch in heap_deform_tuple but that branch should basically never be reached since virtually *all* data on disk will be either one-byte headers or two-byte headers. The only big exception will be toast tables themselves.

For the sake of completeness, the second proposal would become this if that was done:

    00xx    4-byte length word, aligned, uncompressed data (up to 1G)
    01xx    4-byte length word, aligned, compressed data (up to 1G)
    1000    1-byte length word, unaligned, TOAST pointer
    1xxx    1-byte length word, unaligned, uncompressed data (up to 127b)
            (xxx not all zero)

There's no code space to indicate if the out-of-line TOAST pointer is compressed or not. We could just compare the rawsize with the extsize, since the toaster guarantees not to compress if there's no gain. Alternatively, because of padding we have 24 bytes to play with inside the toast pointer. We could set a flag in there.

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
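To make the decoding rule concrete, here is a minimal editorial sketch (not from the original mail and not the actual PostgreSQL macros) of how a tuple decoder could branch on the first byte under the simplified bit patterns listed above. All names are made up, the length word is assumed to be stored big-endian for illustration, and the endianness games the real implementation has to play with the flag bits are ignored:

    #include <stdint.h>

    typedef enum
    {
        VL_4BYTE_UNCOMPRESSED,      /* 00xx....: aligned, up to 1G */
        VL_4BYTE_COMPRESSED,        /* 01xx....: aligned, up to 1G */
        VL_1BYTE_SHORT,             /* 1xxxxxxx, low bits not all zero: up to 127b */
        VL_TOAST_POINTER            /* 10000000: fixed-size TOAST pointer follows */
    } VarlenaKind;

    /* Classify the header at datum[0] and report the total length (which
     * includes the header itself) in *total_len. */
    static VarlenaKind
    classify_varlena(const unsigned char *datum, uint32_t *total_len)
    {
        unsigned char first = datum[0];

        if ((first & 0x80) == 0)
        {
            /* 4-byte length word: top two bits are flags, low 30 bits are length */
            *total_len = ((uint32_t) (first & 0x3F) << 24) |
                         ((uint32_t) datum[1] << 16) |
                         ((uint32_t) datum[2] << 8) |
                         (uint32_t) datum[3];
            return (first & 0x40) ? VL_4BYTE_COMPRESSED : VL_4BYTE_UNCOMPRESSED;
        }

        if (first == 0x80)
        {
            /* reserved "impossible" count: a TOAST pointer of known size follows */
            *total_len = 0;         /* caller knows sizeof(toast pointer) a priori */
            return VL_TOAST_POINTER;
        }

        /* 1-byte header: length is the low 7 bits and includes the header byte */
        *total_len = first & 0x7F;
        return VL_1BYTE_SHORT;
    }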
[HACKERS] Short varlena header bit-packing options
I've sent a message on the subject of 2-byte headers and choice of bitpatterns twice during the mail outage last night. They may come through eventually, but please ignore them, as the situation has changed and new evidence has come to light.

I was previously doing the second of the two bitpatterns mentioned in Tom's email:

--- Begin Message ---
Gregory Stark <[EMAIL PROTECTED]> writes:
> In any case it seems a bit backwards to me. Wouldn't it be better to
> preserve bits in the case of short length words where they're precious
> rather than long ones? If we make 0xxx the 1-byte case it means ...

Well, I don't find that real persuasive: you're saying that it's important to have a 1-byte not 2-byte header for datums between 64 and 127 bytes long. Which is by definition less than a 2% savings for those values. I think it's more important to pick bitpatterns that reduce the number of cases heap_deform_tuple has to think about while decoding the length of a field --- every "if" in that inner loop is expensive.

I realized this morning that if we are going to preserve the rule that 4-byte-header and compressed-header cases can be distinguished from the data alone, there is no reason to be very worried about whether the 2-byte cases can represent the maximal length of an in-line datum. If you want to do 16K inline (and your page is big enough for that) you can just fall back to the 4-byte-header case. So there's no real disadvantage if the 2-byte headers can only go up to 4K or so. This gives us some more flexibility in the bitpattern choices.

Another thought that occurred to me is that if we preserve the convention that a length word's value includes itself, then for a 1-byte header the bit pattern 1000 is meaningless --- the count has to be at least 1. So one trick we could play is to take over this value as the signal for "toast pointer follows", with the assumption that the tuple-decoder code knows a-priori how big a toast pointer is. I am not real enamored of this, because it certainly adds one case to the inner heap_deform_tuple loop and it'll give us problems if we ever want more than one kind of toast pointer. But it's a possibility.

Anyway, a couple of encodings that I'm thinking about now involve limiting uncompressed data to 1G (same as now), so that we can play with the first 2 bits instead of just 1:

    00xx    4-byte length word, aligned, uncompressed data (up to 1G)
    01xx    4-byte length word, aligned, compressed data (up to 1G)
    100x    1-byte length word, unaligned, TOAST pointer
    1010    2-byte length word, unaligned, uncompressed data (up to 4K)
    1011    2-byte length word, unaligned, compressed data (up to 4K)
    11xx    1-byte length word, unaligned, uncompressed data (up to 63b)

or

    00xx    4-byte length word, aligned, uncompressed data (up to 1G)
    010x    2-byte length word, unaligned, uncompressed data (up to 8K)
    011x    2-byte length word, unaligned, compressed data (up to 8K)
    1000    1-byte length word, unaligned, TOAST pointer
    1xxx    1-byte length word, unaligned, uncompressed data (up to 127b)
            (xxx not all zero)

This second choice allows longer datums in both the 1-byte and 2-byte header formats, but it hardwires the length of a TOAST pointer and requires four cases to be distinguished in the inner loop; the first choice only requires three cases, because TOAST pointer and 1-byte header can be handled by the same rule "length is low 6 bits of byte". The second choice also loses the ability to store in-line compressed data above 8K, but that's probably an insignificant loss.

There's more than one way to do it ...

regards, tom lane
--- End Message ---

There had been some mention on the list about ditching the 2-byte cases, which I didn't understand because I don't think they add any overhead or complexity to the critical heap_form/deform_tuple code or much complexity to the typedef or macro interface.

However, there's another factor I hadn't considered. It becomes quite awkward in tuptoaster.c to use pg_lzcompress, since that generates a regular 4-byte datum and then tuptoaster needs to convert it to a 2-byte datum. Worse, if it's over 4kB it *can't*. It's going to push it out as an external toasted value, but how does it represent it in the meantime?

This could all be fixed; it's just a SMOP to restructure tuptoaster.c to keep things in regular datums, with an array of flags somewhere else indicating which have been compressed, right up until the moment before it reruns heap_form_tuple. But I just punted for now. We can do this in a second patch and rejigger the bitpatterns selected at that time. It doesn't look like everyone's convinced we want them anyways.

So this is the set of bitpatterns I'm working with now:

    00xx    4-byte length word, aligned, uncompressed data (up to 1G)
    01xx    4-byte length word, aligned, compres
Re: [HACKERS] Plan invalidation design
Tom Lane wrote:
> I'm starting to think about the long-wanted plan invalidation mechanism. Here's a sketch --- anyone see any problems?
>
> * Create a new module, say src/backend/utils/cache/plancache.c, that we will put in charge of all long-lived plans --- or at least those cached by PREPARE, plpgsql, and RI triggers. I'm unsure whether we should make all SPI plans work this way or not; it's possible that doing so would change SPI's API behavior enough to break user-written code. Any thoughts on that?
>
> * plancache.c will have two basic functions:
>
> 1. Given a query's raw parse tree (that is, the raw output of gram.y), analyze and plan the query. Store both the parse tree and plan in a backend-local cache table, and return a handle for the table entry as well as the plan tree.
>
> 2. Given a handle for a previously stored query, check to see if the plan is still up to date; if not, regenerate it from the raw parse tree (note this could result in failure, eg if a column used by the query has been dropped). Then return the plan tree.

What do we do in the case of failure? Die in the same way we do now when you can't use the plan that's been made?

> We probably want to return a direct pointer to the cached plan tree instead of making a copy. This should be safe, because the executor now treats plan trees as read-only, but it does mean that when plan invalidation occurs the cached plan tree might still be in use. We'll probably need to have a notion of a reference count: so the two functions above would increment the plan's refcount and there would be a third "ReleasePlanCache" function to call when done using a plan (and, hence, these references would need to be supported by the ResourceManager mechanism).

Excuse my ignorance here, but under what circumstances is a plan in use for a single backend at the same time as it's invalidated? What potential failures does this introduce? If you are using the old plan, and the new plan fails as mentioned above, where are we then?

> Note that the source object for caching is a raw parse tree. This should work since we already require that gram.y not look into the database during its processing; therefore, the raw tree need never be invalidated. It'd be conceptually simpler if we passed in a query string instead, but I don't think that works for PREPARE, because it might be embedded in a multi-command string. (We do probably want to pass in the original query string too, if available, because it's needed for syntax error reporting.) nodes/copyfuncs.c will need some expansion, as I don't believe it has coverage for all raw-parse-tree node types.

If the syntax has become invalid, that is because the columns in the query, or tables, have changed. Is this information not available in the plan tree? What other circumstances could you have a syntax error from a query that has been successfully planned and parsed? I've read this paragraph 3 times now and am still quite unclear about the requirements for the original query to be stored. Is the plan cache going to replace the syntax check which I thought would have been done in gram.y?

> Invalidation will be detected by having plancache.c watch for relcache invalidation events, using the existing inval.c callback mechanism. On any relcache inval, traverse the plan cache looking for plans that mention the invalidated relation in their rangetables, and mark them as needing to be regenerated before next use. (If they currently have refcount zero, we could delete the plan part of the cache entry immediately.)
>
> Relcache inval casts a fairly wide net; for example, adding or dropping an index will invalidate all plans using the index's table whether or not they used that particular index, and I believe that VACUUM will also result in a relcache inval due to updating the table's pg_class row. I think this is a good thing though --- for instance, after adding an index it seems a good idea to replan to see if the new index is useful, and replanning after a VACUUM is useful if the table has changed size enough to warrant a different plan. OTOH this might mean that plans on a high-update-traffic table never survive very long because of autovacuum's efforts. If that proves to be a problem in practice we can look at ways to dial down the number of replans, but for the moment I think it's more important to be sure we *can* replan at need than to find ways to avoid replans.
>
> Note that I'm currently intending to detect only relcache invals, not changes to functions or operators used in the plan. (Relcache inval will cover view redefinitions, though.) We could extend it to handle that later, but it looks like a lot more mechanism and overhead for not a lot of gain. AFAICS there are only three cases where there'd be a benefit:
>
> * if you redefine an immutable function, any places where its result has been pre-computed by constant-folding wouldn't get updated without inval.

If you replan an immutable function, aren't y
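To make the interface sketched in the design above concrete, here is a rough editorial sketch of what the cache entry and its operations could look like. All names and fields are hypothetical, the backend's existing Node/Plan typedefs are assumed, and this is not a committed API:

    /* Hypothetical plancache entry; names invented for illustration only. */
    typedef struct CachedPlanEntry
    {
        Node       *raw_parse_tree;   /* raw output of gram.y, never invalidated */
        const char *query_string;     /* original text, kept for error reporting */
        Plan       *plan;             /* current plan, rebuilt on demand */
        bool        dead;             /* set by the relcache-inval callback */
        int         refcount;         /* pins held by callers currently executing it */
        uint64      generation;       /* bumped on every replan, so callers can
                                       * notice that derived work must be redone */
    } CachedPlanEntry;

    /* 1. Analyze and plan a raw parse tree; cache it and return a handle. */
    extern CachedPlanEntry *CreateCachedPlan(Node *raw_parse_tree,
                                             const char *query_string);

    /* 2. Revalidate the entry (replanning from the raw tree if it was marked
     *    dead, which may fail, e.g. if a referenced column was dropped),
     *    bump the refcount, and return the plan. */
    extern Plan *RevalidateCachedPlan(CachedPlanEntry *entry);

    /* 3. Release a pin; once the refcount of a dead entry reaches zero its
     *    plan tree can be freed immediately. */
    extern void ReleaseCachedPlan(CachedPlanEntry *entry);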
Re: [HACKERS] return varchar from C function
Hello,

cstring is a classic C (zero-terminated) string and is used only in some PostgreSQL functions. This type isn't compatible with text and you have to use an explicit cast trick with the textin function:

    root=# select textin(('abc'::cstring));
     textin
    --------
     abc
    (1 row)

The standard approach is to use VARLENA types like text, varchar, ... You can find info in the PostgreSQL FAQ. These types are similar to Pascal strings -> the first four bytes carry the length and the following bytes are the data, without a special ending symbol. http://www.varlena.com/GeneralBits/68.php

Using the text type in a C function is simple:

    Datum *const_fce(PG_FUNCTION_ARGS)
    {
        text *txt = palloc(5 + VARHDRSZ);
        memcpy(VARDATA(txt), "pavel", 5);
        VARATT_SIZE(txt) = 5 + VARHDRSZ;

        PG_RETURN_TEXT_P(txt);
    }

Please look at the source code of my orafce contrib module (you can find it on pgFoundry).

Regards
Pavel Stehule

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
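For completeness, here is a self-contained sketch of a C function that builds and returns a text value along the lines described above. It assumes a normal PostgreSQL extension build environment; SET_VARSIZE is the 8.3-and-later spelling of the size-setting macro (older sources use VARATT_SIZEP, as in the snippet above), and the function name hello_text is made up for illustration:

    #include "postgres.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;                 /* needed once per loadable module (8.2+) */

    PG_FUNCTION_INFO_V1(hello_text);

    Datum
    hello_text(PG_FUNCTION_ARGS)
    {
        const char *src = "hello";
        int         len = strlen(src);
        text       *result = (text *) palloc(len + VARHDRSZ);

        SET_VARSIZE(result, len + VARHDRSZ);  /* total size includes the header */
        memcpy(VARDATA(result), src, len);    /* payload only, no NUL terminator */

        PG_RETURN_TEXT_P(result);
    }

At the SQL level it would be declared with something like CREATE FUNCTION hello_text() RETURNS text AS 'MODULE_PATHNAME', 'hello_text' LANGUAGE C STRICT, after which its result can be used directly in an INSERT ... SELECT without any cast.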
Re: [HACKERS] return varchar from C function
"Pavel Stehule" <[EMAIL PROTECTED]> writes:
> Datum *const_fce(PG_FUNCTION_ARGS)
> {
>     text *txt = palloc(5 + VARHDRSZ);
>     memcpy(VARDATA(txt), "pavel", 5);
>     VARATT_SIZE(txt) = 5 + VARHDRSZ;
>
>     PG_RETURN_TEXT_P(txt);
> }

Much better practice is to use the input function of the data type you want to convert to:

    {
        text *txt = DatumGetTextP(DirectFunctionCall1(textin, CStringGetDatum("pavel")));

        PG_RETURN_TEXT_P(txt);
    }

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] return varchar from C function
On Sun, Feb 18, 2007 at 12:56:08PM -0500, [EMAIL PROTECTED] wrote:
> Hi,
> just for fun, I wrote a little postgresql contrib,
> who has a C function called myfun inside it.
> The function myfun returns a value, now I return
> a cstring type value, and it works fine if
> I run from psql shell:

You don't provide the definition you used, but:

> select value from myfun(paramteres);

This isn't the usual way to return things, unless it is a set returning function. Did you provide the return type at declaration time?

Have a nice day,
-- Martijn van Oosterhout http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to
> litigate.
Re: [HACKERS] return varchar from C function
> "Pavel Stehule" <[EMAIL PROTECTED]> writes:
>> Datum *const_fce(PG_FUNCTION_ARGS)
>> {
>>     text *txt = palloc(5 + VARHDRSZ);
>>     memcpy(VARDATA(txt), "pavel", 5);
>>     VARATT_SIZE(txt) = 5 + VARHDRSZ;
>>
>>     PG_RETURN_TEXT_P(txt);
>> }
>
> Much better practice is to use the input function of the data type you want to convert to:
>
> {
>     text *txt = DatumGetTextP(DirectFunctionCall1(textin, CStringGetDatum("pavel")));
>
>     PG_RETURN_TEXT_P(txt);
> }

Generally we want to do something with the text value (concat, trim, ...), and then calling textin isn't practical. Or rather, for business processes implemented in C, textin and similar functions are right; for a string toolkit, working directly with the VARLENA struct is better.

Regards, and thank you for the note,
Pavel

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] FETCH from scroll cursor does not return count in libpq PQcmdStatus
Using libpq with pg 8.2.1 on WinXP Pro SP2.

After a PQexec with a FETCH from a scroll cursor, I only get "FETCH" from PQcmdStatus, and not "FETCH count" as described in the 8.2.1 docs. However, PQntuples returns the number of fetched rows. If I replace the FETCH with a MOVE, PQcmdStatus returns "MOVE count".

The discussion we had on irc #postgresql suggested that this might be a bug, or have we missed some point regarding the use of scroll cursors?

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
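For reference, the behaviour described above can be observed with a small libpq program along these lines; the connection string, table and cursor names are invented and error handling is kept to a minimum:

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn   *conn = PQconnectdb("dbname=test");
        PGresult *res;

        if (PQstatus(conn) != CONNECTION_OK)
            return 1;

        PQclear(PQexec(conn, "BEGIN"));
        PQclear(PQexec(conn, "DECLARE c SCROLL CURSOR FOR SELECT * FROM some_table"));

        res = PQexec(conn, "FETCH FORWARD 10 FROM c");
        /* The report above is that this prints just "FETCH" as the command tag,
         * rather than "FETCH 10", even though PQntuples() sees the rows. */
        printf("cmdstatus: %s, cmdtuples: %s, ntuples: %d\n",
               PQcmdStatus(res), PQcmdTuples(res), PQntuples(res));
        PQclear(res);

        PQclear(PQexec(conn, "COMMIT"));
        PQfinish(conn);
        return 0;
    }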
Re: [HACKERS] Plan invalidation design
Russell Smith <[EMAIL PROTECTED]> writes: > Tom Lane wrote: >> 2. Given a handle for a previously stored query, check to see if the plan >> is still up to date; if not, regenerate it from the raw parse tree (note >> this could result in failure, eg if a column used by the query has been >> dropped). Then return the plan tree. >> > What do we do in the case of failure? Die in the same way we do now > when you can't use the plan that's been made? Well, the difference is that at plan use you might get an error that currently could only occur at initial query parsing. I don't see that this is a big deal, but it will be a change in behavior. One thing I realized since yesterday is that it'll have to be possible for the caller to tell whether the plan has changed since he last saw it (perhaps via a re-plan counter included in the cache entry). It's entirely possible that the set of output columns will have changed, and so the caller may need to re-do derived work. For example plpgsql will need to re-do its analysis of whether a plan is "simple". What we might want in some cases is for the caller to decide to error out if the set of output columns changes. I think this is likely appropriate for queries prepared via the Parse protocol message, because we may have already told the client what the column set is, and it won't be prepared to deal with getting a different set of columns back. I'm not sure now whether that's appropriate for every call site, but if it is then we could avoid some of these definitional issues. >> We probably want to return a direct pointer to the cached plan tree >> instead of making a copy. This should be safe, because the executor now >> treats plan trees as read-only, but it does mean that when plan >> invalidation occurs the cached plan tree might still be in use. > excuse my ignorance here, but under what circumstances is a plan in use > for a single backend at the same time as it's invalidated. There shouldn't be any structural changes in a table once you've acquired lock on it, but there could be statistics changes, eg from VACUUM; and the relcache inval mechanism currently doesn't distinguish those cases. We'd need some such concept anyway if we ever extend the invalidation to cover functions, because there's no locking on them. > What other circumstances could you have a syntax error from a query that > has been successfully planned and parsed? DROP COLUMN, DROP FUNCTION, ... lots of possibilities. > I've read this paragraph 3 times now and am still quite unclear about > the requirements for the original query to be stored. Is the plan cache > going to replace the syntax check which I thought would have been done > in gram.y. We don't need to re-do that syntax check, precisely because it's purely a syntax check and doesn't involve any database state. > If you replan and immutable function, aren't you possibly messing up a > functional index that is using the old function. Hey, if you change an > immutable function that has an index, you are in trouble already. True. > Replanning pl/pgsql with CREATE TEMP TABLE would be a good use here. > You loose the preplanning benefits, but we remove the ongoing problem > where people report that their temp-table isn't working. Yeah, that's one of the main arguments why this is worth the trouble. > Even function alterations to pl/pgsql should a replan. But of more > interest is being able to use the old function for currently running > transactions when the function is changed. 
Last time I tried to edit a > pl/pgsql function while it was being used by a transaction, the > transaction failed because the function definition changed. I fixed a couple of bugs in that area recently --- the current behavior should be that any active execution of a plpgsql function will finish out using the function definition that was current when it started. But that's something that's local to the PL function manager and doesn't really have anything to do with plans using the function. Inlined SQL functions are the exception to the rule that a plan doesn't know exactly what a function it calls does. > Is the race condition here any more likely to happen than the failure of > a re plan when something has changed from underneath the original query? It's not really the same thing --- the problem is making sure that your check for table changes is accurate, and doesn't miss a change that commits just after you look. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] RFC: Temporal Extensions for PostgreSQL
On Sat, 2007-02-17 at 11:26, Tom Lane wrote:
> Hannu Krosing <[EMAIL PROTECTED]> writes:
>> How easy/hard would it be to create unique indexes on tinterval (unique here meaning non-overlapping)?
>
> "Overlapping" is not an equality relation (it fails the transitive law), so I'm not entirely sure what "unique" means in this context ...

Well, unique is usually defined as "not equal to any other". And "not equal" also fails the transitive law, so I can't see how failing it makes defining the meaning of "unique" any harder.

What I mean by "unique interval" here is an interval over unique points, or just an interval which does not overlap any other interval.

If our uniqueness implementation relies on the reverse operation (equality) being transitive, then it can't be used for enforcing unique intervals. But it should be trivial to test at insertion time whether the interval overlaps with any existing intervals, as it has to be inserted before the previous unique interval and after the next interval. In other words, a unique interval index is a unique index of interval start times with the extra condition that the interval end time is not >= the next interval's start.

> but I can promise you you can't make it work with btree.

Sorry to hear that. btree seemed like the best candidate for doing it.

-- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [COMMITTERS] pgsql: Better fix for determining minimum and maximum int64 values that
Alvaro Herrera wrote:
> Peter Eisentraut wrote:
>> Log Message:
>> ---
>> Better fix for determining minimum and maximum int64 values that
>> doesn't require stdint.h and works for "busted" int64.
>
> Nice trick -- is this worth being in c.h or thereabouts?

I was thinking about adding something like PG_INT64_MAX etc. to c.h, but that might have required adding limits.h to c.h and perhaps other rearrangements, and that would have gone too far for the moment. If more call sites use it, it should be refactored.

-- Peter Eisentraut http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
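The commit itself is not reproduced in this thread. Purely as an illustration of the general technique being discussed (and not necessarily what was committed), int64 limits can be written without stdint.h in terms of the INT64CONST/UINT64CONST macros that c.h already provides:

    /* Sketch only: possible c.h-style spellings, assuming INT64CONST exists. */
    #define PG_INT64_MIN    (-INT64CONST(0x7FFFFFFFFFFFFFFF) - 1)
    #define PG_INT64_MAX    INT64CONST(0x7FFFFFFFFFFFFFFF)
    #define PG_UINT64_MAX   UINT64CONST(0xFFFFFFFFFFFFFFFF)

Writing the minimum as "(-MAX - 1)" avoids the classic problem that the literal 9223372036854775808 does not fit in a signed 64-bit type.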
Re: [HACKERS] FETCH from scroll cursor does not return count in libpq PQcmdStatus
"Knut P. Lehre" <[EMAIL PROTECTED]> writes: > After a PQExec with a fetch from a scroll cursor, I only get "FETCH" from > PQcmdStatus, and not "FETCH count" as described in the 8.2.1 docs. However, > PQntuples returns the number of fetched rows. If I replace the FETCH by a > MOVE, PQcmdStatus returns "MOVE count". Hmm ... some experimentation shows that 7.2 and 7.3 return a count in the FETCH command tag, but no earlier or later version does. It's clear though that libpq expects FETCH to have a count, see PQcmdTuples(). So I'd have to say this is something we broke in 7.4 and nobody noticed until now. The culprit seems to be pquery.c, which is assuming that PORTAL_UTIL_SELECT queries always return the default command tag. regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Plan invalidation design
"Tom Lane" <[EMAIL PROTECTED]> writes:
> Russell Smith <[EMAIL PROTECTED]> writes:
>> Tom Lane wrote:
>>> 2. Given a handle for a previously stored query, check to see if the plan is still up to date; if not, regenerate it from the raw parse tree (note this could result in failure, eg if a column used by the query has been dropped). Then return the plan tree.
>>
>> What do we do in the case of failure? Die in the same way we do now when you can't use the plan that's been made?
>
> Well, the difference is that at plan use you might get an error that currently could only occur at initial query parsing. I don't see that this is a big deal, but it will be a change in behavior.
>
> One thing I realized since yesterday is that it'll have to be possible for the caller to tell whether the plan has changed since he last saw it (perhaps via a re-plan counter included in the cache entry). It's entirely possible that the set of output columns will have changed, and so the caller may need to re-do derived work. For example plpgsql will need to re-do its analysis of whether a plan is "simple".

Hm. The set of output columns could change? How?

If you prepare "select *" and add a column, you're saying the query should start failing? That seems strange given the behaviour of views, which is that once parsed the list of columns is written in stone. It seems prepared queries should work the same way that views work and remember which physical column they were referring to previously. (Personally I don't like that behaviour but it feels like this should be consistent with it.)

I guess you do have a serious problem if you redefine the type of a column or redefine a view (though I think you would have to drop and recreate it, CREATE OR REPLACE wouldn't let you change the output columns).

>>> We probably want to return a direct pointer to the cached plan tree instead of making a copy. This should be safe, because the executor now treats plan trees as read-only, but it does mean that when plan invalidation occurs the cached plan tree might still be in use.
>>
>> excuse my ignorance here, but under what circumstances is a plan in use for a single backend at the same time as it's invalidated.

Invalidation messages can only be noticed at certain points. If you access any new table or object while the plan is still running, either because you're in a plpgsql loop fetching records from it, or because some function you're calling in the query runs some other SQL against another table, then you'll receive any pending invalidation messages.

It should only be possible to receive messages from operations that are legal to execute while someone is using the object -- so, for example, creating new indexes. So if you're actively in the process of using the plan, it shouldn't be necessary to junk it.

Perhaps that means it would be handy to have two kinds of invalidation messages. Hard invalidations mean that anybody with cached plans should immediately junk them, and throw up nasty errors and assertion failures if they're in a state where that shouldn't happen. Soft invalidations mean you shouldn't start any new queries, but any that are executing are still OK.

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] RFC: Temporal Extensions for PostgreSQL
On Sun, Feb 18, 2007 at 08:14:00PM +0200, Hannu Krosing wrote:
>> but I can promise you you can't make it work with btree.
>
> Sorry to hear that. btree seemed like the best candidate for doing it.

The problem with btree is that it's designed to work with a compare function which compares two datums and returns greater than, equal to or less than. You can't build such an operator for intervals, so there's a problem.

However, if you decree that a zero return value means "collision for the purposes of a unique index", then you could probably make it work. *However*, using it for lookups probably won't work very well then...

Have a nice day,
-- Martijn van Oosterhout http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to
> litigate.
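A tiny editorial sketch (made-up types and names) of the comparator Martijn describes, with a note on why the resulting "equality" is not transitive and therefore unsafe for btree lookups:

    /* Order intervals by position, but report 0 ("equal") whenever they
     * overlap, i.e. treat overlap as a unique-index collision. */
    typedef struct
    {
        double start;
        double end;                 /* assume start <= end */
    } interval_t;

    static int
    interval_overlap_cmp(const interval_t *a, const interval_t *b)
    {
        if (a->end < b->start)
            return -1;              /* a lies strictly before b */
        if (b->end < a->start)
            return 1;               /* a lies strictly after b */
        return 0;                   /* overlap: "collision" */
    }

    /*
     * Why lookups break: with A=[0,1], B=[1,2], C=[2,3] we get
     * interval_overlap_cmp(A,B)==0 and interval_overlap_cmp(B,C)==0 but
     * interval_overlap_cmp(A,C)==-1, so "equal" is not transitive and a
     * btree built on this ordering cannot be searched reliably.
     */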
Re: [HACKERS] Plan invalidation design
Gregory Stark <[EMAIL PROTECTED]> writes: > If you prepare "select *" and add a column, you're saying the query should > start failing? Either fail or change output; which you like better? The whole point of this exercise is to support plpgsql functions that do something like create temp table foo ... select * into rec from foo ... drop table foo ... and from the system's point of view successive executions of this sequence are talking about completely different tables. There's no reason to suppose they have the same expansion of "*". I'd also like to think that the semantics of a plpgsql function are not going to be different the first time it's executed in a session than subsequent times, which suggests that indeed it ought to cope with "*" expanding differently from last time. To do what you're suggesting, we'd have to redesign parse analysis so that expansion of "*" was a completely separate step from everything else, producing a parse tree that's still raw except for that one change; and I'm not sure what other strange decisions we'd have to make. I don't find that an attractive idea. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] return varchar from C function
Thanks for all your answers, I begin to study. Enrico -- Enrico <[EMAIL PROTECTED]> ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] New feature request: FlashBack Query
On Sat, 2007-02-17 at 22:49, Chad Wagner wrote:
>
> > However, they don't have vacuum, we do.
>
> Right, and I think that is more or less because Oracle doesn't need it. Vacuum's main purpose (correct me if I am wrong) is to recover/mark rows that are no longer used, and Oracle essentially reuses the space immediately.
>
> Obviously with Oracle if you bloat out a table and delete a ton of rows then you have to rebuild the table, but that is more or less the same problem that PostgreSQL has and where vacuum full comes into play.
>
> The only benefit with the Oracle model is that you can achieve flashback, which is a very rarely used feature in my book.

We can have flashbacks up to the last vacuum. It is just not exposed. Don't vacuum, and you have the whole history. (Actually you can't go for more than 2G transactions, or you get trx id rollover.)

To get a flashback query, you "just" have to construct a snapshot from that time and you are done. We don't store transaction times anywhere, so the flashback has to be by transaction id, but there is very little extra work involved. We just don't have syntax for saying "SELECT ... AS SEEN BY TRANSACTION XXX".

AFAIK, Oracle's flashbacks also can go as far back as there are rollback segments.

Postgres' original design prescribed that VACUUM would not delete dead tuples, but just move them to history tables on cheap(er) WORM storage. Doing that would have very little overhead (except writing the old tuples) and would not need any fundamental changes to how we do things currently.

> The disadvantages is likely overhead to perform the "rollback" and possibly more scattered reads.

I've also heard reports that doing concurrent data loading and big analysis queries is a royal pain in Oracle.

> I can say that I have used it, and it has come in handy, but hardly worth it. The benefit with the PostgreSQL model is the likelihood of the old rows being inline with the rest of the table data, potentially reducing scattered reads. The disadvantage is vacuuming, it seems to be often overlooked -- possibly solved by defaulting autovacuum to on? (seems to be the way Oracle is heading, defaulting statistics collection to on and other management features).

-- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
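Very roughly, and only as an illustration of the idea rather than working backend code, "construct a snapshot from that time" could amount to filling in the usual SnapshotData fields relative to the chosen transaction id. Real code would also need to consult the commit log for transactions that aborted before that point, handle subtransactions, command ids and wraparound, so treat this purely as a sketch of the concept:

    /* Sketch only: assumes the backend's Snapshot/TransactionId typedefs. */
    static void
    make_flashback_snapshot(Snapshot snap, TransactionId as_of_xid)
    {
        /* Everything that committed before as_of_xid is visible ... */
        snap->xmin = as_of_xid;
        /* ... and as_of_xid itself and anything later is treated as
         * still in progress, i.e. invisible. */
        snap->xmax = as_of_xid;
        /* Simplification: pretend no other transactions were running then. */
        snap->xcnt = 0;
        snap->xip = NULL;
        snap->curcid = FirstCommandId;
    }

Exposing this would then mostly be a matter of the missing syntax Hannu mentions ("SELECT ... AS SEEN BY TRANSACTION XXX") plus plumbing the constructed snapshot into the executor for that query.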
Re: [HACKERS] New feature request: FlashBack Query
Hannu Krosing wrote: > Ühel kenal päeval, L, 2007-02-17 kell 22:49, kirjutas Chad Wagner: > >> >> However, they don't have vacuum, we do. >> >> Right, and I think that is more or less because Oracle doesn't need >> it. Vacuum's main purpose (correct me if I am wrong) is to >> recover/mark rows that are no longer used, and Oracle essentially >> reuses the space immediately. >> >> Obviously with Oracle if you bloat out a table and delete a ton of >> rows then you have to rebuild the table, but that is more or less the >> same problem that PostgreSQL has and where vacuum full comes into >> play. >> >> The only benefit with the Oracle model is that you can achieve >> flashback, which is a very rarely used feature in my book. > > We can have flashbacks up to the last vacuum. It is just not exposed. > Don't vacuum, and you have the whole history. (Actually you can't go for > more than 2G transactions, or you get trx id rollover). > > To get a flashback query, you "just" have to construct a snapshot from > that time and you are done. We don't store transaction times anywere, so > the flashback has to be by transaction id, but there is very little > extra work involved. We just don't have syntax for saying "SELECT ... AS > SEEN BY TRANSACTION XXX" Well this is certainly interesting. What do we think it would take to enable the functionality? Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] n-gram search function
Hi Oleg,

On 2/17/07, Oleg Bartunov wrote:
> 3-gram is implemented as a contrib/pg_trgm. It currently uses GiST index, but may be enhanced with the GiN.

As I'm facing the same problem, I've taken a look at pg_trgm. At the moment, my opinion is quite mixed, but perhaps I did something wrong.

I have a table (100k rows) with a location name in it, generally composed of several words but not that long. I created the index directly on this column (ie I don't create a table with each word of the location name). Then I tried a few queries. Here is an example:

prod=# explain analyze select nomlieu from lieu where nomlieu ilike '%gaumont%';
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Seq Scan on lieu  (cost=0.00..7230.20 rows=7 width=21) (actual time=7.768..556.930 rows=39 loops=1)
   Filter: ((nomlieu)::text ~~* '%gaumont%'::text)
 Total runtime: 557.066 ms
(3 rows)

prod=# explain analyze select nomlieu from lieu where nomlieu % 'gaumont';
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on lieu  (cost=3.37..200.80 rows=106 width=21) (actual time=689.799..690.035 rows=36 loops=1)
   Recheck Cond: ((nomlieu)::text % 'gaumont'::text)
   ->  Bitmap Index Scan on idx_lieu_nomlieu_trgm  (cost=0.00..3.37 rows=106 width=0) (actual time=689.749..689.749 rows=36 loops=1)
         Index Cond: ((nomlieu)::text % 'gaumont'::text)
 Total runtime: 690.195 ms
(5 rows)

The trigram version is slower and doesn't return 3 results I should have. The 3 results it doesn't return have the word gaumont in them at the start of the string, exactly like the others.

Is there anything I can do to improve the performance and investigate why I don't have these 3 results?

Thanks.

-- Guillaume

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Plan invalidation design
On 2/19/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> Gregory Stark <[EMAIL PROTECTED]> writes:
>> If you prepare "select *" and add a column, you're saying the query should start failing?
>
> Either fail or change output; which you like better? The whole point of this exercise is to support plpgsql functions that do something like
>
> create temp table foo ...
> select * into rec from foo ...
> drop table foo ...

If that's the case, do you think there is a simpler way to handle this problem than plan invalidation? Maybe I'm oversimplifying things a bit here, but how about something like:

    create local table foo ...
    select * into rec from foo ...

This isn't completely without precedent... we have SET LOCAL, etc.

merlin

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] Writing triggers in C++
Tom Lane wrote:
> That is most likely not going to work anyway, because the backend
> operating environment is C not C++. If you dumb it down enough
> --- no exceptions, no RTTI, no use of C++ library --- then it might
> work, but at that point you're really coding in C anyway.

Writing "normal" user-defined functions in C++ is not a problem so far. I even handle C++ exceptions, by catching each C++ exception inside my functions. The catch()-blocks in those functions raise Postgres exceptions using elog in case of a throw().

Writing "normal" user-defined functions in C++ is even encouraged by the documentation, which says: "User-defined functions can be written in C (or a language that can be made compatible with C, such as C++)." [chapter 33.9.]

The question is, why not write user-defined trigger functions in C++? The difference between a "normal" function and a trigger function is not that big, though. The "big" difference is that one must include some more header files (executor/spi.h and commands/trigger.h) which themselves include other header files containing identifiers which unfortunately are C++ keywords.

>> Is there any convention how to rename such identifiers? If I would
>> rename those identifiers (I simply would add an underscore to each of
>> them), would such a patch be accepted and adopted onto one of the next
>> releases?
>
> No. Because of the above problems, we don't see much reason to avoid
> C++'s extra keywords.

In order to check how much code would have to be changed, I renamed the affected keywords in the Postgres 8.2.3 header files, patched the affected sources and recompiled the code. The resulting patch affects only 189 lines of code in 23 files.

Applying this patch would encourage authors of external trigger functions to write their code in C++ instead of using PL/SQL and calling "normal" user-defined functions, or writing wrappers in C to hide the C++ keywords.

I will recreate this patch for the CVS head of the repository, if there are chances that it ever will be committed.

Regards, Jacob

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] Plan invalidation design
Gregory Stark wrote:
[snip]
> Hm. The set of output columns could change? How?
>
> If you prepare "select *" and add a column, you're saying the query should start failing? That seems strange given the behaviour of views, which is that once parsed the list of columns is written in stone. It seems prepared queries should work the same way that views work and remember which physical column they were referring to previously. (Personally I don't like that behaviour but it feels like this should be consistent with it.)
>
> I guess you do have a serious problem if you redefine the type of a column or redefine a view (though I think you would have to drop and recreate it, CREATE OR REPLACE wouldn't let you change the output columns).

I would think it best to move towards changing views to not have output columns set in stone. It seems unreasonable that you can add/drop/alter columns in a table as much as you like, but you can't touch a view. I'm also not excited about the current view restrictions, which means we don't want to start backing the idea by putting in more code that acts in the same way.

I'm guessing from what Tom is saying that the reason we have views set in stone is because they are/can be an example of inlined SQL, particularly when views are built on views. Any further enlightenment welcome, but probably off topic for this thread.

Russell Smith

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
[HACKERS] TopPlan, again
While thinking about having a centralized plan cache for managing plan invalidation, I got annoyed again about the fact that the executor needs access to the Query tree. This means that we'll be storing *three* representations of any cached query: raw parsetree for possible regeneration, plus parsed Query tree and Plan tree.

We've repeatedly discussed getting rid of execution-time access to the Query structure --- here's one old message about it: http://archives.postgresql.org/pgsql-hackers/1999-02/msg00388.php and here's a recent one: http://archives.postgresql.org/pgsql-hackers/2006-08/msg00734.php

I think it's time to bite the bullet and do that. After looking over the code it seems that the executor needs a limited subset of the Query fields, namely

    commandType
    canSetTag
    rtable
    returningList
    returningLists
    into
    intoOptions
    intoOnCommit        (why is this separate from intoOptions?)
    intoTableSpaceName
    rowMarks
    resultRelation
    resultRelations
    nParamExec          (currently in topmost Plan node)

which I think we should put into a new TopPlan node type. returningLists and resultRelations could be removed from Query; also, we might need only the list forms and not the singleton returningList/resultRelation fields in TopPlan.

The other big problem is the rangetable (rtable): currently it contains Query trees for subqueries (including views), so unless we clean that up we aren't going to be all that far ahead in terms of reducing the overhead. I'm envisioning creating a "compact" rangetable entry struct with just the fields the executor needs:

    rtekind
    relid
    eref                (might only need the table alias name not the column names)
    requiredPerms
    checkAsUser

and flattening subquery rangetables into the main list, so that there's just one list and rangetable indexes are unique throughout a plan tree. That will allow subqueries to execute with the same EState as the main query and thus simplify nodeSubplan and nodeSubqueryScan. This list will also provide a simple way for the plan cache module to know which relations to lock before determining whether the plan has been invalidated.

Comments, objections? Also, any thoughts about the names to use for these new node types? As I commented last year, I'm not completely happy with "TopPlan" because it won't actually be a subtype of Plan, but I don't have a better idea. Also I'm unsure what to call the cut-down RangeTblEntry struct; maybe RunTimeRangeTblEntry?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
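Rendered as C structs purely for illustration (field types follow existing backend conventions; the node names are the tentative ones under discussion, and the singleton returningList/resultRelation fields are omitted as suggested above; this is not a committed design):

    typedef struct PlannedStmt
    {
        NodeTag     type;
        CmdType     commandType;        /* SELECT/INSERT/UPDATE/DELETE */
        bool        canSetTag;
        List       *rtable;             /* flat list of executor range table entries */
        List       *returningLists;     /* one RETURNING tlist per result relation */
        RangeVar   *into;               /* SELECT INTO / CREATE TABLE AS target */
        List       *intoOptions;
        OnCommitAction intoOnCommit;
        char       *intoTableSpaceName;
        List       *rowMarks;
        List       *resultRelations;    /* RT indexes of result relations */
        int         nParamExec;         /* moved up from the topmost Plan node */
        Plan       *planTree;           /* the plan proper */
    } PlannedStmt;

    typedef struct ExecRangeTblEntry
    {
        NodeTag     type;
        RTEKind     rtekind;
        Oid         relid;
        Alias      *eref;               /* perhaps only the table alias name */
        AclMode     requiredPerms;
        Oid         checkAsUser;
    } ExecRangeTblEntry;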
Re: [HACKERS] TopPlan, again
On Sun, 18 Feb 2007, Tom Lane wrote: > We've repeatedly discussed getting rid of execution-time access to the > Query structure --- here's one old message about it: > http://archives.postgresql.org/pgsql-hackers/1999-02/msg00388.php > and here's a recent one: > http://archives.postgresql.org/pgsql-hackers/2006-08/msg00734.php > I think it's time to bite the bullet and do that. Great. > The other big problem is the rangetable (rtable): currently it contains > Query trees for subqueries (including views) so unless we clean that up > we aren't going to be all that far ahead in terms of reducing the > overhead. I'm envisioning creating a "compact" rangetable entry struct > with just the fields the executor needs: > > rtekind > relid > eref(might only need the table alias name not the column names) > requiredPerms > checkAsUser > > and flattening subquery rangetables into the main list, so that there's > just one list and rangetable indexes are unique throughout a plan tree. > That will allow subqueries to execute with the same EState as the main > query and thus simplify nodeSubplan and nodeSubqueryScan. This list > will also provide a simple way for the plan cache module to know which > relations to lock before determining whether the plan has been invalidated. Cool. > Comments, objections? Also, any thoughts about the names to use for > these new node types? As I commented last year, I'm not completely > happy with "TopPlan" because it won't actually be a subtype of Plan, > but I don't have a better idea. Also I'm unsure what to call the > cut-down RangeTblEntry struct; maybe RunTimeRangeTblEntry? I think TopPlan is misleading. What about MetaPlan instead of TopPlan? I think RunTimeRangeTblEntry is okay, though long. ExecRangeTblEntry? Thanks, Gavin ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] TopPlan, again
Gavin Sherry wrote: On Sun, 18 Feb 2007, Tom Lane wrote: Comments, objections? Also, any thoughts about the names to use for these new node types? As I commented last year, I'm not completely happy with "TopPlan" because it won't actually be a subtype of Plan, but I don't have a better idea. Also I'm unsure what to call the cut-down RangeTblEntry struct; maybe RunTimeRangeTblEntry? I think TopPlan is misleading. What about MetaPlan instead of TopPlan? I think RunTimeRangeTblEntry is okay, though long. ExecRangeTblEntry? Would ExecPlan be better? - matches ExecRangeTblEntry. Cheers Mark ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] TopPlan, again
Mark Kirkwood <[EMAIL PROTECTED]> writes: > Gavin Sherry wrote: >> On Sun, 18 Feb 2007, Tom Lane wrote: >>> Comments, objections? Also, any thoughts about the names to use for >>> these new node types? As I commented last year, I'm not completely >>> happy with "TopPlan" because it won't actually be a subtype of Plan, >>> but I don't have a better idea. Also I'm unsure what to call the >>> cut-down RangeTblEntry struct; maybe RunTimeRangeTblEntry? >> >> I think TopPlan is misleading. What about MetaPlan instead of TopPlan? I >> think RunTimeRangeTblEntry is okay, though long. ExecRangeTblEntry? > Would ExecPlan be better? - matches ExecRangeTblEntry. Neither of these seem to answer my worry that the node isn't a subtype of "Plan". One thought is that in some contexts this node type will probably appear in lists that might also contain utility statement nodes. (Currently, we represent such lists as Query lists that might or might not have utilityStmt set, but I don't want a utilityStmt field in this node type.) So maybe we should pick something based off "statement". Perhaps "PlannedStmt" or "ExecutableStmt"? ExecRangeTblEntry sounds good to me for the other thing. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] TopPlan, again
"Tom Lane" <[EMAIL PROTECTED]> writes:
> Comments, objections? Also, any thoughts about the names to use for these new node types? As I commented last year, I'm not completely happy with "TopPlan" because it won't actually be a subtype of Plan, but I don't have a better idea. Also I'm unsure what to call the cut-down RangeTblEntry struct; maybe RunTimeRangeTblEntry?

My only thought is that I suspect this will somehow relate to the CTE stuff I was doing for recursive queries. I'm not exactly clear how yet, though. I think this has more to do with the RangeTable stuff than the TopPlan, though. I was probably going to need a new kind of RangeTable entry representing a Subquery that was a reference to a CTE rather than a separate subquery.

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] n-gram search function
On Sun, 18 Feb 2007, Guillaume Smet wrote:

> Hi Oleg,
>
> On 2/17/07, Oleg Bartunov wrote:
>> 3-gram is implemented as a contrib/pg_trgm. It currently uses GiST index, but may be enhanced with the GiN.
>
> As I'm facing the same problem, I've taken a look at pg_trgm. At the moment, my opinion is quite mixed, but perhaps I did something wrong.
>
> I have a table (100k rows) with a location name in it, generally composed of several words but not that long. I created the index directly on this column (ie I don't create a table with each word of the location name). Then I tried a few queries. Here is an example:
>
> prod=# explain analyze select nomlieu from lieu where nomlieu ilike '%gaumont%';
>                                               QUERY PLAN
> ------------------------------------------------------------------------------------------------------
>  Seq Scan on lieu  (cost=0.00..7230.20 rows=7 width=21) (actual time=7.768..556.930 rows=39 loops=1)
>    Filter: ((nomlieu)::text ~~* '%gaumont%'::text)
>  Total runtime: 557.066 ms
> (3 rows)
>
> prod=# explain analyze select nomlieu from lieu where nomlieu % 'gaumont';
>                                               QUERY PLAN
> ------------------------------------------------------------------------------------------------------
>  Bitmap Heap Scan on lieu  (cost=3.37..200.80 rows=106 width=21) (actual time=689.799..690.035 rows=36 loops=1)
>    Recheck Cond: ((nomlieu)::text % 'gaumont'::text)
>    ->  Bitmap Index Scan on idx_lieu_nomlieu_trgm  (cost=0.00..3.37 rows=106 width=0) (actual time=689.749..689.749 rows=36 loops=1)
>          Index Cond: ((nomlieu)::text % 'gaumont'::text)
>  Total runtime: 690.195 ms
> (5 rows)
>
> The trigram version is slower and doesn't return 3 results I should have. The 3 results it doesn't return have the word gaumont in them at the start of the string, exactly like the others.
>
> Is there anything I can do to improve the performance and investigate why I don't have these 3 results?

pg_trgm was developed for spelling correction, and there is a threshold of similarity, which is 0.3 by default. The README explains what it means. The similarity could be very low, since you didn't make a separate column and the length of the full string is used to normalize similarity. pg_trgm as is isn't well suited for wildcard search, but the idea is there.

Regards, Oleg

Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [HACKERS] RFC: Temporal Extensions for PostgreSQL
> Well, unique is usually defined as "not equal to any other". And "not
> equal" also fails transitive law [...]

> But it should be trivial to test at insertion time if the interval
> overlaps with any existing intervals [...]

Putting your point another way: you might construe an equivalence relation by grouping together all intervals which (directly or indirectly) touch each other. Let's say they are "connected". But then the problem becomes clear: let's assume A and C are not connected (i.e. they are in different equivalence classes). Now you add B, which happens to overlap A and C. Now A and C are connected. How do you care for that in your index?

That can't happen with a "classical" equivalence relation, which wouldn't change among existing elements when you add a new one.

Regards
-- tomás

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq