Re: [HACKERS] unaccent extension missing some accents
Alright, I wrote up another patch that uses strchr to parse out the lines of the unaccent.rules file, forgoing sscanf completely. Hopefully this looks a bit better than using swscanf.

As for the other problems with isspace and such on OSX, it might be worth looking at the Python portability fixes. I played briefly with the isspace-and-friends macros they have and they looked okay, but I certainly can't speak for how well they'd work for the rest of the PostgreSQL code base.

Cheers.

Attachment: 0001-Fix-weirdness-when-dealing-with-UTF-8-in-buggy-libc-.patch (binary data)
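The patch itself is only attached as binary data in the archive, so purely as an illustration, a strchr-based line parser along the lines described might look like this. The function name and the "src<TAB>translation<NEWLINE>" line format are assumptions, not the patch's actual code:

    #include <stdbool.h>
    #include <string.h>

    /* Split one rule line of the form "src\ttranslation\n" in place. */
    static bool
    parse_rule_line(char *line, char **src, char **dst)
    {
        char *tab = strchr(line, '\t');
        char *eol;

        if (tab == NULL)
            return false;       /* no TAB separator: not a rule line */
        *tab = '\0';
        *src = line;
        *dst = tab + 1;
        eol = strchr(*dst, '\n');
        if (eol)
            *eol = '\0';        /* trim the trailing newline */
        return (**src != '\0' && **dst != '\0');
    }

Because this never consults isspace(), it sidesteps the broken UTF-8 locale classification entirely.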
[HACKERS] [patch] Include detailed information about a row failing a CHECK constraint into the error message
Hi,

when I insert/update many rows at once using INSERT ... SELECT into a table which has plenty of CHECK constraints, the error message that Postgres returns has no indication of which row failed the constraint check. The attached patch tries to provide information in a similar way to how duplicate items in a UNIQUE constraint are handled.

Originally, I tried to simply check the new row's t_ctid, but it was always (0,0) -- I guess that's expected; maybe it's still in memory at that time and maybe such nodes don't have a ctid assigned yet.

Please let me know if this patch is suitable for inclusion. It's based on REL9_0_STABLE, because that's the version I'm running.

I'd like to thank intgr on IRC for his feedback when I was wondering about the t_ctid.

With kind regards,
Jan

-- 
Trojita, a fast e-mail client -- http://trojita.flaska.net/

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 504f4de..eb4f595 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -1364,10 +1364,32 @@ ExecConstraints(ResultRelInfo *resultRelInfo,
 		const char *failed;
 
 		if ((failed = ExecRelCheck(resultRelInfo, slot, estate)) != NULL)
+		{
+			StringInfoData buf;
+			int			natts = rel->rd_att->natts;
+			int			i;
+			initStringInfo(&buf);
+			for (i = 0; i < natts; ++i)
+			{
+				char	   *val;
+				Oid			foutoid;
+				bool		typisvarlena;
+				getTypeOutputInfo(rel->rd_att->attrs[i]->atttypid, &foutoid, &typisvarlena);
+				if (slot->tts_isnull[i])
+					val = "NULL";
+				else
+					val = OidOutputFunctionCall(foutoid, slot->tts_values[i]);
+				if (i > 0)
+					appendStringInfoString(&buf, ", ");
+				appendStringInfoString(&buf, val);
+			}
 			ereport(ERROR,
 					(errcode(ERRCODE_CHECK_VIOLATION),
 					 errmsg("new row for relation \"%s\" violates check constraint \"%s\"",
-							RelationGetRelationName(rel), failed)));
+							RelationGetRelationName(rel), failed),
+errdetail("New row with data (%s) violates check constraint \"%s\".",
+					buf.data, failed)));
+		}
 	}
 }
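For illustration, a hypothetical session showing the intended effect of the patch (the table and data are invented for the example; the constraint name follows PostgreSQL's default naming):

    CREATE TABLE t (a int CHECK (a > 0), b text);
    INSERT INTO t SELECT -1, 'x';
    ERROR:  new row for relation "t" violates check constraint "t_a_check"
    DETAIL:  New row with data (-1, x) violates check constraint "t_a_check".

The DETAIL line is what the patch adds: with a multi-row INSERT ... SELECT, it identifies exactly which row tripped the constraint.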
Re: [HACKERS] Measuring relation free space
On 07/11/11 10:20, Bernd Helmle wrote:
> --On 6. November 2011 01:08:11 -0200 Greg Smith wrote:
>> Attached patch adds a new function to the pageinspect extension for
>> measuring total free space, in either tables or indexes.
>
> I wonder if that should be done in the pgstattuple module, which outputs
> some similar numbers.

Not meaning to disparage Greg's effort in any way, but I was thinking the same thing about pg_freespacemap. I have not checked what differences, if any, there are in the output, but it would be interesting to compare the various extensions (three at present) with slightly overlapping areas of functionality and consider which of them should perhaps be merged.

I am guessing (at this point very much guessing) that pg_freespacemap may return its data faster, as pageinspect is gonna have to grovel through all the pages for itself (whereas pg_freespacemap relies on info from the ... free space map fork).

regards
Mark
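For comparison, the FSM-based number can be pulled with a short query once the pg_freespacemap extension is installed. A rough sketch (the table name is a placeholder, and the result is only as fresh as the last VACUUM, since that is what populates the free space map):

    -- approximate free-space fraction, read from the FSM fork only
    SELECT sum(avail)::float8 / pg_relation_size('some_table') AS free_ratio
    FROM pg_freespace('some_table');

pageinspect's function, by contrast, has to read every heap or index page to compute its percentage.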
Re: [HACKERS] unaccent extension missing some accents
On 2011-11-06, at 7:15 PM, Tom Lane wrote:
>
> swscanf doesn't seem like an acceptable approach: it's a function that
> is relied on nowhere else in PG, so it adds new portability risks of its
> own. It doesn't exist on some platforms that we support (like the one
> I'm typing this message on) and there's no real good reason to assume
> that it's not broken in its own ways on others.
>
> If you really want to pursue this, I'd suggest parsing the line
> manually, perhaps via strchr searches for \t and \n. It likely wouldn't
> be very many more lines than what you've got here.
>
> However, the bigger picture is that OS X's UTF8 locales are broken
> through-and-through, and most of their other problems are not feasible
> to work around. So basically you can't use them for anything
> interesting, and it's not clear that it's worth putting any time into
> solving individual problems. In the particular case here, the issue
> presumably is that sscanf is relying on isspace() ... but we rely on
> isspace() directly, in quite a lot of places, so how much is it going
> to fix to dodge it right here?
>
> regards, tom lane

There are some fixes for isspace and friends that I've seen Python using, so perhaps in those cases a similar fix could be applied. For instance, maybe something like the code around line 674 here:

http://svn.python.org/view/python/trunk/Include/pyport.h?revision=81029&view=markup

Perhaps that would be suitable on OSX, at least in the case of isspace et al. As far as I can tell, scanf doesn't seem to use isspace on my system, so that would only be a partial fix for this and whatever other situations isspace is used in. (I'm on a mobile now so I can't check at the moment.)

This isn't really a huge deal for me, but I'll try to get a chance to write up a little parser anyways just for kicks.

Cheers
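For reference, Python's workaround on the affected BSD-derived platforms is, roughly, to redefine the <ctype.h> macros in terms of their wide-character counterparts. A sketch from memory of that approach (see pyport.h itself for the exact guard conditions and full macro list):

    #include <ctype.h>
    #include <wchar.h>
    #include <wctype.h>

    /* On platforms whose 8-bit ctype tables are broken in UTF-8 locales,
     * route classification through the wide-character functions instead. */
    #undef isspace
    #define isspace(c)  iswspace(btowc(c))
    #undef isalpha
    #define isalpha(c)  iswalpha(btowc(c))
    #undef isupper
    #define isupper(c)  iswupper(btowc(c))
    #undef tolower
    #define tolower(c)  towlower(btowc(c))

Whether the wide-character tables are any less broken on OSX than the 8-bit ones would of course need testing.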
[HACKERS] btree gist known problems
I'm looking for known problem areas in btree_gist. I see:

http://archives.postgresql.org/message-id/8973.1286841...@sss.pgh.pa.us

With Range Types, I'm anticipating that btree_gist will become more important, so I'd like to know what bugs are holding it back. So, anything else come to mind? Or does btree_gist just need a good review? Or have the problems been fixed?

Regards,
Jeff Davis
Re: [HACKERS] unaccent extension missing some accents
J Smith writes:
> I've attached a patch against master for unaccent.c that uses swscanf
> along with char2wchar and wchar2char instead of sscanf directly to
> initialize the unaccent extension and it appears to fix the problem in
> both the master and 9.1 branches.

swscanf doesn't seem like an acceptable approach: it's a function that is relied on nowhere else in PG, so it adds new portability risks of its own. It doesn't exist on some platforms that we support (like the one I'm typing this message on) and there's no real good reason to assume that it's not broken in its own ways on others.

If you really want to pursue this, I'd suggest parsing the line manually, perhaps via strchr searches for \t and \n. It likely wouldn't be very many more lines than what you've got here.

However, the bigger picture is that OS X's UTF8 locales are broken through-and-through, and most of their other problems are not feasible to work around. So basically you can't use them for anything interesting, and it's not clear that it's worth putting any time into solving individual problems. In the particular case here, the issue presumably is that sscanf is relying on isspace() ... but we rely on isspace() directly, in quite a lot of places, so how much is it going to fix to dodge it right here?

			regards, tom lane
Re: [HACKERS] unaccent extension missing some accents
On Sun, Nov 6, 2011 at 1:18 PM, Florian Pflug wrote:
>
> What's the locale of the database you're seeing this in, and which charset
> does it use?
>
> I think scanf() uses isspace() and friends, and last time I looked the
> locale definitions were all pretty bogus on OSX. So maybe scanf() somehow
> decides that 0xA0 is whitespace.

Ah, that does it: the locale I was using in the test code was just plain ol' C locale, while in the database it was en_CA.UTF-8. Changing the locale in my test code produced the wonky results. I should have figured it was a locale problem. Sure enough, in a UTF-8 locale, it believes that both 0xa0 and 0x85 are spaces. Pretty wonky behaviour indeed.

Apparently this is a known OSX issue that has its roots in an older version of FreeBSD's libc, I guess, eh? I've found various bug reports that allude to the problem and they all seem to point that way.

I've attached a patch against master for unaccent.c that uses swscanf along with char2wchar and wchar2char instead of sscanf directly to initialize the unaccent extension, and it appears to fix the problem in both the master and 9.1 branches. I haven't added any tests in the expected output file 'cause I'm not exactly sure what I should be testing against, but I could take a crack at that, too, if the patch looks reasonable and is usable.

Cheers.

Attachment: 0001-Fix-weirdness-when-dealing-with-UTF-8-in-buggy-libc-.patch (binary data)
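A minimal standalone reproduction of the behaviour described above might look like this (a hypothetical test program, not from the thread; on a libc with the broken UTF-8 tables both checks print non-zero):

    #include <ctype.h>
    #include <locale.h>
    #include <stdio.h>

    int
    main(void)
    {
        /* same locale as the affected database */
        setlocale(LC_ALL, "en_CA.UTF-8");
        printf("isspace(0xa0) = %d\n", isspace(0xa0));
        printf("isspace(0x85) = %d\n", isspace(0x85));
        return 0;
    }

In UTF-8, 0xa0 and 0x85 are continuation bytes that are never whitespace on their own, which is why classifying them as spaces makes sscanf stop partway through a code point.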
Re: [HACKERS] [PATCH] optional cleaning queries stored in pg_stat_statements
Peter Geoghegan writes:
> On 6 November 2011 15:08, Tom Lane wrote:
>> Are you intending to hash the raw
>> grammar output tree, the parse analysis result, the rewriter result,
>> or the actual plan tree?

> I'm hashing the Query tree from a planner plugin (function assigned to
> planner_hook), immediately prior to it being passed to
> standard_planner. IOW, the rewriter result.

Why that choice? That seems like about the least useful of the four possibilities, since it provides no certainty about the plan while also being as far from the original text as you can get (thus making the problem of what to display pretty hard).

> A major consideration was backwards compatibility;

This is not a consideration that the community is likely to weigh heavily, or indeed at all. We aren't going to back-port this feature into prior release branches, and we aren't going to want to contort its definition to make that easier. If 2ndquadrant wants to ship a back-branch version with the feature, you can certainly also back-port a patch that adds a hook where you need it, if you need a new hook.

But frankly I'm a bit surprised that you're not hashing the query plan, since you could get that in the ExecutorStart hook that pg_stat_statements already uses. With a hook placed somewhere else, you add additional implementation problems of matching up the calls to that hook with later calls to the executor hook.

>> I'm not real sure whether it's better to classify on the basis of
>> similar plans or similar original queries, anyway. This seems like
>> something that could be open for debate about use-cases.

> Indeed - it's generally difficult to reason about what behaviour is
> optimal when attempting to anticipate the needs of every Postgres DBA.
> In this case, I myself lean strongly towards similar original queries,
> because classifying on the basis of similar plans isn't likely to be
> nearly as actionable, and there are really no obvious precedents to
> draw on - how can it be turned into a useful tool?

Hm, well, if that's what you think, why so late in the sequence? Hashing the raw grammar output would be the best way to identify similar original queries, ISTM. You'd also have a better chance of getting people to hold still for the extra overhead fields, if they didn't need to propagate further than that representation.

Basically, I think there are fairly clear arguments for using a hash of the raw grammar output, if you lean towards the "classify by original query" viewpoint; or a hash of the plan tree, if you lean towards the "classify by plan" viewpoint. I don't see what use-cases would make it optimal to hash one of the intermediate stages. I was hoping you had an argument stronger than "it'll be easy to back-port" for that.

			regards, tom lane
Re: [HACKERS] [PATCH] optional cleaning queries stored in pg_stat_statements
On 6 November 2011 15:08, Tom Lane wrote:
> Are you intending to hash the raw
> grammar output tree, the parse analysis result, the rewriter result,
> or the actual plan tree?

I'm hashing the Query tree from a planner plugin (function assigned to planner_hook), immediately prior to it being passed to standard_planner. A major consideration was backwards compatibility; at one point, it looked like this all could be done without adding any new infrastructure, with perhaps an officially sanctioned 9.2 version of pg_stat_statements that could be built against earlier pg versions. Other than that, it seems intuitively obvious that this should happen as late as possible in the parsing stage - any transformation performed prior to planning was considered essential to the query, even if that potentially meant that successive calls of the same query string were considered different due to external factors. These things probably don't come up too often in practice, but I think that they're nice to have, as they prevent the DBA from looking at statistics that aren't actually representative.

>> A guiding principle that
>> I've followed is that anything that could result in a different plan
>> is a differentiator of queries.
>
> This claim seems like bunk, unless you're hashing the plan tree,
> in which case it's tautological.

I think I may have been unclear, for which I apologise. Certainly, if the principle I (mis)described was followed to the letter, I'd end up with something that was exactly the same as what we already have. I merely meant to suggest, with an awareness of the fact that I was saying something slightly controversial, that because the planner is /invariably/ sensitive to changes in limit/offset constants, particularly large changes, it made sense to have those constants as differentiators, and it certainly made sense to have a limit n differentiate the same query with and without a limit n. I do of course appreciate that the use of a different constant in quals could result in a different plan being generated due to their selectivity estimates varying.

> I'm not real sure whether it's better to classify on the basis of
> similar plans or similar original queries, anyway. This seems like
> something that could be open for debate about use-cases.

Indeed - it's generally difficult to reason about what behaviour is optimal when attempting to anticipate the needs of every Postgres DBA. In this case, I myself lean strongly towards similar original queries, because classifying on the basis of similar plans isn't likely to be nearly as actionable, and there are really no obvious precedents to draw on - how can it be turned into a useful tool?

>> There will be additional infrastructure added to the parser to support
>> normalisation of query strings for the patch I'll be submitting (that
>> obviously won't be supported in the version that builds against
>> existing Postgres versions that I'll make available). Essentially,
>> I'll be adding a length field to certain nodes,
>
> This seems like a good way to get your patch rejected: adding overhead
> to the core system for the benefit of a feature that not everyone cares
> about is problematic.

It's problematic, but I believe that it can be justified. Without being glib, exactly no one cares about the location of tokens that correspond to Const nodes until they have an error that occurs outside the parser that is attributable to a node/corresponding token, which, in a production environment, could take a long time.
All I'm doing is moving slightly more towards the default Bison representation of YYLTYPE, where the column and row of both the beginning and end of each token are stored. I'm storing location and length rather than just location, which is a modest change. Not everyone cares about this feature, but plenty do. It will be particularly useful to have the Query representation stable, even to the point where it is stable between pg_stat_statements_reset() calls - third-party tools can rely on the stability of the query string.

> Why do you need it anyway? Surely it's possible
> to determine the length of a literal token after the fact.

Probably, but not in a manner that I deem to be well-principled. I want to push the onus of keeping bookkeeping for the replacement of tokens onto the parser, which is authoritative - otherwise, I'll end up with a kludge that is liable to fall out of sync. The community will have the opportunity to consider if that trade-off makes sense.

> More generally, if you're hashing anything later than the raw grammar
> tree, I think that generating a suitable representation of the queries
> represented by a single hash entry is going to be problematic anyway.
> There could be significant differences --- much more than just the
> values of constants --- between query strings that end up being
> semantically the same. Or for that matter we could have identical query
> strings that wind up being considered different because of the action
> of search_path or other context.
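A minimal sketch of the hook placement Peter describes - a plugin that computes a hash of the Query tree and then hands off to the standard planner. Only the hook wiring is real here; the hashing function (hash_query_tree) is a hypothetical stand-in for the selective serialisation discussed above:

    #include "postgres.h"
    #include "fmgr.h"
    #include "optimizer/planner.h"

    PG_MODULE_MAGIC;

    static planner_hook_type prev_planner_hook = NULL;

    /* hypothetical: selectively serialise and hash the Query tree */
    extern uint32 hash_query_tree(Query *parse);

    static PlannedStmt *
    pgss_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
    {
        uint32 queryId = hash_query_tree(parse); /* aggregate stats by this */

        (void) queryId;
        if (prev_planner_hook)
            return prev_planner_hook(parse, cursorOptions, boundParams);
        return standard_planner(parse, cursorOptions, boundParams);
    }

    void
    _PG_init(void)
    {
        prev_planner_hook = planner_hook;
        planner_hook = pgss_planner;
    }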
Re: [HACKERS] Measuring relation free space
--On 6. November 2011 01:08:11 -0200 Greg Smith wrote:
> Attached patch adds a new function to the pageinspect extension for
> measuring total free space, in either tables or indexes.

I wonder if that should be done in the pgstattuple module, which outputs some similar numbers.

-- 
Thanks,
Bernd
Re: [HACKERS] proposal: psql concise mode
On Sun, Nov 6, 2011 at 1:16 PM, Dickson S. Guedes wrote:
>> test=# \d+ foo
>>        Table "public.foo"
>>  Column |  Type   | Storage
>> --------+---------+---------
>>  a      | integer | plain
>>  b      | integer | plain
>> Has OIDs: no
>
> Using your example, what if column 'b' has a comment and 'a' does not?
> How will the above output be displayed?

Then the comments would be displayed as they previously were, like so:

             Table "public.foo"
 Column |  Type   | Storage |  Description
--------+---------+---------+--------------
 a      | integer | plain   | 
 b      | integer | plain   | some comment
Has OIDs: no

Josh
Re: [HACKERS] Measuring relation free space
On Sun, Nov 6, 2011 at 04:08, Greg Smith wrote:
> Attached patch adds a new function to the pageinspect extension for
> measuring total free space, in either tables or indexes. It returns the
> free space as a percentage, so higher numbers mean more bloat. After trying
> a couple of ways to quantify it, I've found this particular measure
> correlates well with the nastiest bloat issues I've run into in production
> recently. For example, an index that had swelled to over 5X its original
> size due to autovacuum issues registered at 0.86 on this scale. I could
> easily see people putting an alert at something like 0.40 and picking
> candidates to reindex based on it triggering. That would be about a million
> times smarter than how I've been muddling through this class of problems so
> far.
>
> Code by Jaime Casanova, based on a prototype by me. Thanks to attendees and
> sponsors of the PgWest conference for helping to fund some deeper
> exploration of this idea.

Looks pretty useful. One quick stylistic comment - we don't generally use "* 1.0" to turn an int into a double - just use a cast.

> -Given this is doing a full table scan, should it hook into a ring buffer to
> keep from trashing the buffer cache? Or might it loop over the relation in
> a different way all together? I was thinking about eyeing the FSM instead
> at one point, didn't explore that yet. There's certainly a few ways to
> approach this, we just aimed at the easiest way to get a working starter
> implementation, and associated results to compare others against.

Hooking into a ring buffer seems almost like a requirement before you can run this on a larger production system, doesn't it? I don't know how hard that is code-wise, but it certainly seems worthwhile.

> -Should there be a non-superuser version of this? We'd certainly need to
> get a less cache demolishing version before that would seem wise.

Not sure that's necessary - at least not for now. Many other diagnostics functions are already superuser-only...

> -There were related things in the pageinspect module, but a case could be
> made for this being a core function instead. It's a bit more likely to be
> used in production than the rest of that extension.

A case can be made for a lot of things in contrib to be in core ;) I say let's keep it in pageinspect, but then also have you finish off that "split up the contrib" patch :-)

> -What if anything related to TOAST should this handle?

Similar data for TOAST relations would be interesting, no? But that's easily done from userspace by just querying the toast table specifically, I assume?

-- 
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
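On the "* 1.0" point, the suggested change is just this (variable names illustrative, not from the actual patch):

    /* rather than forcing float arithmetic via multiplication ... */
    free_percent = free_space * 1.0 / total_space;

    /* ... make the conversion explicit with a cast */
    free_percent = (double) free_space / total_space;

Both avoid integer division; the cast simply states the intent directly.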
Re: [HACKERS] [PATCH] optional cleaning queries stored in pg_stat_statements
On 11/06/2011 01:08 PM, Tom Lane wrote:
> Peter Geoghegan writes:
>> ... It also does things like intelligently distinguishing between
>> queries with different limit/offset constant values, as these constants
>> are deemed to be differentiators of queries for our purposes. A guiding
>> principle that I've followed is that anything that could result in a
>> different plan is a differentiator of queries.
>
> This claim seems like bunk, unless you're hashing the plan tree,
> in which case it's tautological.

Peter's patch adds a planner hook and hashes at that level. Since this cat is rapidly escaping its bag and impacting other contributors, we might as well share the work in progress early if anyone has a burning desire to look at the code. A diff against the version I've been internally reviewing in prep for community submission is at:

https://github.com/greg2ndQuadrant/postgres/compare/master...pg_stat_statements_norm

That's easier to browse than asking probing questions about it, I think. Apologies to Peter for sharing his work before he was completely ready; there is a list of known problems with it. I also regret the thread hijack here. The first chunk of code is a Python-based regression test program, and an open item here is the best way to turn that into a standard regression test set.

>> There will be additional infrastructure added to the parser to support
>> normalisation of query strings for the patch I'll be submitting (that
>> obviously won't be supported in the version that builds against
>> existing Postgres versions that I'll make available). Essentially,
>> I'll be adding a length field to certain nodes,
>
> This seems like a good way to get your patch rejected: adding overhead
> to the core system for the benefit of a feature that not everyone cares
> about is problematic.

Struggling around this area is the main reason this code hasn't been submitted yet. To step back for a moment, there are basically three options here that any code like this can take, in regards to storing the processed query name used as the key:

1) Use the first instance of that query seen as the "reference" version
2) Use the last instance seen
3) Transform the text of the query in a way that's stable across all duplicates of that statement, and output that

Downstream tools operating on this data, things that will sample pg_stat_statements multiple times, need some sort of stable query text they can operate on. It really needs to survive server restart too. Neither (1) nor (2) seems like a usable solution. Both have been implemented already in Peter's patch, with (2) being what's in the current patch. How best to do (3) instead is not obvious though.

Doing the query matching at the planner level seems effective at side-stepping the problem of needing to parse the SQL, which is where most implementations of this idea get stuck doing fragile transformations. My own first try at this back in September and Tomas's patch both fall into that category. That part of Peter's work seems to work as expected.

That still leaves the problem of "parsed query -> stable normalized string". I think that is an easier one to solve than directly taking on the entirety of "query text -> stable normalized string", though. Peter has an idea he's exploring for how to implement that, and any amount of overhead it adds to people who don't use this feature is an obvious concern. There are certainly use cases that don't care about this problem, ones that would happily take (1) or (2).
I'd rather not ship yet another not-quite-right-for-everyone version of pg_stat_statements again, though; only solving too limited a use case is the big problem with the one that's already there.

-- 
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
Re: [HACKERS] foreign key locks, 2nd attempt
On 2011-11-04 01:12, Alvaro Herrera wrote:
> I would like some opinions on the ideas on this patch, and on the patch
> itself. If someone wants more discussion on implementation details of
> each part of the patch, I'm happy to provide a textual description --
> please just ask.

Jumping in a bit late here, but thanks for working on this: it looks like it could solve some annoying problems for us.

I do find myself idly wondering if those problems couldn't be made to go away more simply given some kind of “I will never ever update this key” constraint. I'm having trouble picturing the possible lock interactions as it is. :-)

Jeroen
Re: [HACKERS] Multiple queries in transit
On 2011-11-03 17:26, Marko Kreen wrote:
> On Mon, Oct 31, 2011 at 7:09 PM, Tom Lane wrote:
>> Can't you do that today with a multi-command string submitted to
>> PQsendQuery, followed by multiple calls to PQgetResult?
>
> It's more annoying to do error handling on that, plus it still keeps the
> blocking behaviour, just with larger blocks.

You can combine multi-command query strings with nonblocking mode, without any change in libpq itself. In fact that's exactly what the libpqxx "pipeline" class does. So if you're working in C++, you already have this feature at your disposal.

> Also I would ask for the opposite feature: "multiple rows in flight".
> That means that when the server is sending a big resultset, the app can
> process it row-by-row (or by 10 rows) without stopping the stream and
> re-requesting.

Cursors.

Jeroen
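For the record, the multi-command-string pattern Tom mentions looks roughly like this in plain libpq (a sketch; error handling trimmed and the statements are placeholders):

    #include <stdio.h>
    #include <libpq-fe.h>

    /* Send several statements in one round trip, then drain all results. */
    static void
    run_batch(PGconn *conn)
    {
        PGresult *res;

        if (!PQsendQuery(conn, "SELECT 1; SELECT 2; SELECT 3"))
        {
            fprintf(stderr, "%s", PQerrorMessage(conn));
            return;
        }
        /* One PGresult per statement; NULL marks the end of the batch. */
        while ((res = PQgetResult(conn)) != NULL)
        {
            if (PQresultStatus(res) == PGRES_TUPLES_OK)
                printf("%s\n", PQgetvalue(res, 0, 0));
            PQclear(res);
        }
    }

The error-handling annoyance Marko refers to is that execution of the string stops at the first error, so later statements never produce results and the caller has to work out which ones ran.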
Re: [HACKERS] [PATCH] optional cleaning queries stored in pg_stat_statements
On 6 November 2011, 16:08, Tom Lane wrote:
> Peter Geoghegan writes:
>> I'm a couple of days away from posting a much better principled
>> implementation of pg_stat_statements normalisation. To normalise, we
>> perform a selective serialisation of the query tree, which is hashed.
>
> That seems like an interesting approach, and definitely a lot less of
> a kluge than what Tomas suggested. Are you intending to hash the raw
> grammar output tree, the parse analysis result, the rewriter result,
> or the actual plan tree? I don't necessarily have a preconceived notion
> about which is best (I can think of pluses and minuses for each), but
> we should hear your explanation of which one you chose and what your
> reasoning was.

Could you describe the pluses and minuses? My understanding is that the later the hash is computed, the more it reflects how the queries were actually executed.

Would it make sense to turn this into a GUC and leave the decision up to the user, something like

  pg_stat_statements.hash_phase = {grammar|analysis|rewriter|plan}

so that the user could decide how coarse the output should be?

> I'm not real sure whether it's better to classify on the basis of
> similar plans or similar original queries, anyway. This seems like
> something that could be open for debate about use-cases.

Well, that's a really tricky question - there can be different queries with the same plan. I think that grouping queries solely by the plan is not very useful, so the original query should be involved somehow.

What about using two hashes - a hash of the grammar tree (grammar_hash) and a hash of the rewriter output (rewriter_hash)? pg_stat_statements would then group the queries by the (grammar_hash, rewriter_hash) pair and include those two columns in the output. So I could select the rows with the same grammar_hash to see observed plans for the given query, or select rows with the same rewriter_hash to see queries leading to that particular plan.

To make this actually usable it's important to provide access to the plans, so that the user can take a rewriter_hash and get the plan somehow. This is not needed for grammar_hash, because the query string will be there, but the actual plan might change (due to autoanalyze etc.).

But maybe this is severe over-engineering and it's far too complicated.

> It might be that the path of least resistance is to document that we
> select one of the actual query strings "at random" to represent all the
> queries that went into the same hash entry, and not even bother with
> trying to strip out constants. The effort required to do that seems
> well out of proportion to the reward, if we can't do a perfect job of
> representing the common aspects of the queries.

Yes, once we have the hashes we can surely use a random query string with the values included. But it'd be nice to have the actual plans stored somewhere, so that it's possible to see them later.

Tomas
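A rough sketch of how the suggested GUC might be declared inside the module - purely illustrative: the enum constants are invented here, and the exact DefineCustomEnumVariable argument list varies between PostgreSQL versions:

    #include "postgres.h"
    #include "utils/guc.h"

    /* hypothetical phase identifiers */
    enum { HASH_PHASE_GRAMMAR, HASH_PHASE_ANALYSIS,
           HASH_PHASE_REWRITER, HASH_PHASE_PLAN };

    static const struct config_enum_entry hash_phase_options[] = {
        {"grammar", HASH_PHASE_GRAMMAR, false},
        {"analysis", HASH_PHASE_ANALYSIS, false},
        {"rewriter", HASH_PHASE_REWRITER, false},
        {"plan", HASH_PHASE_PLAN, false},
        {NULL, 0, false}
    };

    static int hash_phase = HASH_PHASE_REWRITER;

    void
    _PG_init(void)
    {
        DefineCustomEnumVariable("pg_stat_statements.hash_phase",
                                 "Tree representation hashed to group queries.",
                                 NULL,
                                 &hash_phase,
                                 HASH_PHASE_REWRITER,
                                 hash_phase_options,
                                 PGC_POSTMASTER,
                                 0,
                                 NULL, NULL, NULL);
    }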
Re: [HACKERS] unaccent extension missing some accents
On Nov 6, 2011, at 18:43, J Smith wrote:
> I put some elog debugging lines into unaccent.c and found that sscanf
> sometimes reads the scanned line by finding only one byte for the
> source character rather than the two required for the complete UTF-8
> code point. It appears that the following characters are causing the
> problem, along with the code points and such:
>
> 'Å' => 'A' | c3,85 => 41
> 'à' => 'a' | c3,a0 => 61
> 'ą' => 'a' | c4,85 => 61
> 'Ġ' => 'G' | c4,a0 => 47
> 'Ņ' => 'N' | c5,85 => 4e
> 'Š' => 'S' | c5,a0 => 53
>
> In each case, one byte was being read in the source string rather than
> two, leading to the "duplicate TO" warnings above. This later leads to
> the characters that produced the warning being ignored when unaccent
> is called and left in the output.

What's the locale of the database you're seeing this in, and which charset does it use?

I think scanf() uses isspace() and friends, and last time I looked the locale definitions were all pretty bogus on OSX. So maybe scanf() somehow decides that 0xA0 is whitespace.

> I haven't been able to reproduce in a smaller example, and haven't
> been able to reproduce on a CentOS server, so at this point I'm at a
> loss as to the problem.

Have you tried to set the same locale as postgres (using setlocale()) in your tests?

best regards,
Florian Pflug
Re: [HACKERS] proposal: psql concise mode
2011/11/5 Josh Kupershmidt:
> I'd like to propose a "concise mode" for psql, which users might turn
> on via a \pset option. Concise mode would affect only the output of
> psql's backslash commands. For output results which have some all-NULL
> columns, as in:
>
> test=# \d+ foo
>                            Table "public.foo"
>  Column |  Type   | Modifiers | Storage | Stats target | Description
> --------+---------+-----------+---------+--------------+-------------
>  a      | integer |           | plain   |              | 
>  b      | integer |           | plain   |              | 
> Has OIDs: no
>
> Concise mode would simply omit the all-NULL columns, so that the
> output would look like this:
>
> test=# \d+ foo
>        Table "public.foo"
>  Column |  Type   | Storage
> --------+---------+---------
>  a      | integer | plain
>  b      | integer | plain
> Has OIDs: no

Using your example, what if column 'b' has a comment and 'a' does not? How will the above output be displayed?

-- 
Dickson S. Guedes
mail/xmpp: gue...@guedesoft.net - skype: guediz
http://guedesoft.net - http://www.postgresql.org.br
Re: [HACKERS] [PATCH] optional cleaning queries stored in pg_stat_statements
Tomas Vondra writes:
> If it was possible to compare the actual plan (or at least a hash of the
> plan), and keep one record for each plan, that'd be extremely nice.
> You'd see that the query was executed with 3 different plans, number of
> calls, average duration etc.

I like that idea. How does that integrate with current efforts?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr   PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] unaccent extension missing some accents
Gah! Accidentally hit Send. Let me finish that last message before sending this time!

G'day list.

I've been messing around with the unaccent extension and I've noticed that some of the characters listed in the unaccent.rules file aren't actually being unaccented on my system. Here are the system details and whatnot.

- OSX 10.7.2
- the server is compiled via macports. Tried using both the gcc and llvm 4.2.1 compilers that come with the latest version of XCode.
- the same symptoms show up in both 9.0.5 and 9.1.1. I've also tried building manually from the latest REL9_1_STABLE branch from git to make sure macports wasn't the problem, but I'm getting the same results with both compilers.

When I first do a CREATE EXTENSION for unaccent, I'm seeing the following warnings in the log file:

===
WARNING:  duplicate TO argument, use first one
CONTEXT:  line 8 of configuration file "/usr/local/postgresql91-local/share/tsearch_data/unaccent.rules": "à a "
WARNING:  duplicate TO argument, use first one
CONTEXT:  line 57 of configuration file "/usr/local/postgresql91-local/share/tsearch_data/unaccent.rules": "Ġ G "
WARNING:  duplicate TO argument, use first one
CONTEXT:  line 144 of configuration file "/usr/local/postgresql91-local/share/tsearch_data/unaccent.rules": "Š S "
===

I've dug around through the unaccent.c code a bit and I've noticed that the sscanf it does when reading the file is producing some odd output. I've tried a minimal example using the same sort of sscanf code reading from the same unaccent.rules file, but the minimal example doesn't produce the same output.

I put some elog debugging lines into unaccent.c and found that sscanf sometimes reads the scanned line by finding only one byte for the source character rather than the two required for the complete UTF-8 code point. It appears that the following characters are causing the problem, along with the code points and such:

'Å' => 'A' | c3,85 => 41
'à' => 'a' | c3,a0 => 61
'ą' => 'a' | c4,85 => 61
'Ġ' => 'G' | c4,a0 => 47
'Ņ' => 'N' | c5,85 => 4e
'Š' => 'S' | c5,a0 => 53

In each case, one byte was being read in the source string rather than two, leading to the "duplicate TO" warnings above. This later leads to the characters that produced the warning being ignored when unaccent is called and left in the output.

I haven't been able to reproduce it in a smaller example, and haven't been able to reproduce it on a CentOS server, so at this point I'm at a loss as to the problem. Anybody got any ideas?

Cheers
[HACKERS] unaccent extension missing some accents
G'day list.

I've been messing around with the unaccent extension and I've noticed that some of the characters listed in the unaccent.rules file aren't actually being unaccented on my system. Here are the system details and whatnot.

- OSX 10.7.2
- the server is compiled via macports. Tried using both the gcc and llvm 4.2.1 compilers that come with the latest version of XCode.
- the same symptoms show up in both 9.0.5 and 9.1.1. I've also tried building manually from the latest REL9_1_STABLE branch from git to make sure macports wasn't the problem, but I'm getting the same results with both compilers.

When I first do a CREATE EXTENSION for unaccent, I'm seeing the following warnings in the log file:

===
WARNING:  duplicate TO argument, use first one
CONTEXT:  line 8 of configuration file "/usr/local/postgresql91-local/share/tsearch_data/unaccent.rules": "à a "
WARNING:  duplicate TO argument, use first one
CONTEXT:  line 57 of configuration file "/usr/local/postgresql91-local/share/tsearch_data/unaccent.rules": "Ġ G "
WARNING:  duplicate TO argument, use first one
CONTEXT:  line 144 of configuration file "/usr/local/postgresql91-local/share/tsearch_data/unaccent.rules": "Š S "
===

I've dug around through the unaccent code a bit and I've noticed that the sscanf it does when reading the file is producing some odd output.
Re: [HACKERS] [GENERAL] Strange problem with create table as select * from table;
hubert depesz lubaczewski writes:
> checked lengths of the text/varchar columns in database.
> there are 16 such columns in the table.
> full report of lengths is in
> http://www.depesz.com/various/lengths.report.gz
> it was obtained using:
> select length( "first_text_column" ) as length_1, count(*) from
> etsy_v2.receipts group by 1 order by 1;
> and so on for every text column, and at the end I also made summary of
> sum-of-lengths.

BTW, that probably doesn't prove a lot since it takes no account of compression, and different values could be compressible to varying degrees. It'd be more useful to look at pg_column_size() numbers to get an idea of whether toasting happened or not.

			regards, tom lane
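Concretely, that would mean rerunning the per-column report with pg_column_size() in place of length(), e.g. (column and table names as in the quoted report):

    select pg_column_size("first_text_column") as size_1, count(*)
      from etsy_v2.receipts
     group by 1 order by 1;

pg_column_size() reports the stored size of the datum after any compression, so comparing it against length() shows whether a value was compressed or pushed out of line by TOAST.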
[HACKERS] Re: [GENERAL] Strange problem with create table as select * from table;
On Fri, Nov 04, 2011 at 09:04:02PM -0400, Tom Lane wrote:
> that. And that they are the only rows that, in addition to the above
> conditions, contain data fields wide enough to require out-of-line
> toasting.

checked lengths of the text/varchar columns in database. there are 16 such columns in the table. full report of lengths is in

http://www.depesz.com/various/lengths.report.gz

it was obtained using:

select length( "first_text_column" ) as length_1, count(*) from etsy_v2.receipts group by 1 order by 1;

and so on for every text column, and at the end I also made a summary of sum-of-lengths. there is also:

http://www.depesz.com/various/lengths2.report.gz

which has the same summary, but only of the damaged rows. As you can see, the length of the columns is not really special - somewhere in the middle of all other rows. Summarized length is also not special in any way.

> These conditions together are enough to break the assumption in
> toast_insert_or_update that the old and new tuples must have the same
> value of t_hoff. But it can only happen when the source tuple is an
> original on-disk tuple, which explains why only INSERT ... SELECT *
> causes the problem, not any variants that require projection of a new
> column set. When it does happen, toast_insert_or_update correctly
> computes the required size of the new tuple ... but then it tells
> heap_fill_tuple to fill the data part at offset olddata->t_hoff, which
> is wrong (too small) and so the nulls bitmap that heap_fill_tuple
> concurrently constructs will overwrite the first few data bytes. In
> your example, the table contains 49 columns so the nulls bitmap requires
> 7 bytes, just enough to overwrite the first 6 data bytes as observed.
> (In fact, given the values we see being filled in, I can confidently say
> that you have two added-since-creation null columns, no more, no less.)
>
> I can reproduce the problem with the attached test case (using the
> regression database). With asserts enabled, the
> Assert(new_len == olddata->t_hoff);
> fails. With asserts off, corrupt data.

How can I make the onek table for the test? Is it a standard table from something?

> This is trivial to fix, now that we know there's a problem --- the
> function is only using that assumption to save itself a couple lines
> of code. Penny wise, pound foolish :-(

Any chance of getting the fix in patch format so we could test it on this system?

Best regards,

depesz

-- 
The best thing about modern society is how easy it is to avoid contact with it.
http://depesz.com/
Re: [HACKERS] [GENERAL] Strange problem with create table as select * from table;
hubert depesz lubaczewski writes:
> Any chance of getting the fix in patch format so we could test it on
> this system?

http://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=23998fe99c1220ba3a9eefee194e37ec1f14ae07

			regards, tom lane
Re: [HACKERS] [PATCH] optional cleaning queries stored in pg_stat_statements
Peter Geoghegan writes:
> I'm a couple of days away from posting a much better principled
> implementation of pg_stat_statements normalisation. To normalise, we
> perform a selective serialisation of the query tree, which is hashed.

That seems like an interesting approach, and definitely a lot less of a kluge than what Tomas suggested. Are you intending to hash the raw grammar output tree, the parse analysis result, the rewriter result, or the actual plan tree? I don't necessarily have a preconceived notion about which is best (I can think of pluses and minuses for each), but we should hear your explanation of which one you chose and what your reasoning was.

> ... It also does things
> like intelligently distinguishing between queries with different
> limit/offset constant values, as these constants are deemed to be
> differentiators of queries for our purposes. A guiding principle that
> I've followed is that anything that could result in a different plan
> is a differentiator of queries.

This claim seems like bunk, unless you're hashing the plan tree, in which case it's tautological. As an example, switching from "where x < 1" to "where x < 2" could make the planner change plans, if there's a sufficiently large difference in the selectivity of the two cases (or even if there's a very small difference, but we're right at the tipping point where the estimated costs are equal). There's no way to know this in advance unless you care to duplicate all the planner cost estimation logic. And I don't think you actually want to classify "where x < 1" and "where x < 2" differently just because they *might* give different plans in corner cases.

I'm not real sure whether it's better to classify on the basis of similar plans or similar original queries, anyway. This seems like something that could be open for debate about use-cases.

> There will be additional infrastructure added to the parser to support
> normalisation of query strings for the patch I'll be submitting (that
> obviously won't be supported in the version that builds against
> existing Postgres versions that I'll make available). Essentially,
> I'll be adding a length field to certain nodes,

This seems like a good way to get your patch rejected: adding overhead to the core system for the benefit of a feature that not everyone cares about is problematic. Why do you need it anyway? Surely it's possible to determine the length of a literal token after the fact.

More generally, if you're hashing anything later than the raw grammar tree, I think that generating a suitable representation of the queries represented by a single hash entry is going to be problematic anyway. There could be significant differences --- much more than just the values of constants --- between query strings that end up being semantically the same. Or for that matter we could have identical query strings that wind up being considered different because of the action of search_path or other context.

It might be that the path of least resistance is to document that we select one of the actual query strings "at random" to represent all the queries that went into the same hash entry, and not even bother with trying to strip out constants. The effort required to do that seems well out of proportion to the reward, if we can't do a perfect job of representing the common aspects of the queries.

			regards, tom lane
Re: [HACKERS] a tsearch issue
On Fri, 2011-11-04 at 11:22 +0100, Pavel Stehule wrote:
> Hello
>
> I found an interesting issue when I checked tsearch prefix searching.
>
> We use an ispell-based dictionary:
>
> CREATE TEXT SEARCH DICTIONARY cspell
>    (template=ispell, dictfile = czech, afffile=czech, stopwords=czech);
> CREATE TEXT SEARCH CONFIGURATION cs (copy=english);
> ALTER TEXT SEARCH CONFIGURATION cs
>    ALTER MAPPING FOR word, asciiword WITH cspell, simple;
>
> Then I created a table:
>
> postgres=# create table n(a varchar);
> CREATE TABLE
> postgres=# insert into n values('Stěhule'),('Chromečka');
> INSERT 0 2
> postgres=# select * from n;
>      a
> ───────────
>  Stěhule
>  Chromečka
> (2 rows)
>
> and I tested prefix searching. I found the following issue:
>
> postgres=# select * from n where to_tsvector('cs', a) @@
> to_tsquery('cs','Stě:*') ;
>  a
> ───
> (0 rows)

Most likely you are hit by this problem:

http://archives.postgresql.org/pgsql-hackers/2011-10/msg01347.php

'Stě' may be a stopword in Czech.

> I expected one row. The problem is in the transformation of the word 'Stě':
>
> postgres=# select * from ts_debug('cs','Stě:*') ;
> ─[ RECORD 1 ]┬──────────────────
> alias        │ word
> description  │ Word, all letters
> token        │ Stě
> dictionaries │ {cspell,simple}
> dictionary   │ cspell
> lexemes      │ {sto}
> ─[ RECORD 2 ]┼──────────────────
> alias        │ blank
> description  │ Space symbols
> token        │ :*
> dictionaries │ {}
> dictionary   │ [null]
> lexemes      │ [null]

':*' is only specific to to_tsquery; ts_debug just invokes the parser. So this is not correct.

-Sushant.
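To illustrate that last point with a neutral example (a sketch; the exact rows returned depend on the parser and configuration):

    -- to_tsquery treats ':*' as its prefix-match marker:
    SELECT to_tsquery('english', 'super:*');
    --  'super':*

    -- ts_debug only runs the raw text through the parser, so ':*' shows
    -- up as a "blank" punctuation token, not as a prefix operator:
    SELECT token, alias FROM ts_debug('english', 'super:*');

So the ts_debug output above says nothing about prefix handling; as its lexemes column shows, the cspell dictionary has already normalised 'Stě' to 'sto', so the search effectively looks for 'sto':* rather than a prefix of 'stěhule'.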