Re: [HACKERS] gaussian distribution pgbench
Oh, sorry, I forgot to write URL referring picture. http://en.wikipedia.org/wiki/Normal_distribution http://en.wikipedia.org/wiki/Exponential_distribution regards, -- Mitsumasa KONDO 2014-03-15 17:50 GMT+09:00 Mitsumasa KONDO : > Hi > > 2014-03-15 15:53 GMT+09:00 Fabien COELHO : > > >> Hello Heikki, >> >> >> A couple of comments: >>> >>> * There should be an explicit "\setrandom ... uniform" option too, even >>> though you get that implicitly if you don't specify the distribution >>> >> >> Indeed. I agree. I suggested it, but it got lost. > > OK. If we keep to the SQL grammar, your saying is right. I will add it. > > >> * What exactly does the "threshold" mean? The docs informally explain >>> that "the larger the thresold, the more frequent values close to the middle >>> of the interval are drawn", but that's pretty vague. >>> >> >> There are explanations and computations as comments in the code. If it is >> about the documentation, I'm not sure that a very precise mathematical >> definition will help a lot of people, and might rather hinder >> understanding, so the doc focuses on an intuitive explanation instead. > > Yeah, I think that we had better to only explain necessary infomation for > using this feature. If we add mathematical theory in docs, it will be too > difficult for user. And it's waste. > > > * Does min and max really make sense for gaussian and exponential >>> distributions? For gaussian, I would expect mean and standard deviation as >>> the parameters, not min/max/threshold. >>> >> >> Yes... and no:-) The aim is to draw an integer primary key from a table, >> so it must be in a specified range. This is approximated by drawing a >> double value with the expected distribution (gaussian or exponential) and >> project it carefully onto integers. If it is out of range, there is a loop >> and another value is drawn. The minimal threshold constraint (2.0) ensures >> that the probability of looping is low. > > I think it is difficult to understand from our text... So I create picture > that will help you to understand it. > Please see it. > > >> >> * How about setting the variable as a float instead of integer? Would >>> seem more natural to me. At least as an option. >>> >> >> Which variable? The values set by setrandom are mostly used for primary >> keys. We really want integers in a range. > > I think he said threshold parameter. Threshold parameter is very sensitive > parameter, so we need to set double in threshold. I think that you can > consent it when you see attached picture. > > regards, > -- > Mitsumasa KONDO > NTT Open Source Software Center >
Re: [HACKERS] Patch: show relation and tuple infos of a lock to acquire
On Thu, Mar 13, 2014 at 8:06 PM, Amit Kapila wrote: > On Thu, Mar 13, 2014 at 7:10 PM, Robert Haas wrote: >> On Thu, Mar 13, 2014 at 12:45 AM, Amit Kapila >> wrote: >>> _bt_doinsert - "insert index tuple (X,Y)" (here it will refer to index tuple >>> location) >> >> I don't think that giving the index tuple location is going to be very >> helpful; can we get the TID for conflicting heap tuple? > > Yes, each index tuple contains reference to TID of heap tuple. I have updated the patch to pass TID and operation information in error context and changed some of the comments in code. Let me know if the added operation information is useful, else we can use better generic message in context. Christian, can you please once confirm if the attached patch looks okay to you? According to me all the comments raised till now for this patch are addressed, so I will mark it as Ready For Committer unless some one feels we should do something else for this patch. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com show_table_name_and_tuple_in_lock_log_v9.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gaussian distribution pgbench
Nice drawing! * How about setting the variable as a float instead of integer? Would seem more natural to me. At least as an option. Which variable? The values set by setrandom are mostly used for primary keys. We really want integers in a range. I think he said threshold parameter. Threshold parameter is very sensitive parameter, so we need to set double in threshold. I think that you can consent it when you see attached picture. I'm sure that the threshold must be a double, but I thought it was already the case, because of atof, the static variables which are declared double, and the threshold function parameters which are declared double as well, and the putVariable uses a "%lf" format... Possibly I'm missing something? -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] gaussian distribution pgbench
2014-03-15 19:04 GMT+09:00 Fabien COELHO : > > Nice drawing! > > > * How about setting the variable as a float instead of integer? Would >>> seem more natural to me. At least as an option. >>> >>> Which variable? The values set by setrandom are mostly used for primary >>> keys. We really want integers in a range. >>> >> >> I think he said threshold parameter. Threshold parameter is very sensitive >> parameter, so we need to set double in threshold. I think that you can >> consent it when you see attached picture. > > Oh, sorry.. It is to Heikki. Not to you... > I'm sure that the threshold must be a double, but I thought it was already > the case, because of atof, the static variables which are declared double, > and the threshold function parameters which are declared double as well, > and the putVariable uses a "%lf" format... > I think it's collect. When we get double argument in scanf(), we can use %lf format. > Possibly I'm missing something? Sorry. I think nothing is missing. regards, -- Mitsumasa KONDO
Re: [HACKERS] jsonb and nested hstore
On Sat, Mar 15, 2014 at 1:44 AM, Tomas Vondra wrote: > Because otherwise I don't understand how the index could be used for > queries with @> '{"a" : {"b" : "c"}}' conditions (i.e. path "[a,b]" with > value "c"). Hm, some experimentation here shows it does indeed work for queries like this and works quite nicely. I agree, this contradicts my explanation so I'll need to poke in this some more to understand how it is that this works so well: explain select j->'tags'->>'name' from osm where j @> '{"tags":{"waterway":"dam"}}' ; QUERY PLAN Bitmap Heap Scan on osm (cost=139.47..19565.07 rows=6125 width=95) Recheck Cond: (j @> '{"tags": {"waterway": "dam"}}'::jsonb) -> Bitmap Index Scan on osmj (cost=0.00..137.94 rows=6125 width=0) Index Cond: (j @> '{"tags": {"waterway": "dam"}}'::jsonb) Planning time: 0.147 ms (5 rows) stark=# select j->'tags'->>'name' from osm where j @> '{"tags":{"waterway":"dam"}}' ; ?column? - Alpine Dam Bell Canyon Dam Big Rock Dam Briones Dam Cascade Dam Gordon Valley Dam Kimball Canyon Dam Moore Dam Nicasio Dam Novato Creek Dam Ryland Dam Vasona Dam Warm Springs Dam Crystal Dam (248 rows) Time: 6.126 ms -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] bpchar functinos
I want to fined when is used these functions(what query caused the call of these functions) : -char_bpchar() -bpchar_name() -name_bpchar()
Re: [HACKERS] jsonb and nested hstore
On 15.3.2014 06:40, Peter Geoghegan wrote: > On Fri, Mar 14, 2014 at 6:44 PM, Tomas Vondra wrote: >> Well, depends on how you define useful. With the sample dataset >> 'delicious' (see Peter's post) I can do this: >> >>SELECT doc FROM delicious >> WHERE doc @> '{"title_detail" : {"value" : "TheaterMania"}}'; >> >> with arbitrary paths, and I may create a GIN index to support such >> queries. And yes, it's much faster than GiST for example (by a >> factor of 1000). > > If you know ahead of time the entire nested value you can. So, if > you attach some other data to the "TheaterMania" document, you had > better know that too if you hope to write a query like this. You also > have to index the entire table, where presumably with a little > thought you could get away with a much smaller index. That strikes me > as not very useful. Sure, I need to know some basic rules / do assumptions about the structure of the json document. In other words, schemaless databases are difficult to query. For example when storing mail message headers (i.e. the example I've used before), I do know that the json document is rather well structured - it's not nested at all, and all the values are either scalar values (mostly strings), or arrays of scalars. So it looks like this { "from" : "john@example.com", "to" : ["jane@example.com", "jack@example.com"], ... } So the schema is rather well defined (not the exact keys, but the structure certainly is). Let's say I want to allow arbitrary searches on headers - I can't support that with expression indexes, because there's like a zillion of possible headers and I'd have to create an expression index on each of them separately. But I can support that with a single GIN index ... >> Yes, the GIN index is quite large (~560MB for a ~1.2GB table). > > With the default opclass, without an expressional index, 100% of the > data from the table appears in the index. Why do you think that's > quite large? That wasn't meant as a complaint. I have no problem with the index size (If we can make it smaller in the future, great! But I can live with the current index sizes too.) regards Tomas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Archive recovery won't be completed on some situation.
On Fri, Mar 14, 2014 at 7:32 PM, Kyotaro HORIGUCHI wrote: > Hello, we found that postgreql won't complete archive recovery > foever on some situation. This occurs HEAD, 9.3.3, 9.2.7, 9.1.12. > > Restarting server with archive recovery fails as following just > after it was killed with SIGKILL after pg_start_backup and some > wal writes but before pg_stop_backup. > > | FATAL: WAL ends before end of online backup > | HINT: Online backup started with pg_start_backup() must be > | ended with pg_stop_backup(), and all WAL up to that point must > | be available at recovery. > > What the mess is once entering this situation, I could find no > formal operation to exit from it. Though this is formal way, you can exit from that situation by (1) Remove recovery.conf and start the server with crash recovery (2) Execute pg_start_backup() after crash recovery ends (3) Copy backup_label to somewhere (4) Execute pg_stop_backup() and shutdown the server (5) Copy backup_label back to $PGDATA (6) Create recovery.conf and start the server with archive recovery > On this situation, 'Backup start location' in controldata has > some valid location but corresponding 'end of backup' WAL record > won't come forever. > > But I think PG cannot tell the situation dintinctly whether the > 'end of backup' reocred is not exists at all or it will come > later especially when the server starts as a streaming > replication hot-standby. > > One solution for it would be a new parameter in recovery.conf > which tells that the operator wants the server to start as if > there were no backup label ever before when the situation > comes. It looks ugly and somewhat danger but seems necessary. > > The first attached file is the script to replay the problem, and > the second is the patch trying to do what is described above. > > After applying this patch on HEAD and uncommneting the > 'cancel_backup_label_on_failure = true' in test.sh, the test > script runs as following, > > | LOG: record with zero length at 0/2010F40 > | WARNING: backup_label was canceled. > | HINT: server might have crashed during backup mode. > | LOG: consistent recovery state reached at 0/2010F40 > | LOG: redo done at 0/2010DA0 > > What do you thing about this? What about adding new option into pg_resetxlog so that we can reset the pg_control's backup start location? Even after we've accidentally entered into the situation that you described, we can exit from that by resetting the backup start location in pg_control. Also this option seems helpful to salvage the data as a last resort from the corrupted backup. Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] HEAD seems to generate larger WAL regarding GIN index
Hi, I executed the following statements in HEAD and 9.3, and compared the size of WAL which were generated by data insertion in GIN index. - CREATE EXTENSION pg_trgm; CREATE TABLE hoge (col1 text); CREATE INDEX hogeidx ON hoge USING gin (col1 gin_trgm_ops) WITH (FASTUPDATE = off); CHECKPOINT; SELECT pg_switch_xlog(); SELECT pg_switch_xlog(); SELECT pg_current_xlog_location(); INSERT INTO hoge SELECT 'POSTGRESQL' FROM generate_series(1, 100); SELECT pg_current_xlog_location(); - The results of WAL size are 960 MB (9.3) 2113 MB (HEAD) The WAL size in HEAD was more than two times bigger than that in 9.3. Recently the source code of GIN index has been changed dramatically. Is the increase in GIN-related WAL intentional or a bug? Regards, -- Fujii Masao -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On 03/15/2014 08:40 PM, Fujii Masao wrote: Hi, I executed the following statements in HEAD and 9.3, and compared the size of WAL which were generated by data insertion in GIN index. - CREATE EXTENSION pg_trgm; CREATE TABLE hoge (col1 text); CREATE INDEX hogeidx ON hoge USING gin (col1 gin_trgm_ops) WITH (FASTUPDATE = off); CHECKPOINT; SELECT pg_switch_xlog(); SELECT pg_switch_xlog(); SELECT pg_current_xlog_location(); INSERT INTO hoge SELECT 'POSTGRESQL' FROM generate_series(1, 100); SELECT pg_current_xlog_location(); - The results of WAL size are 960 MB (9.3) 2113 MB (HEAD) The WAL size in HEAD was more than two times bigger than that in 9.3. Recently the source code of GIN index has been changed dramatically. Is the increase in GIN-related WAL intentional or a bug? It was somewhat expected. Updating individual items on the new-format GIN pages requires decompressing and recompressing the page, and the recompressed posting lists need to be WAL-logged. Which generates much larger WAL records. That said, I didn't expect the difference to be quite that big when you're appending to the end of the table. When the new entries go to the end of the posting lists, you only need to recompress and WAL-log the last posting list, which is max 256 bytes long. But I guess that's still a lot more WAL than in the old format. That could be optimized, but I figured we can live with it, thanks to the fastupdate feature. Fastupdate allows amortizing that cost over several insertions. But of course, you explicitly disabled that... - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] First-draft release notes for next week's releases
First-draft release notes are committed, and should be visible at http://www.postgresql.org/docs/devel/static/release-9-3-4.html once guaibasaurus does its next buildfarm run a few minutes from now. Any suggestions? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [BUGS] BUG #9223: plperlu result memory leak
Hi! On Thu, Mar 6, 2014 at 6:59 AM, Alex Hunsaker wrote: . . . > This will apply cleanly all the way to REL9_2_STABLE. It applies (with > fuzz, but cleanly to REL9_1). REL9_0 does this completely differently > and so does not have this leak. > Looks like patch still not pushed to repo. -- Sergey Burladyan
Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index
On Sat, Mar 15, 2014 at 11:27 PM, Heikki Linnakangas < hlinnakan...@vmware.com> wrote: > On 03/15/2014 08:40 PM, Fujii Masao wrote: > >> Hi, >> >> I executed the following statements in HEAD and 9.3, and compared >> the size of WAL which were generated by data insertion in GIN index. >> >> - >> CREATE EXTENSION pg_trgm; >> CREATE TABLE hoge (col1 text); >> CREATE INDEX hogeidx ON hoge USING gin (col1 gin_trgm_ops) WITH >> (FASTUPDATE = off); >> >> CHECKPOINT; >> SELECT pg_switch_xlog(); >> SELECT pg_switch_xlog(); >> >> SELECT pg_current_xlog_location(); >> INSERT INTO hoge SELECT 'POSTGRESQL' FROM generate_series(1, 100); >> SELECT pg_current_xlog_location(); >> - >> >> The results of WAL size are >> >> 960 MB (9.3) >>2113 MB (HEAD) >> >> The WAL size in HEAD was more than two times bigger than that in 9.3. >> Recently the source code of GIN index has been changed dramatically. >> Is the increase in GIN-related WAL intentional or a bug? >> > > It was somewhat expected. Updating individual items on the new-format GIN > pages requires decompressing and recompressing the page, and the > recompressed posting lists need to be WAL-logged. Which generates much > larger WAL records. > > That said, I didn't expect the difference to be quite that big when you're > appending to the end of the table. When the new entries go to the end of > the posting lists, you only need to recompress and WAL-log the last posting > list, which is max 256 bytes long. But I guess that's still a lot more WAL > than in the old format. > > That could be optimized, but I figured we can live with it, thanks to the > fastupdate feature. Fastupdate allows amortizing that cost over several > insertions. But of course, you explicitly disabled that... Let me know if you want me to write patch addressing this issue. -- With best regards, Alexander Korotkov.
[HACKERS] Minimum supported version of Python?
Our documentation claims that the minimum Python version for plpython is 2.3. However, an attempt to build with that on an old Mac yielded a bunch of failures in the plpython_types regression test, all of the form ! ERROR: could not import a module for Decimal constructor ! DETAIL: ImportError: No module named decimal A quick trawl through the buildfarm revealed no evidence that we're actually testing anything older than Python 2.5, which is why this wasn't evident awhile ago. It looks like the failing test cases date back to PG 9.0. Personally I have no desire to put any effort into fixing this, and thus suggest that we just change the documentation to specify that 2.5 is the minimum Python version since 9.0. Does anybody else want to work harder on it, and if so what would you propose doing exactly? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Portability issues in shm_mq
On Fri, Mar 14, 2014 at 4:43 PM, Tom Lane wrote: > Whilst setting up a buildfarm member on an old, now-spare Mac, I was > somewhat astonished to discover that contrib/test_shm_mq crashes thus: > TRAP: FailedAssertion("!(rb >= sizeof(uint64))", File: "shm_mq.c", Line: 429) > but only in UTF8 locales, not in C locale. You'd have bet your last > dollar that that code was locale-independent, right? > > The reason appears to be that in the payload string generated with > (select string_agg(chr(32+(random()*96)::int), '') from > generate_series(1,400)) > the chr() argument rounds up to 128 every so often. In UTF8 encoding, > that causes chr() to return a multibyte character instead of a single > byte. So, instead of always having a fixed payload string length of > 400 bytes, the payload length moves around a bit --- in a few trials > I see anywhere from 400 to 409 bytes. > > How is that leading to a crash? Well, this machine is 32-bit, so MAXALIGN > is only 4. This means it is possible for an odd-length message cum > message length word to not exactly divide the size of the shared memory > ring buffer, resulting in cases where an 8-byte message length word is > wrapped around the end of the buffer. shm_mq_receive_bytes makes no > attempt to hide that situation from its caller, and happily returns just > 4 bytes with SHM_MQ_SUCCESS. shm_mq_receive, on the other hand, is so > confident that it will always get an indivisible length word that it just > Asserts that that's the case. Argh. I think I forced the size of the buffer to be MAXALIGN'd, but what it really needs is to be a multiple of the size of uint64. > Recommendations: > > 1. Reduce the random() multiplier from 96 to 95. In multibyte encodings > other than UTF8, chr() would flat out reject values of 128, so this test > case is unportable. Agreed. > 2. Why in the world is the test case testing exactly one message length > that happens to be a multiple of 8? Put some randomness into that, > instead. Good idea. I think that started out as a performance test rather than an integrity test, and I didn't think hard enough when revising it about what would make a good integrity test. > 3. Either you need to work a bit harder at forcing alignment, or you need > to fix shm_mq_receive to cope with split message length words. The first one is what is intended. I will look at it. > 4. The header comment for shm_mq_receive_bytes may once have described its > API accurately, but that appears to have been a long long time ago in a > galaxy far far away. Please fix. Ugh, looks like I forgot to update that when I introduced the shm_mq_result return type. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposed feature: Selective Foreign Key
On 11/28/2013 09:15 AM, Tom Dunstan wrote: > Feature Proposal: Selective foreign keys. > - > Allow foreign keys to have where clauses. I like this feature. One of the issues I have been trying to deal with is how to model countries and their subdivision. One table can suffice to register all countries, but the subdivisions of countries can have different columns and constraints. Having a where clause in a foreign key constraint would make it possible to have a table for state_us which can only contain rows referring the to USA and a table for departement_fr which can only contain rows referring to France. I hope this feature somehow gets it into a release some day. Niels
Re: [HACKERS] jsonb and nested hstore
On 15.3.2014 02:15, Peter Geoghegan wrote: > On Fri, Mar 14, 2014 at 5:10 PM, Tomas Vondra wrote: >> I'm on commit a3115f0d, which is just 2 days old, so I suppose this was >> not fixed yet. > > Try merging the feature branch now, which will get you commit 16923d, > which you're missing. That was an open item for a while, which I only > got around to fixing a few days ago. Ok, that seems to be working fine. T. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Upcoming back branch releases
eFrom: "Tom Lane" After some discussion, the core committee has concluded that the WAL-replay bug fixed in commit 6bfa88acd3df830a5f7e8677c13512b1b50ae813 is indeed bad enough to justify near-term update releases. Since there seems no point in being slow about it, tarballs will be wrapped Monday (3/17) for public announcement Thursday (3/20). Regarding: Prevent intermittent "could not reserve shared memory region" failures on recent Windows versions (MauMau) Could you include the patch I sent for 9.0/9.1 in response to Magnus san? http://www.postgresql.org/message-id/9C709659FE3C4B8DA7135C6F307B83F5@maumau Regards MauMau -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Minimum supported version of Python?
On 16 March 2014 11:55, Tom Lane wrote: > Our documentation claims that the minimum Python version for plpython > is 2.3. However, an attempt to build with that on an old Mac yielded > a bunch of failures in the plpython_types regression test, all of the > form > ... > Personally I have no desire to put any effort into fixing this, and > thus suggest that we just change the documentation to specify that 2.5 > is the minimum Python version since 9.0. +1 for updating the documentation. 2.5 has been around since 2006 so we are offering a huge range of compatibility as it stands. Versions earlier than 2.5 are probably only of interest to historians at this point. Cheers, BJ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers