Re: [GENERAL] Postgres 10.1 fails to start: server did not start in time
On 2017-11-12 13:26:58 +0100, Christoph Berg wrote:
> Re: To Adam Brusselback 2017-11-11 <2017205316.u56lkmkakdmcx...@msg.df7cb.de>
> > I'm investigating if it's a good idea to tell systemd to ignore the
> > exit code of pg_ctl(cluster).
>
> Telling systemd to ignore ExecStart errors seems to be the correct
> solution. The service will still be active, with the startup error
> being shown:

Wouldn't it be better to remove the timeout? If some other service depends
on PostgreSQL, it probably shouldn't be started until PostgreSQL is really
up, and services which don't need PostgreSQL (e.g. SSH or X11 login or a
web or mail server) shouldn't depend on it. One of the purported advantages
of systemd over SystemV init is that it starts up services in parallel, so
a service which takes a long (or infinite) time to start doesn't block
other services.

        hp

--
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | h...@hjp.at         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
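One way to remove the timeout would be a drop-in override for the unit. This is only a sketch: the unit name and drop-in path depend on the distribution's packaging, and on systemd versions before v229 `TimeoutStartSec=0` must be used instead of `infinity`.

```ini
# /etc/systemd/system/postgresql@.service.d/timeout.conf (hypothetical path)
[Service]
TimeoutStartSec=infinity
```

After creating the file, `systemctl daemon-reload` is needed for the override to take effect.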
Re: [GENERAL] Client Authentication methods
On 2017-11-10 08:25:24 -0500, chiru r wrote:
> I am trying to understand the Authentication method in pg_hba.conf file
> (password & md5) in PostgreSQL database server.
>
> I am assuming that the user provides the connection string host/usser/
> password, then client will go and contact the DB server pg_hba.conf file
> in memory without carrying password over the network initially, and then
> it confirms the authentication method from pg_hba.conf, then it decides
> weather it send clear text or md5 encrypted password from client to
> Server to make a session?

I'm not sure what "it" refers to in this sentence. If "it" refers to the
client (as grammatically it should) then the answer is no. The client
doesn't have access to the pg_hba.conf file.

The client connects to the server, sending the username and database name,
but not (yet) the password. Then the server checks the pg_hba.conf file to
determine which authentication method to use. The server then sends an
authentication request to the client, to which the client sends a response
(including, or based on, the password).

> Is my assumption is correct? or What exactly it make the difference for
> client if i use md5/password in pg_hba.conf file in DB server?

See
https://www.postgresql.org/docs/10/static/auth-methods.html#AUTH-PASSWORD

With method password, passwords are sent in plain text. With md5, an md5
hash of the password, the username, and a nonce is sent instead.

        hp
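The md5 scheme can be sketched in SQL. This is an illustration of the hashing, not the actual wire protocol code; the user 'alice', the password 'secret', and the literal 'salt' (in reality a random 4-byte nonce chosen by the server per connection attempt) are made-up values:

```sql
-- What the server stores for md5 authentication:
-- 'md5' followed by md5(password || username).
SELECT 'md5' || md5('secret' || 'alice') AS stored_form;

-- What the client sends back in response to the server's challenge:
-- md5 of the stored hash concatenated with the per-connection salt,
-- so the plain-text password never crosses the wire.
SELECT 'md5' || md5(md5('secret' || 'alice') || 'salt') AS wire_response;
```

Because the salt differs on every connection attempt, a captured wire response cannot simply be replayed later.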
Re: [GENERAL] EXPLAIN command just hangs...
On 2017-11-02 20:51:23 +, Rhhh Lin wrote:
[...]
> where timestamp BETWEEN 150667656 AND 150875022
[...]
> *Also, as a sidenote - can someone please expand on why one (I was not
> involved in the creation of this DB/schema definition) would choose to
> have the definition of the timestamp column as a bigint in this case?

The numbers look like Java timestamps (milliseconds since the epoch). So
probably the programs accessing the database are written in Java and the
programmer decided that it's simpler to do all timestamp computations in
Java than in SQL. Or maybe the programmer just felt more comfortable with
Java-like timestamps than with calendar dates.

(I have done the same (with Unix timestamps, i.e. seconds since the
epoch). Although in the cases where I've done it I had the additional
excuse that the database didn't support timestamps with time zones, which
isn't the case for PostgreSQL.)

        hp
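If proper timestamps are wanted in queries against such a column, the bigint can be converted on the fly. A sketch, with invented table/column names and example millisecond values:

```sql
-- Java-style epoch milliseconds -> timestamptz.
-- Dividing by 1000.0 keeps the sub-second part.
SELECT to_timestamp(ts_millis / 1000.0) AS ts
FROM measurements
WHERE ts_millis BETWEEN 1506676560000 AND 1508750220000;
```

Note that a WHERE clause on the raw bigint (as above) can still use a plain index on that column, whereas filtering on `to_timestamp(ts_millis / 1000.0)` would need an expression index.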
[GENERAL] Query plan for Merge Semi Join
[EXPLAIN ANALYZE output elided; it ends with:]
 Execution time: 3275.341 ms

That is almost certainly not ideal, but this is not my question. My
question is: what does that merge semi join actually do?

In general a merge join needs two inputs sorted by the merge key. It walks
both in parallel and joins matching lines. Correct?

The first input is the index scan. The second is the output of the
materialize. Since we need only the column arbeitsvolumen this would be
something like

    select arbeitsvolumen from facttable_kon_eh
    where thema='E' order by arbeitsvolumen;

So far so good. But there are a few things I don't understand:

Where does "Rows Removed by Filter: 3874190" come from? The number doesn't
match any count I can come up with: It is a bit larger than the total
number of rows where term is not null, but smaller than the total number
of rows where the filter doesn't match. And it is much larger than the
number of rows I would expect if the merge stopped once there could not be
a possible match any more.

And does it really check the filter condition even for rows that don't
satisfy the merge condition? Of course it makes sense from a
modularization point of view, but that's a lot of random accesses, most of
which are unnecessary.

The materialize returns 184791 rows. This one I understand: There are 6
non-null distinct values of arbeitsvolumen in facttable_kon_eh, and each
appears 36958 times. 36958 * 5 + 1 = 184791. So it stops once it reaches
the largest value. (Although now I'm wondering how it knows that this is
the largest value without scanning to the end.)

        hp
Re: [GENERAL] "Shared strings"-style table
On 2017-10-13 12:49:21 -0300, Seamus Abshere wrote:
> In the spreadsheet world, there is this concept of "shared strings," a
> simple way of compressing spreadsheets when the data is duplicated in
> many cells.
>
> In my database, I have a table with >200 million rows and >300 columns
> (all the households in the United States). For clarity of development
> and debugging, I have not made any effort to normalize its contents, so
> millions of rows have, for example, "SINGLE FAMILY RESIDENCE /
> TOWNHOUSE" (yes, that whole string!) instead of some code representing
> it.
>
> Theoretically / blue sky, could there be a table or column type that
> transparently handles "shared strings" like this, reducing size on disk
> at the cost of lookup overhead for all queries?

Theoretically it's certainly possible, and I think some column-oriented
databases store data that way.

> (I guess maybe it's like TOAST, but content-hashed and de-duped and not
> only for large objects?)

Yes, but if you want to automatically delete entries which are no longer
needed you need to keep track of that. So either a reference count or an
index lookup on the parent table. This is starting to look a lot like a
foreign key - just hidden from the user. Performance would probably be
similar, too.

We have done something similar (although for different reasons). We ended
up doing the "join" in the application. For most purposes we don't need
the descriptive strings, and when we need them we can do a dictionary
lookup just before sending them to the client (of course the dictionary
has to be read from the database, too, but it doesn't change that often,
so it can be cached). And from a software maintainability POV I think a
dictionary lookup in Perl is a lot nicer than 50 joins (or 300 in your
case).

        hp
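The hand-rolled equivalent of "shared strings" is ordinary normalization into a dictionary table. A sketch, with all names invented:

```sql
-- Dictionary of distinct strings; a 2-byte id replaces the repeated text.
CREATE TABLE property_type (
    id   smallint PRIMARY KEY,
    name text NOT NULL UNIQUE
);

-- The big table stores only the id.
CREATE TABLE households (
    id               bigint PRIMARY KEY,
    property_type_id smallint REFERENCES property_type(id)
    -- ... the other ~300 columns
);

-- Resolving the string back is a join
-- (or, as described above, a cached lookup in the application):
SELECT h.id, pt.name
FROM households h
JOIN property_type pt ON pt.id = h.property_type_id;
```

With only a handful of distinct values per column, the dictionary tables stay tiny and cache trivially, which is what makes the application-side lookup attractive.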
Re: [GENERAL] pglogical bidirectional replication of sequences
On 2017-09-01 10:29:51 +0200, Peter J. Holzer wrote:
> pglogical supports replication of sequences, and although the way it
> does this suggests that it can't really work in both directions
> (actually I'm sceptical that it works reliably in one direction), of
> course I had to try it.
>
> So I created a sequence on both nodes and called
>     select pglogical.replication_set_add_sequence('default', 'test_sequence');
> on both nodes.
>
> The result was ... interesting.
>
> First I got the same sequence (1, 2, 3, 4, 5) on both nodes.
>
> After a few seconds the replication kicked in, and then I got the same
> value (1005) on both nodes most of the time, with a few variants (2005,
> 3005) thrown in.
>
> In a word, the sequence was completely unusable.
[...some failed attempts to recover...]
> So, is there a way to recover from this situation without drastic
> measures like nuking the whole database?

To answer my own question:

    delete from pglogical.queue where message_type='S';

on both nodes seems to have the desired effect. A

    vacuum full pglogical.queue;

afterwards is a good idea to get the bloated table back to a reasonable
size.

        hp
Re: [GENERAL] pglogical bidirectional replication of sequences
On 2017-09-01 09:57:52 -0600, Rob Sargent wrote:
> On 09/01/2017 02:29 AM, Peter J. Holzer wrote:
> > TLDR: Don't.
> >
> > I'm currently conducting tests which should eventually lead to a 2 node
> > cluster with working bidirectional logical replication.
> >
> > (Postgresql 9.6.4-1.pgdg90+1, pglogical 2.0.1-1.jessie+1 on Debian 9
> > (Stretch))
> >
> > pglogical supports replication of sequences, and although the way it
> > does this suggests that it can't really work in both directions
> > (actually I'm sceptical that it works reliably in one direction), of
> > course I had to try it.
> [and it blew up]
> I trust you mean don't use sequences

I trust you don't mean what I understood ;-).

Seriously: Sequences in general are fine and very useful. I think they
should be used where appropriate.

Sequences and logical replication don't mix well. That still doesn't mean
that you can't use sequences, you just have to be careful how you use
them.

Since replicating sequence state doesn't really work, I think it is best
to use independent sequences on each node and just configure them in a
way that they cannot produce the same values. A naive approach would be
to use MINVALUE/MAXVALUE/START WITH to ensure non-overlapping ranges. A
somewhat more elegant approach is to increment by n (the number of nodes
in the cluster) and use different start values (I got that idea from
http://thedumbtechguy.blogspot.co.at/2017/04/demystifying-pglogical-tutorial.html).

There are other ways to get unique ids: A UUID should work pretty well in
most cases, and in some even a random 64 bit int might be enough.

        hp
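The increment-by-n scheme can be sketched for a 2-node cluster as follows (the sequence name is invented; for n nodes, use INCREMENT BY n and start values 1..n):

```sql
-- On node 1: generates 1, 3, 5, ...
CREATE SEQUENCE my_table_id_seq INCREMENT BY 2 START WITH 1;

-- On node 2: generates 2, 4, 6, ...
CREATE SEQUENCE my_table_id_seq INCREMENT BY 2 START WITH 2;
```

Unlike fixed MINVALUE/MAXVALUE ranges, this never runs out of room on one node while the other still has ids to spare, and adding capacity only requires that the increment was chosen large enough up front.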
[GENERAL] pglogical bidirectional replication of sequences
TLDR: Don't.

I'm currently conducting tests which should eventually lead to a 2 node
cluster with working bidirectional logical replication.

(Postgresql 9.6.4-1.pgdg90+1, pglogical 2.0.1-1.jessie+1 on Debian 9
(Stretch))

pglogical supports replication of sequences, and although the way it does
this suggests that it can't really work in both directions (actually I'm
sceptical that it works reliably in one direction), of course I had to
try it.

So I created a sequence on both nodes and called

    select pglogical.replication_set_add_sequence('default', 'test_sequence');

on both nodes.

The result was ... interesting.

First I got the same sequence (1, 2, 3, 4, 5) on both nodes.

After a few seconds the replication kicked in, and then I got the same
value (1005) on both nodes most of the time, with a few variants (2005,
3005) thrown in.

In a word, the sequence was completely unusable.

Experiment completed, so I removed the sequence from the replication set:

    select pglogical.replication_set_remove_sequence('default', 'test_sequence');

on both nodes. But the behaviour of the sequence doesn't change. It still
returns 1005 most of the time, and sometimes 2005 or 3005. This is true
even after restarting both nodes.

Plus, I can't drop the sequence any more (as the user who created the
sequence):

    wds=> drop sequence public.test_sequence ;
    ERROR:  permission denied for schema pglogical

So, clearly, pglogical is still managing that sequence. If I drop the
sequence as postgres and then recreate it, it works normally for some
time (also the sequence on the other node now works normally), but after
some time the replication kicks in again and the sequence is stuck again
at 1005.

So, is there a way to recover from this situation without drastic
measures like nuking the whole database?

        hp
Re: [GENERAL] Porting libpq to QNX 4.25
On 2017-08-22 12:57:15 -0300, marcelo wrote:
> We'll replace those QNX machines with WIndows XP ones

The future is already here - it's just not very evenly distributed.

SCNR,
        hp
Re: [GENERAL] Count column with name 'count' returns multiple rows. Why?
On 2017-08-18 15:57:39 -0500, Justin Pryzby wrote:
> On Fri, Aug 18, 2017 at 10:47:37PM +0200, Peter J. Holzer wrote:
> > On 2017-08-18 06:37:15 -0500, Justin Pryzby wrote:
> > > On Fri, Aug 18, 2017 at 01:01:45PM +0200, Rob Audenaerde wrote:
> > > > Can anyone please explain this behaviour?
> > >
> > > https://www.postgresql.org/docs/9.6/static/sql-expressions.html#SQL-EXPRESSIONS-FUNCTION-CALLS
> > > https://www.postgresql.org/docs/9.6/static/rowtypes.html#ROWTYPES-USAGE
> >
> > Maybe I overlooked it, but I don't see anything in those pages which
> > explains why «count» is parsed as a column name in the first example
> > and as a function name in the second.
> >
> > Nor do I see what «count(base.*)» is supposed to mean. It seems to be
> > completely equivalent to just writing «count», but the part in
> > parentheses is not ignored: It has to be either the table name or the
> > table name followed by «.*». Everything else I tried either led to a
> > syntax error or to «count» being recognized as a function. So
> > apparently columnname open-parenthesis tablename closed-parenthesis is
> > a specific syntactic construct, but I can't find it documented
> > anywhere.
>
> | Another special syntactical behavior associated with composite values
> | is that we can use functional notation for extracting a field of a
> | composite value. The simple way to explain this is that the notations
> | field(table) and table.field are interchangeable. For example, these
> | queries are equivalent:

Thanks. I see it now.

        hp
Re: [GENERAL] Count column with name 'count' returns multiple rows. Why?
On 2017-08-18 06:37:15 -0500, Justin Pryzby wrote:
> On Fri, Aug 18, 2017 at 01:01:45PM +0200, Rob Audenaerde wrote:
> > I don't understand why this query:
> >
> >    select count(base.*) from mytable base;
> >
> > does return multiple rows.
> >
> >    select count(1) from mytable base;
> >
> > returns the proper count.
> >
> > There is a column with the name 'count'.
> >
> > Can anyone please explain this behaviour?
>
> https://www.postgresql.org/docs/9.6/static/sql-expressions.html#SQL-EXPRESSIONS-FUNCTION-CALLS
> https://www.postgresql.org/docs/9.6/static/rowtypes.html#ROWTYPES-USAGE

Maybe I overlooked it, but I don't see anything in those pages which
explains why «count» is parsed as a column name in the first example and
as a function name in the second.

Nor do I see what «count(base.*)» is supposed to mean. It seems to be
completely equivalent to just writing «count», but the part in
parentheses is not ignored: It has to be either the table name or the
table name followed by «.*». Everything else I tried either led to a
syntax error or to «count» being recognized as a function. So apparently
columnname open-parenthesis tablename closed-parenthesis is a specific
syntactic construct, but I can't find it documented anywhere.

        hp
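A minimal reproduction of the surprise, based on my understanding of the field(table) notation quoted later in the thread (sample data invented):

```sql
CREATE TABLE mytable (count int);
INSERT INTO mytable VALUES (1), (2), (3);

-- count(base.*) is resolved via the field(table) notation as
-- (base.*).count, i.e. the column "count" of each row:
-- one result row per table row.
SELECT count(base.*) FROM mytable base;

-- Here no column interpretation fits, so count is the aggregate:
-- a single row containing the row count.
SELECT count(1) FROM mytable base;
```

So the presence of a column named `count` is what flips the first query from an aggregate into per-row field extraction.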
Re: [GENERAL] Dealing with ordered hierarchies
On 2017-07-25 01:15:56 +1200, Tim Uckun wrote:
> I don't like the approach with a large increment. It would mean
> complicated logic to see if you filled the gap and then update all the
> other peers if you did. It sounds like the re-order is going to be
> expensive no matter what. My primary concern are race conditions though.
> What if two or more users are trying to update the hierarchy either by
> inserts or updates? I can definitely see a situation where we have
> issues transactions trip over each other.

You could add a unique index over (parent, sequence_number). That way two
transactions won't be able to add a node with the same sequence number
under the same parent. You will have to handle duplicate key errors,
though.

        hp
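A sketch of that constraint (table and column names invented):

```sql
CREATE TABLE hierarchy (
    id              serial PRIMARY KEY,
    parent_id       int REFERENCES hierarchy(id),
    sequence_number int NOT NULL,
    UNIQUE (parent_id, sequence_number)
);

-- If two concurrent transactions insert the same
-- (parent_id, sequence_number), the second one fails with
-- "duplicate key value violates unique constraint" and must retry
-- with a different sequence_number.
```

One caveat: rows with `parent_id IS NULL` (root nodes) do not conflict with each other under a plain unique constraint, since NULLs compare as distinct; if duplicate positions among roots matter, that case needs separate handling.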
Re: [GENERAL] plpgsql function with offset - Postgres 9.1
On 2017-06-16 10:19:45 +1200, Patrick B wrote:
> 2017-05-29 19:27 GMT+12:00 Albe Laurenz <laurenz.a...@wien.gv.at>:
> > Patrick B wrote:
> > > I am running a background task on my DB, which will copy data from
> > > tableA to tableB. For that, I'm writing a PL/PGSQL function which
> > > basically needs to do the following:
> > >
> > > 1. Select the data from tableA
> > > 2. The limit will be put when calling the function
> > > 3. insert the selected data on Step 1 onto new table
[...]
> > > FOR row IN EXECUTE '
> > >     SELECT id, path, name, name_last, created_at
> > >     FROM tablea
> > >     WHERE ready = true
> > >     ORDER BY 1 LIMIT ' || rows || ' OFFSET ' || rows || ''
> >
> > '... LIMIT ' || p_limit || ' OFFSET ' || p_offset
> >
> > > LOOP
> > >     num_rows := num_rows + 1;
> > >     INSERT INTO tableB (id, path, name, name_last, created_at)
> > >     VALUES (row.id, row.path, row.name, row.name_last, row.created_at);
> > > END LOOP;
[...]
> > There are two problems with this approach:
> >
> > 1. It will do the wrong thing if rows are added or deleted in "tablea"
> >    while you process it.
>
> There will be actually records being inserted in tablea while processing
> the migration. Any ideas here?

Is id monotonically increasing? You might be able to use that, as Albe
suggests:

> > The solution is to avoid OFFSET and to use "keyset pagination":
> > http://use-the-index-luke.com/no-offset

But it works only if rows cannot become ready after their id range has
already been processed. Otherwise you will miss them.

> I can add another column in tablea, like example: row_migrated boolean
> --> if that helps

Yes, that's probably the best way. Instead of using an additional column
you could also make ready tristate: new -> ready_for_migration ->
migrated.

> > 2. Queries with hight OFFSET values have bad performance.
>
> No problem. The plan is to perform 2k rows at once, which is not much.

Are rows deleted from tablea after they are migrated? Otherwise you will
have a problem:

    select ... limit 2000 offset 1234000

will have to retrieve 1236000 rows and then discard 1234000 of them.

        hp
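A keyset-pagination version of the batch query might look like this (:last_seen_id is a placeholder for the largest id copied in the previous batch, 0 for the first batch; it assumes id is monotonically increasing):

```sql
SELECT id, path, name, name_last, created_at
FROM tablea
WHERE ready AND id > :last_seen_id
ORDER BY id
LIMIT 2000;
```

Each batch starts where the previous one left off, so the cost per batch stays constant instead of growing with the offset, and rows inserted behind the cursor are simply picked up by later batches.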
Re: [GENERAL] storing large files in database - performance
On 2017-05-16 12:25:03 +, Eric Hill wrote:
> I searched and found a few discussions of storing large files in the
> database in the archives, but none that specifically address performance
> and how large of files can realistically be stored in the database.
>
> I have a node.js application using PostgreSQL to store uploaded files.
> The column in which I am storing the file contents is of type "bytea"
> with "Storage" type set to "EXTENDED".

I have mentioned this little experiment before, but I finally put the
results on my web site: https://www.hjp.at/databases/blob-bench/

(Please note that so far I have run this only on one system. Generalizing
to other systems might be premature.)

> Storing a 12.5 MB file is taking 10 seconds, and storing a 25MB file is
> taking 37 seconds. Two notable things about those numbers: It seems like
> a long time, and the time seems to grow exponentially with file size
> rather than linearly.
>
> Do these numbers surprise you?

Yes. On my system, storing a 25 MB bytea value takes well under 1 second.

        hp
Re: [GENERAL] Caching and Blobs in PG? Was: Can PG replace redis, amqp, s3 in the future?
On 2017-05-05 11:46:55 -0700, John R Pierce wrote:
> On 5/5/2017 11:28 AM, Peter J. Holzer wrote:
> > On 2017-05-04 23:08:25 +0200, Sven R. Kunze wrote:
> > > On 03.05.2017 12:57, Thomas Güttler wrote:
> > > > Am 02.05.2017 um 05:43 schrieb Jeff Janes:
> > > > > No. You can certainly use PostgreSQL to store blobs. But then,
> > > > > you need to store the PostgreSQL data **someplace**. If you
> > > > > don't store it in S3, you have to store it somewhere else.
> > > >
> > > > I don't understand what you mean here. AFAIK storing blobs in PG
> > > > is not recommended since it is not very efficient.
> > >
> > > Seems like several people here disagree with this conventional
> > > wisdom.
> >
> > I think it depends very much on what level of "efficiency" you need.
> > On my home server (i5 processor, 32GB RAM, Samsung 850 SSD - not a
> > piece of junk, but not super powerful either) I can retrieve a small
> > blob from a 100GB table in about 0.1 ms, and for large blobs the speed
> > approaches 200MB/s. For just about everything I'd do on that server
> > (or even at work) this is easily fast enough.
>
> S3 is often used for terabyte to petabyte file collections. I would not
> want to burden my relational database with this.

I repeat the first sentence I wrote: "I think it depends very much on
what level of 'efficiency' you need."

Just because some people need to store petabytes of blob data doesn't
mean everybody does. If you need to store petabytes of blobs, PostgreSQL
may not be the right tool. But it may be the right tool if you just need
to store a few thousand PDFs.

To tell people to never store blobs in PostgreSQL because PostgreSQL is
"not efficient" is just bullshit. There are many factors which determine
how you should store your data, and "efficiency" (however that is
defined, if it's defined at all and not just used as a buzzword) is only
one of them - and rarely, in my experience, the most important one.

        hp

--
   _  | Peter J. Holzer    | A coding theorist is someone who doesn't
|_|_) |                    | think Alice is crazy.
| |   | h...@hjp.at         | -- John Gordon
__/   | http://www.hjp.at/ |    http://downlode.org/Etext/alicebob.html
Re: [GENERAL] Caching and Blobs in PG? Was: Can PG replace redis, amqp, s3 in the future?
On 2017-05-04 23:08:25 +0200, Sven R. Kunze wrote:
> On 03.05.2017 12:57, Thomas Güttler wrote:
> > Am 02.05.2017 um 05:43 schrieb Jeff Janes:
> > > No. You can certainly use PostgreSQL to store blobs. But then, you
> > > need to store the PostgreSQL data **someplace**.
> > > If you don't store it in S3, you have to store it somewhere else.
> >
> > I don't understand what you mean here. AFAIK storing blobs in PG is
> > not recommended since it is not very efficient.
>
> Seems like several people here disagree with this conventional wisdom.

I think it depends very much on what level of "efficiency" you need. On
my home server (i5 processor, 32GB RAM, Samsung 850 SSD - not a piece of
junk, but not super powerful either) I can retrieve a small blob from a
100GB table in about 0.1 ms, and for large blobs the speed approaches
200MB/s. For just about everything I'd do on that server (or even at
work) this is easily fast enough.

Sure, just telling the kernel "send data from file descriptor A (which
happens to be a file) to file descriptor B (a socket)" is a lot more
efficient than copying data from disk into a postgresql process, then
from that process to an application server, from that to the webserver,
which finally sends it to the socket. But if that just lets my server be
99.9% idle instead of 99.0% idle, I haven't gained much. Similarly, if my
server spends 90% of its resources doing other stuff, I won't gain much
by optimizing this (I should better optimize that other stuff it's
spending so much time on).

I am in this regard a firm believer in not optimizing prematurely. Do
whatever makes sense from an application point of view. If the blobs are
logically part of some other data (e.g. PDFs in a literature database),
store them together (either all of them in PostgreSQL, or all in some
NoSQL database, or maybe on stone tablets, if that makes sense for some
reason). Only if you have good reason[1] to believe that physically
separating data which logically belongs together will resolve a
bottleneck, then by all means separate them.

        hp

[1] "I read somewhere on the internet" is usually not a good reason.
Re: [GENERAL] Handling psql lost connections
On 2017-03-29 08:49:57 -0700, Steve Crawford wrote:
> When firewalls/VPNs stand between my psql client and a remote PostgreSQL
> server, the connection will on occasion time out and drop. This results
> in the following scenario:
>
> - Leave for lunch mid project - leave psql open.
> - Return from lunch, complete and submit large query.
> - Notice query is taking too long. Cancel it.
> - Cancel doesn't return - realize that connection has dropped.
> - Kill psql - history is not written out. Start query from scratch.
>
> Is there:
[...]
> Yes, I know I and my coworkers could spend brain cycles trying to
> unerringly remember to close and restart connections, write all queries
> in an external editor and then submit them, etc. but I'm looking for
> more user friendly options.

One workaround could be to log in to the server, start a screen session,
and run psql inside the screen session. Then if your network connection
drops you can simply log in again and resume the screen session.

Of course this only works if you have a shell login on the server, which
may not be the case.

        hp
Re: [GENERAL] inevitability of to_date() when converting representations which don't represent whole timestamps
I don't understand what you mean by "inevitability" in the subject.

On 2017-03-29 21:19:56 -0400, Shaun Cutts wrote:
> When being asked to convert a day of the week, the to_date() function
> returns the same day ('0001-01-01 BC') no matter which day is converted:
>
> # select to_date('Monday', 'Day')
> '0001-01-01 BC'
>
> # select to_date('Tuesday', 'Day')
> '0001-01-01 BC'
>
> However, if it were to return a date that was that day of the week, it
> could be inverted:
>
> # select extract(dow from '0001-01-01 BC'::date);
>   -- this date should be the result of to_date('Sunday', 'Day')
> 6
>
> # select extract(dow from '0001-01-02 BC'::date);
>   -- this date should be the result of to_date('Monday', 'Day')
> 0
>
> ...
>
> David tells me this is not a bug, but it still seems like a reasonable
> requirement on to_date() to me. Is there some reason why this isn't
> possible?

The documentation warns that to_date "interpret[s] input liberally, with
minimal error checking" and while it "produce[s] valid output, the
conversion can yield unexpected results." I would agree that producing
the same date for every day of the week crosses the line between
"unexpected (but valid) result" and "bug".

On the other hand I have no idea what the result of to_date('Monday',
'Day') should be. "Any date which is a Monday" seems too vague. "The
nearest Monday", "the previous Monday", "the next Monday" might be useful
in practice, but whichever of them you pick, you've picked the wrong one
with a probability of 2/3. "The first Monday in the year -1 of the
proleptic Gregorian calendar" would be consistent with how
to_timestamp('12:34:56', 'HH24:MI:SS') works, but apart from that and
being invertible it seems to be a quite useless choice.

        hp
Re: [GENERAL] Postgres Permissions Article
On 2017-03-29 08:05:23 -0700, Paul Jungwirth wrote: > On 03/29/2017 06:36 AM, Tom Lane wrote: > >Karsten Hilbert <karsten.hilb...@gmx.net> writes: > >>Being able to create foreign keys may allow to indirectly > >>discover whether certain values exists in a table which I > >>don't otherwise have access to (by means of failure or > >>success to create a judiciously crafted FK). > > > >Aside from that, an FK can easily be used to cause effective > >denial-of-service, for example preventing rows from being deleted > >within a table, or adding enormous overhead to such a deletion. > > Thank you both for taking a look! I agree those are both worthwhile > concerns. It still seems a little strange it is not just part of the CREATE > permission (for example). I understand why not everyone can create a foreign > key, I just have trouble imagining a use case where it is helpful to > separate it from other DDL commands. A foreign key affects not only the table on which it is defined but also the table it references. If Alice creates a table “master” and Bob creates a table “detail” referencing “master”, Bob can prevent Alice from deleting entries from her own table. So Alice must be able to decide whom she allows to reference her tables. I don't see how this could be part of the create privilege - I certainly want different roles to be able to create their own tables (or views, or whatever) without being able to DOS each other (accidentally or intentionally). (Also I don't understand why you wrote “You need the permission on both tables”: Only the owner of a table can add constraints to it - this privilege cannot be granted to other roles at all. So to create a foreign key constraint you need to be the owner of the referencing table and have the references privilege on the referenced table. It's not symmetrical.) hp
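The asymmetry can be demonstrated with the REFERENCES privilege. A minimal sketch (the table and role names are made up for illustration):

```sql
-- As alice, owner of "master":
CREATE TABLE master (id int PRIMARY KEY);
GRANT REFERENCES ON master TO bob;

-- As bob: the FK below is allowed only because of the grant above.
-- Without it, creating the constraint fails even though bob owns "detail".
CREATE TABLE detail (
    id        int PRIMARY KEY,
    master_id int REFERENCES master (id)
);
```

Revoking the privilege later does not drop existing constraints, which is why Alice has to think about the grant before handing it out.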
Re: [GENERAL] Autoanalyze oddity
On 2017-03-05 12:01:07 +0100, Peter J. Holzer wrote: [...] > At the current rate of inserts, this threshold will be reached on > March 24th. I'll check whether the table is analyzed then. It was (a little earlier than expected because pg_class.reltuples didn't increase in the meantime). hp
Re: [GENERAL] Autoanalyze oddity
On 2017-03-05 08:39:05 -0800, Adrian Klaver wrote: > On 03/05/2017 03:01 AM, Peter J. Holzer wrote: > >So it is likely that something happened on that day (disk full?) which > >wiped out the contents of pg_stat_user_tables. > > Are there any logs from that time, either Postgres or system? > > I would think a full disk would have been noticed at the time so alternate > theories: > > https://www.postgresql.org/docs/9.5/static/monitoring-stats.html > > "... When the server shuts down cleanly, a permanent copy of the statistics > data is stored in the pg_stat subdirectory, so that statistics can be > retained across server restarts. When recovery is performed at server start > (e.g. after immediate shutdown, server crash, and point-in-time recovery), > all statistics counters are reset. > ..." Oh, of course. That was the day we found out the hard way that the bypass for the UPS didn't work. I knew that date looked familiar, but somehow couldn't place it. Mystery solved, thanks! hp -- _ | Peter J. Holzer| A coding theorist is someone who doesn't |_|_) || think Alice is crazy. | | | h...@hjp.at | -- John Gordon __/ | http://www.hjp.at/ |http://downlode.org/Etext/alicebob.html signature.asc Description: Digital signature
Re: [GENERAL] Autoanalyze oddity
On 2017-03-03 06:39:35 -0800, Adrian Klaver wrote: > On 03/03/2017 12:33 AM, Peter J. Holzer wrote: > >This is with PostgreSQL 9.5.6 on Debian Linux. > > > >I noticed that according to pg_stat_user_tables autoanalyze has never > >run on a lot of tables. Here is one example: > > > >wdsah=> select * from pg_stat_user_tables where schemaname='public' and > >relname='facttable_wds_indexstats'; > >─[ RECORD 1 ]───┬─ [...] > >n_tup_ins │ 47128 [...] > >n_live_tup │ 47128 > >n_dead_tup │ 0 > >n_mod_since_analyze │ 47128 > >last_vacuum │ (∅) > >last_autovacuum │ (∅) > >last_analyze│ (∅) > >last_autoanalyze│ (∅) > >vacuum_count│ 0 > >autovacuum_count│ 0 > >analyze_count │ 0 > >autoanalyze_count │ 0 > > > >wdsah=> select count(*) from facttable_wds_indexstats; > > count > > > > 857992 > >(1 row) > > > >So, n_live_tup is way off, and n_tup_ins and n_mod_since_analyze also > >seem to be wrong. Looks like this hasn't been updated in a year or so. > >But track_counts is on: > > > >wdsah=> show track_counts; > > track_counts > >── > > on > >(1 row) > > What are your settings for autovacuum?: > > https://www.postgresql.org/docs/9.5/static/runtime-config-autovacuum.html All the values in the autovacuum section of postgresql.conf are commented out, so they should be the default values: Just to be sure here's the output of show for each of the parameters: wdsah=> show autovacuum; on wdsah=> show log_autovacuum_min_duration; -1 wdsah=> show autovacuum_max_workers; 3 wdsah=> show autovacuum_naptime; 1min wdsah=> show autovacuum_vacuum_threshold; 50 wdsah=> show autovacuum_analyze_threshold;50 wdsah=> show autovacuum_vacuum_scale_factor; 0.2 wdsah=> show autovacuum_analyze_scale_factor; 0.1 wdsah=> show autovacuum_freeze_max_age; 2 wdsah=> show autovacuum_multixact_freeze_max_age; 4 wdsah=> show autovacuum_vacuum_cost_delay;20ms wdsah=> show autovacuum_vacuum_cost_limit;-1 > Have the storage parameters for the table been altered?: > > 
https://www.postgresql.org/docs/9.5/static/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS No. > >And even if it wasn't, shouldn't the autovacuum daemon notice that > >n_mod_since_analyze is greater than n_live_tup * > >autovacuum_analyze_scale_factor and run an autoanalyze? > > That value is added to autovacuum_analyze_threshold: > > autovacuum_analyze_scale_factor (floating point) > > Specifies a fraction of the table size to add to > autovacuum_analyze_threshold when deciding whether to trigger an ANALYZE. > The default is 0.1 (10% of table size). This parameter can only be set in > the postgresql.conf file or on the server command line; but the setting can > be overridden for individual tables by changing table storage parameters. True. But 50 is negligible compared to 47128*0.1. So that shouldn't make much of a difference. But now that I look closer, I notice that the number in n_tup_ins for that table is exactly the number of records inserted since 2017-02-08T13:00 and there were no records inserted between 09:00 and 13:00 on that day. So it is likely that something happened on that day (disk full?) which wiped out the contents of pg_stat_user_tables. Looking into the source code, I find that reltuples = classForm->reltuples; Am I correct to assume that this is pg_class.reltuples? That would explain why analyze hasn't run yet: This is 862378, which is exactly correct. 862378 * 0.1 + 50 is 86287.8, which is larger than pg_stat_user_tables.n_mod_since_analyze. At the current rate of inserts, this threshold will be reached on March 24th. I'll check whether the table is analyzed then. hp
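Assuming the default autovacuum_analyze_threshold (50) and autovacuum_analyze_scale_factor (0.1), the check the autovacuum daemon performs can be approximated with a query like this — a sketch, not the server's exact logic:

```sql
-- Tables whose modification count exceeds the analyze threshold.
-- Note the threshold is based on pg_class.reltuples, not on
-- pg_stat_user_tables.n_live_tup, which is why a stale reltuples
-- (or a wiped stats file) delays autoanalyze.
SELECT s.schemaname,
       s.relname,
       s.n_mod_since_analyze,
       c.reltuples * 0.1 + 50 AS analyze_threshold
FROM pg_stat_user_tables s
JOIN pg_class c ON c.oid = s.relid
WHERE s.n_mod_since_analyze > c.reltuples * 0.1 + 50;
```

Per-table storage parameters can override the two defaults, so for a precise answer those would have to be folded in as well.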
[GENERAL] Autoanalyze oddity
This is with PostgreSQL 9.5.6 on Debian Linux. I noticed that according to pg_stat_user_tables autoanalyze has never run on a lot of tables. Here is one example: wdsah=> select * from pg_stat_user_tables where schemaname='public' and relname='facttable_wds_indexstats'; ─[ RECORD 1 ]───┬─ relid │ 112723 schemaname │ public relname │ facttable_wds_indexstats seq_scan│ 569 seq_tup_read│ 474779212 idx_scan│ 59184 idx_tup_fetch │ 59184 n_tup_ins │ 47128 n_tup_upd │ 0 n_tup_del │ 0 n_tup_hot_upd │ 0 n_live_tup │ 47128 n_dead_tup │ 0 n_mod_since_analyze │ 47128 last_vacuum │ (∅) last_autovacuum │ (∅) last_analyze│ (∅) last_autoanalyze│ (∅) vacuum_count│ 0 autovacuum_count│ 0 analyze_count │ 0 autoanalyze_count │ 0 wdsah=> select count(*) from facttable_wds_indexstats; count 857992 (1 row) So, n_live_tup is way off, and n_tup_ins and n_mod_since_analyze also seem to be wrong. Looks like this hasn't been updated in a year or so. But track_counts is on: wdsah=> show track_counts; track_counts ── on (1 row) And even if it wasn't, shouldn't the autovacuum daemon notice that n_mod_since_analyze is greater than n_live_tup * autovacuum_analyze_scale_factor and run an autoanalyze? But the really weird thing is that pg_stats seems to be reasonably current: I see entries in most_common_vals which were only inserted in January. Is it possible that autoanalyze runs without updating pg_stat_user_tables? hp -- _ | Peter J. Holzer| A coding theorist is someone who doesn't |_|_) || think Alice is crazy. | | | h...@hjp.at | -- John Gordon __/ | http://www.hjp.at/ |http://downlode.org/Etext/alicebob.html signature.asc Description: Digital signature
Re: [GENERAL] Loose indexscan and partial indexes
On 2017-02-10 14:24:36 +0100, Thomas Kellerer wrote: > Peter J. Holzer schrieb am 10.02.2017 um 14:02: > > So it's doing a sequential scan on the initial select in the recursive > > CTE, but using the index on the subsequent selects. > > > > But why? If it uses the index on > > SELECT MIN(periodizitaet) FROM facttable_imf_ifs WHERE periodizitaet > > > 'x' > > shouldn't it be able to use the same index on > > SELECT MIN(periodizitaet) FROM facttable_imf_ifs > > What is the definition of the index facttable_imf_ifs_periodizitaet_idx? The solution to the puzzle was just 2 paragraphs further down. Looks like I have to practice this arc of suspense thing ;-) hp -- _ | Peter J. Holzer| A coding theorist is someone who doesn't |_|_) || think Alice is crazy. | | | h...@hjp.at | -- John Gordon __/ | http://www.hjp.at/ |http://downlode.org/Etext/alicebob.html signature.asc Description: Digital signature
[GENERAL] Loose indexscan and partial indexes
.51 rows=1 width=2) (actual time=0.039..0.040 rows=1 loops=1) -> Index Only Scan using facttable_imf_ifs_periodizitaet_idx on facttable_imf_ifs (cost=0.44..1516760.47 rows=21080284 width=2) (actual time=0.038..0.038 rows=1 loops=1) Index Cond: ((periodizitaet IS NOT NULL) AND (periodizitaet IS NOT NULL)) Heap Fetches: 1 -> WorkTable Scan on t t_1 (cost=0.00..6.19 rows=10 width=32) (actual time=0.161..0.162 rows=1 loops=4) Filter: (periodizitaet IS NOT NULL) Rows Removed by Filter: 0 SubPlan 3 -> Result (cost=0.59..0.60 rows=1 width=0) (actual time=0.212..0.212 rows=1 loops=3) InitPlan 2 (returns $3) -> Limit (cost=0.44..0.59 rows=1 width=2) (actual time=0.211..0.211 rows=1 loops=3) -> Index Only Scan using facttable_imf_ifs_periodizitaet_idx on facttable_imf_ifs facttable_imf_ifs_1 (cost=0.44..1061729.65 rows=7026761 width=2) (actual time=0.208..0.208 rows=1 loops=3) Index Cond: ((periodizitaet IS NOT NULL) AND (periodizitaet > t_1.periodizitaet)) Heap Fetches: 2 Planning time: 8.883 ms Execution time: 0.801 ms (23 rows) 800 times faster :-). hp -- _ | Peter J. Holzer| A coding theorist is someone who doesn't |_|_) || think Alice is crazy. | | | h...@hjp.at | -- John Gordon __/ | http://www.hjp.at/ |http://downlode.org/Etext/alicebob.html signature.asc Description: Digital signature
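The truncated plan above appears to come from the recursive-CTE emulation of a loose index scan (the technique documented on the PostgreSQL wiki). Reconstructed as a sketch for the column in question, the query would look roughly like:

```sql
-- Emulate a loose (skip) index scan: fetch each distinct value of
-- periodizitaet with one cheap index probe per value instead of
-- scanning all ~21M index entries.
WITH RECURSIVE t AS (
    SELECT min(periodizitaet) AS periodizitaet
    FROM facttable_imf_ifs
  UNION ALL
    SELECT (SELECT min(periodizitaet)
            FROM facttable_imf_ifs
            WHERE periodizitaet > t.periodizitaet)
    FROM t
    WHERE t.periodizitaet IS NOT NULL
)
SELECT periodizitaet FROM t WHERE periodizitaet IS NOT NULL;
```

Each iteration produces one Index Only Scan with a `periodizitaet > ...` condition, matching the per-loop index probes visible in the plan — hence the roughly 800-fold speedup over a plain SELECT DISTINCT.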
Re: [GENERAL] Are new connection/security features in order, given connection pooling?
On 2017-01-11 00:49:19 -0800, Guyren Howe wrote: > I’m not following. What I would like is just a lightweight way to switch the > connections to use a different role, or some moral equivalent, that would > prevent an SQL injection from wrecking havoc. I’m not proposing anything that > will change anything else about how the application is using the database. > > SET ROLE doesn’t work, because an SQL injection can just SET ROLE back to the > privileged user. But then you are no worse off than with the commonly used scheme of executing all queries as the same (necessarily "privileged") user. In both cases the attacker can execute queries as a privileged user IF he succeeds at sql injections. But as others have already noted this is relatively easy to prevent. Just preparing all queries is sufficient, even if you don't actually parametrize them. Perl DBI does this, so this dangerous-looking line of code (assume that the string wasn't hardcoded but the result of an SQL injection): $r = $dbh->selectall_arrayref("select * from twoqueries; insert into twoqueries(t) values('b')"); will fail with DBD::Pg::db selectall_arrayref failed: ERROR: cannot insert multiple commands into a prepared statement at ./twoqueries line 21. So I think just using set local role at the beginning of each transaction should work well with session pooling. It doesn't protect you against sql injections, but you won't have to reinvent the authorization system. > I would like a regime where there is no permanent privileged relationship > between the client application and the database; a user would need to supply > validating information that we can use to switch a connection to something > with > minimal privileges for just that role, for the duration of one session or > transaction. I haven't read the blog post referenced in this thread yet, so maybe this is covered there, but I think "supplying validating information" would be the hard part. 
In general you wouldn't want a web-frontend to cache plain-text passwords to resubmit them for each transaction, but to use something more ephemeral, like session cookies or kerberos tickets. hp
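The per-transaction variant suggested above would look roughly like this (the role name is hypothetical). The key property is that SET LOCAL reverts automatically at COMMIT or ROLLBACK, so a pooled session always falls back to the pooler's own role:

```sql
BEGIN;
SET LOCAL ROLE webapp_readonly;            -- effective only until end of transaction
SELECT * FROM orders WHERE customer_id = 42;  -- runs with webapp_readonly's privileges
COMMIT;                                     -- role reverts here, even on error/rollback
```

An injected `RESET ROLE` inside the transaction could still escape this, so it complements — rather than replaces — parametrized/prepared statements.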
Re: [GENERAL] COPY: row is too big
On 2017-01-04 06:53:31 -0800, Adrian Klaver wrote: > On 01/04/2017 05:00 AM, vod vos wrote: > >Now I am confused about I can create 1100 columns in a table in > >postgresql, but I can't copy 1100 values into the table. And I really > > As pointed out previously: > > https://www.postgresql.org/about/ > Maximum Columns per Table 250 - 1600 depending on column types > > That being dependent on both the number of columns and the actual data in > the columns. I think this is confusingly phrased. In my mind "column type" is static - the type is the same, independent of the values which are stored. So "250 - 1600 depending on column types" implies to me that there is some type A of which I can have only 250 columns and another type B of which I can have 1600 columns. But it doesn't imply to me that the number of columns depends on the values which are put into those columns. May I suggest these improvements? In https://www.postgresql.org/about/: Instead of | 250 - 1600 depending on column types write | 250 - 1600 depending on column types and data In https://www.postgresql.org/docs/9.6/static/ddl-basics.html: Replace the sentence: | Depending on the column types, it is between 250 and 1600. with: | For all columns in a row, some information (either the data itself or | a pointer to the data) must be stored in a single block (8 kB). | Because for some types this data is itself of variable length, the | maximum number of columns depends not only on the types of the columns | but also on the data (e.g., a NULL uses less space than a non-NULL | value). Therefore there is no simple way to compute the maximum number | of columns, and it is possible to declare a table with more columns | than can be filled. Keeping all this in mind, the limit is between 250 | and 1600. hp
Re: [GENERAL] explain analyze showed improved results without changes, why?
On 2016-12-23 10:35:26 -0600, Chris Richards wrote: > Howdy. I was noticing a significant problem with a query on one of my tables. > I > tried recreating the problem and wasn't able to do so on a different install, > and so a few days later I ran the same query on the problem table. Lo' and > behold, there wasn't a problem anymore. I'm at a loss to why. [...] > "blocks_off_sz_idx" btree (off, sz) > > mdb=> explain analyze SELECT * FROM blocks > WHERE cloudidx=98038 AND off+sz >= 0 AND state='seeded' > ORDER BY off LIMIT 1 FOR UPDATE; > QUERY > PLAN > --- > Limit (cost=0.43..587.83 rows=1 width=100) (actual time=4814.579..4814.579 > rows=1 loops=1) > -> LockRows (cost=0.43..1358633.99 rows=2313 width=100) (actual time= > 4814.577..4814.577 rows=1 loops=1) > -> Index Scan using blocks_off_sz_idx on blocks (cost= > 0.43..1358610.86 rows=2313 width=100) (actual time=4813.498..4814.384 rows=2 > loops=1) > Filter: ((cloudidx = 98038) AND (state = > 'seeded'::block_state) AND ((off + sz) >= 0)) > Rows Removed by Filter: 6935023 > Total runtime: 4814.619 ms > (6 rows) This scans the table in ascending (off, sz) order until it finds one row matching the filter. Apparently at the time of the query there were 6935023 rows in the table before the matching row. [...] > And here's the second. Notice that even though there are more rows, it was > much > faster and the "rows removed by filter" were significantly reduced by several > orders of magnitude. 
> > > mdb=> explain analyze SELECT * FROM blocks > WHERE cloudidx=98038 AND off+sz >= 0 AND state='seeded' > ORDER BY off LIMIT 1 FOR UPDATE; > QUERY PLAN > - > Limit (cost=0.43..584.57 rows=1 width=100) (actual time=0.071..0.071 rows=1 > loops=1) > -> LockRows (cost=0.43..1390825.21 rows=2381 width=100) (actual > time=0.070..0.070 rows=1 loops=1) > -> Index Scan using blocks_off_sz_idx on blocks > (cost=0.43..1390801.40 rows=2381 width=100) (actual time=0.055..0.055 rows=1 > loops=1) > Filter: ((cloudidx = 98038) AND (state = > 'seeded'::block_state) AND ((off + sz) >= 0)) > Rows Removed by Filter: 26 > Total runtime: 0.114 ms > (6 rows) The plan here is exactly the same, but only 26 rows are discarded. My guess is that between those two queries a row was inserted with a really low (off, sz) value which matches the query. So now the query can return after checking only a handful of rows. LIMIT, EXISTS, etc. are awful when you want predictable performance. You may be lucky and the rows you are looking for are just at the start or you may be unlucky and you have to scan through the whole table to find them. The optimizer (usually) doesn't have enough information and assumes they are spread randomly through the table. hp -- _ | Peter J. Holzer| A coding theorist is someone who doesn't |_|_) || think Alice is crazy. | | | h...@hjp.at | -- John Gordon __/ | http://www.hjp.at/ |http://downlode.org/Etext/alicebob.html signature.asc Description: Digital signature
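Not from the original thread, but a common mitigation for this class of unpredictable LIMIT queries is an index that matches the filter more closely, so the ordered scan doesn't wade through millions of non-matching rows. A sketch, assuming the column and table names from the plans above:

```sql
-- Partial index: only 'seeded' rows, keyed by cloudidx and ordered by off.
-- The off+sz >= 0 condition still has to be checked as a filter, but the
-- "Rows Removed by Filter" count can no longer blow up on state/cloudidx.
CREATE INDEX blocks_seeded_cloudidx_off_idx
    ON blocks (cloudidx, off)
    WHERE state = 'seeded';
```

With such an index the planner can start the ordered scan directly at the matching cloudidx, making the best and worst cases much closer together.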
Re: [GENERAL] Is is safe to use SPI in multiple threads?
On 2016-12-09 16:52:05 +0800, Qiu Xiafei wrote: > I'm new to PG and want to implement my domain-specific system based on PG. I > wish to arrange my data as several tables in database and translate my DSL > into > SQL statements for query. Since one DSL statement may be mapped to several SQL > statements, it's better to push the DSL server as close to the PG server as > possible. I found PG's backgroud worker meet my needs. I can setup a > background > worker bounded to PG server and listen to a port for network requests. > > But I encounter a problem that the Server Programing Interfaces are not THREAD > SAFE. There are some global variables defined like: SPI_processed, > SPI_tuptable, etc. This limit to my DSL server to work in single thread mode > which is quite inefficient. I had a similar requirement. I solved it by moving the application logic out of the stored procedures. All the stored procedure does is an RPC call (I use ØMQ for that) to a server process and send the result back to the client. The server process converts the request into multiple SQL queries which can be processed in parallel. The downside is of course that the communication overhead is much higher (A minimum of 4 network messages per request). That's not a problem in my case, but your mileage may vary. The advantages in my opinion are: * A standalone server process is easier to test and debug than a bunch of stored procedures. * I can easily scale out if necessary: Currently my database and server process run on the same machine, but I could distribute them over several machines with (almost) no change in logic. hp
Re: [GENERAL] Index size
On 2016-12-09 21:45:35 -0500, Melvin Davidson wrote: > On Fri, Dec 9, 2016 at 6:40 PM, Samuel Williams > <space.ship.travel...@gmail.com> > wrote: > >I also read that when you change a column which is not index, all the > >indexes for that row need to be updated anyway. Is that correct? > > That is not correct. Indexes are changed under the following conditions: > A. An insert is done to the table which involves an index. > B. A delete is done to the table which involves an index. > C. An update is done that involves columns included in an index. > D. An index is REINDEXed > > Indexes point to the tid of the row for which the column(s) in the index > are involved. So if columns updated are not involved in the index, > there is no need to change the index. I don't think this is generally correct. The TID is a (block,item) tuple. If the updated version of the row doesn't fit into the same block it has to be stored in a different block, so the TID will change (AIUI there is a bit of trickery to avoid changing the TID if the new version is stored in the same block). This means that all the index entries for this row (not just for the changed field) will have to be updated. You can set fillfactor to a smaller value to make this less likely. hp
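Lowering the fillfactor as suggested is a one-liner (the table name and the value 70 below are just examples). It only affects newly written pages, so a table rewrite is needed for existing data to benefit:

```sql
ALTER TABLE mytable SET (fillfactor = 70);  -- leave ~30% free space per page
                                            -- so updated rows can stay in-block (HOT)
VACUUM FULL mytable;                        -- rewrite existing pages to honor it
                                            -- (takes an exclusive lock; schedule accordingly)
```

The freed space lets the "trickery" mentioned above (heap-only tuple updates) succeed more often, so unindexed-column updates avoid touching any index.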
Re: [GENERAL] Ascii Elephant for text based protocols
On 2016-05-15 14:02:56 +0200, Charles Clavadetscher wrote: > ++ > | __ ___| > | /)/ \/ \ | > | ( / ___\) | > | \(/ o) ( o) ) | > | \_ (_ ) \ ) / | > | \ /\_/\)_/| > | \/ //| |\\ | > | v | | v | > |\__/| > || > | PostgreSQL 1996-2016 | > | 20 Years of success | > +----+ Nice. hp -- _ | Peter J. Holzer| I want to forget all about both belts and |_|_) || suspenders; instead, I want to buy pants | | | h...@hjp.at | that actually fit. __/ | http://www.hjp.at/ | -- http://noncombatant.org/ signature.asc Description: Digital signature
Re: [GENERAL] Using both ident and password in pg_hba.conf
On 2016-05-09 16:18:39 -0400, D'Arcy J.M. Cain wrote: > On Mon, 9 May 2016 13:02:53 -0700 > Adrian Klaver <adrian.kla...@aklaver.com> wrote: > > So define PHP runs as 'nobody'? > > Because of the way PHP and Apache works PHP script have to run as the > Apache user which, in my case anyway, is "nobody" so every PHP script > runs as nobody. This is not really true. You can use FastCGI to run PHP for each site as a different user. For Apache there is also an MPM (http://mpm-itk.sesse.net/) which lets you run apache processes (and therefore also any embedded mod_php) under different uids. So while running everything as nobody is the default, it is possible to use different users, and I would strongly recommend doing this if you have multiple customers. hp -- _ | Peter J. Holzer| I want to forget all about both belts and |_|_) || suspenders; instead, I want to buy pants | | | h...@hjp.at | that actually fit. __/ | http://www.hjp.at/ | -- http://noncombatant.org/ signature.asc Description: Digital signature
Re: [GENERAL] Allow disabling folding of unquoted identifiers to lowercase
On 2016-04-29 19:21:30 +0200, Evgeny Morozov wrote: > It would be great if Postgres had a server setting that allowed the automatic > folding of identifiers to lowercase to be disabled, so that camel case > identifiers could be used without having to quote every single identifier, > i.e. > > SELECT MyColumn FROM MyTable ORDER BY MyColumn > > instead of > > SELECT "MyColumn" FROM "MyTable" ORDER BY "MyColumn" [...] > My company is looking into doing this. Currently our table and column names > exactly match our class and property names, which are in camel case. MSSQL > supports this just fine. To move to Postgres we would have to either quote > *everything* or translate names back-and-forth between code and database. Both > options are OK for auto-generated SQL, but we also have many users writing > ad-hoc SQL queries. Having to quote everything would have those users > screaming > to switch back to MSSQL very quickly! That leaves us with the mapping > approach, > which is doable, but also a constant "mental speedbump" at best. What exactly is the problem you are trying to solve? If you and your users are consistent about never using quotes, your users can write: SELECT MyColumn FROM MyTable ORDER BY MyColumn; It will select mycolumn from mytable, but that doesn't matter, since you created the table with CREATE TABLE MyTable (MyColumn varchar); so you really have a table mytable with a column mycolumn, not a table MyTable with a column MyColumn. There are three potential problems I can see: 1) Users might be confused that PgAdmin (or whatever tool they use to inspect the database) displays all the names in lowercase, and they might find a name like sometablewithareallylongname less readable than SomeTableWithAReallyLongName. 2) Since case doesn't matter, they might be inconsistent: One programmer might write MyTable, another mytable, the third MYTABLE, ... 3) You might want to use a tool to automatically generate SQL queries, but that tool quotes identifiers.
hp
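The folding behaviour is easy to demonstrate in a scratch database (a sketch):

```sql
CREATE TABLE MyTable (MyColumn varchar);  -- actually creates mytable(mycolumn)

SELECT MyColumn FROM MyTable;             -- works: both names fold to lowercase
SELECT mycolumn FROM mytable;             -- works: same table, written out folded

SELECT "MyColumn" FROM "MyTable";         -- fails: relation "MyTable" does not exist,
                                          -- because quoting suppresses folding
```

So the consistency requirement is really "always quote or never quote"; mixing the two styles is what bites.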
Re: [GENERAL] Distributed Table Partitioning
On 2016-03-12 21:19:11 -0500, Melvin Davidson wrote: > - Original Message - > From: "Leonardo M. Ramé" <l.r...@griensu.com> > To: "PostgreSql-general" <pgsql-general@postgresql.org> > Sent: Saturday, 12 March, 2016 8:25:01 PM > Subject: [GENERAL] Distributed Table Partitioning > > I have this problem: a Master table containing records with a timestamp > column registering creation date-time, and one Detail table containing > info related to the Master table. > > As time went by, those tables grew enormously, and I can't afford > expanding my SSD VPS. So I'm thinking about storing only NEW data into > it, and move OLD data to a cheaper SATA VPS. [...] > Why don't you just make use of tablespaces and partition the child tablespaces > so that the newer parttion is on the SSD and the older one is on SATA? Since he mentioned virtual private servers (VPS) the reason might be that his hoster offers VPS with SSDs (of various sizes) and VPS with rotating hard disks (of various sizes), but not VPS with both. So he can't rent a VPS with a relatively small SSD and a larger hard disk. That might be a reason to look for an alternate hoster, but if he's otherwise happy, switching to an unknown provider might be considered too large a risk. hp -- _ | Peter J. Holzer| I want to forget all about both belts and |_|_) || suspenders; instead, I want to buy pants | | | h...@hjp.at | that actually fit. __/ | http://www.hjp.at/ | -- http://noncombatant.org/ signature.asc Description: Digital signature
Re: [GENERAL] Unable to match same value in field.
On 2016-03-10 11:09:00 +0200, Condor wrote: > I using postgresql 9.5.1 and I have problem to match value in one field. > Both tables are text: [...] > =# select imsi from list_cards_tbl where imsi = '28411123315'; > imsi > -- > (0 rows) > > No value, lets change to LIKE > > =# select imsi, md5(imsi), bit_length(imsi) from list_cards_tbl where imsi > like '28411123315%'; > imsi | md5| bit_length > -+--+ > 28411123315 | b438e984c97483bb942eaaed5c0147f3 |120 > (1 row) That looks familiar. I think I've seen something similar recently. That was on 9.5beta1 (I haven't gotten around to upgrading to 9.5.1 yet). > =# reindex table list_cards_tbl; > REINDEX [...] > Still cant find value. Dropping and recreating the index helped in my case. Still, I find it worrying if a value which obviously is in the table can't be found using the index. hp
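Schematically, the drop-and-recreate that helped in my case looks like this — the index name and definition below are assumptions for illustration, not taken from the thread:

```sql
BEGIN;
DROP INDEX list_cards_tbl_imsi_idx;       -- the suspect (possibly corrupt) index
CREATE INDEX list_cards_tbl_imsi_idx
    ON list_cards_tbl (imsi);             -- rebuilt from the heap, from scratch
COMMIT;

-- On a busy system, CREATE INDEX CONCURRENTLY (outside a transaction block)
-- avoids blocking writes during the rebuild.
```

Unlike REINDEX on a 9.5-era server, this also discards the old index's metadata entirely, which is presumably why it worked where REINDEX did not.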
Re: [GENERAL] index problems (again)
On 2016-03-12 21:00:04 +0000, Geoff Winkless wrote: > On 12 March 2016 at 18:43, Peter J. Holzer <hjp-pg...@hjp.at> wrote: > > The question is what can be done to improve the situation. > > > > Tom thinks that correlation statistics would help. That seems plausible > > to me. [...] > > You claim that no statistics are needed. > > Well that's a bit confrontational. Sorry. Didn't want to sound confrontational. I was just repeating points made by Tom and you previously in this thread to establish a baseline. > > That may or may not be true: You haven't proposed an alternate method > > yet. > > You could make an assumption that perfect distribution isn't true: > that actually the distribution is within a certain _deviation_ of that > perfect distribution. It wouldn't have to have been very much to make > the index-only scan win here and would still keep the planner from > choosing less optimal queries most of the time (and where it did end > up making the "wrong" choice it's not going to be far off anyway). > > But I'm making assumptions here, I'm aware of that. Chances are that > actually most people's data _does_ fit into this perfect distribution > set. Is there any research that shows that real-world data usually > does? I don't think most people's data is perfectly distributed. But as you say most data is probably within some deviation of being perfectly distributed and as long as that deviation isn't too big it doesn't matter. But there are certainly some common examples of highly correlated columns. Having a serial id and a date as in your case is probably quite common. Another example might be a surrogate primary key which is computed from some other fields (e.g. a timeseries code starting with a country code, or a social security number starting with the birth date, ...). That's probably not that uncommon either. So, I agree with you. This is a problem and it should be fixed. I'm just sceptical that it can be done with a simple cost adjustment. > As Jeff points out I'd have a much larger win in this instance by > someone spending the time implementing skip index scans rather than > messing with the planner :) Yeah. I think I have some code which could benefit from this, too. I'll have to try that trick from the wiki. hp
Re: [GENERAL] index problems (again)
On 2016-03-08 10:16:57 +0000, Geoff Winkless wrote: > On 7 March 2016 at 20:40, Peter J. Holzer <hjp-pg...@hjp.at> wrote: > > As Tom wrote, the estimate of having to read only about 140 rows is only > > valid if sc_id and sc_date are uncorrelated. In reality your query has > > to read a lot more than 140 rows, so it is much slower. > > But as I've said previously, even if I do select from scdate values > that I know to be in the first 1% of the data (supposedly the perfect > condition) the scan method is insignificantly quicker than the index > (scdate,scid) method. Actually the planner expects to find a match within the first 0.0035 %, so to find out how fast that would be you would have to use a value from that range. > Even with the absolute perfect storm (loading in the entire index for > the full range) it's still not too bad (1.3 seconds or so). > > The point is that to assume, knowing nothing about the data, that the > data is in an even distribution is only a valid strategy if the worst > case (when that assumption turns out to be wildly incorrect) is not > catastrophic. That's not the case here. True. The fundamental problem here is that the planner doesn't have any notion of a worst case. It only knows "cost", and that is a single number for each operation. For many operations, both the best case and the worst case are unusable as cost - the first would almost always underestimate the time and choose a plan which is far from optimal and the second would almost always overestimate it and reject an optimal plan. The art of programming a planner (which I've dabbled with in a previous (not postgresql-related) project but certainly can't claim any expertise in) lies in choosing a cost function which is quite close most of the time and catastrophically wrong only very rarely.
It is clear that PostgreSQL hasn't succeeded in the latter category: Correlated columns do occur and the current cost function, which assumes that all columns are uncorrelated, can catastrophically underestimate the cost in this case. The question is what can be done to improve the situation. Tom thinks that correlation statistics would help. That seems plausible to me. You claim that no statistics are needed. That may or may not be true: You haven't proposed an alternate method yet. I feel fairly certain that using the worst case (the cost for scanning the whole table) would be just as bad and would cause inferior plans to be used in many instances. Maybe computing the cost as weighted average of the best, average and worst case (e.g. cost = cost_best*0.05 + cost_avg*0.90 + cost_worst*0.05) would penalize methods with a large spread between best and worst case enough - but that still leaves the problem of determining the weights and determining what the "average" is. So it's the same black magic as now, just a little more complicated (on the plus side, this would probably be a relatively simple patch). If we assume that we could revamp the planner completely, other possibilities come to mind: For example, since I think that the core problem is having a single number for the cost, the planner could instead compute a distribution (in the most simple case just best and worst case, but ideally many values). Then the planner could say something like: I have two plans A and B and A is at most 20 % faster in almost all cases. But in the worst case, A is 1000 times slower. Being 20 % faster most of the time is nice but doesn't outweigh the risk of being 1000 times slower sometimes, so I'll use B anyway. Another possibility I've been considering for some time is feeding back the real execution times into the planner, but that sounds like a major research project. (Actually I think Oracle does something like this since version 12) hp
Re: [GENERAL] index problems (again)
On 2016-03-07 16:37:37 +, Geoff Winkless wrote: > On 7 March 2016 at 16:02, Tom Lane <t...@sss.pgh.pa.us> wrote: > > In English, what that plan is trying to do is scan the index > > in sc_id order until it hits a row with scdate in the target range. > > The first such row, by definition, has the correct min(sc_id) value. > > The problem is that we're guessing at how soon we'll hit such a row. > > If the columns are independent, then the planner can guess based on how > > many rows in the whole table have scdate in the target range, and it > > will probably be about right. But that estimate can fall down very > > badly if sc_id and scdate increase together, because then the target > > rows aren't randomly distributed in the index sequence but could all be > > all the way at the far end of the index. > > I'm sorry, I'm obviously not being clear. I already accepted this > argument when Victor gave it, although I believe that in part it falls > down because sc_id is also (potentially) randomly distributed so it's > not like you're doing a sequential table scan (it might work better on > a clustered table, but we don't have those :) ) > > So you still have an extra layer of indirection into a large table > with lots of random accesses. > > > If we had cross-column correlation stats we could detect this pitfall, > > but without that it's hard to do. > > But as far as I can see, apart from the absolute extremes, the > index-only scan is _always_ going to be quicker than the index+table > scan. We are talking about an "absolute extreme" here. You have about 420 date values and you are looking for 3 of them. Assuming for the moment that your distribution is uniform, that's 1/140th of the whole table. So if PostgreSQL were using the (sc_date,sc_id) index, it would have to scan 4E6/140 = 29000 index entries, extract the id value and get the minimum of those 29000 values. 
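The arithmetic behind these estimates can be reproduced directly (using the same round numbers as above; the uniform, uncorrelated distribution is exactly the assumption that breaks down when sc_id and sc_date are correlated):

```python
# Back-of-envelope reproduction of the planner's reasoning with the
# round numbers from this thread: ~4 million rows, ~420 distinct dates,
# 3 of them wanted, assuming values are uniformly distributed and
# sc_id/sc_date are uncorrelated.

table_rows = 4_000_000
distinct_dates = 420
dates_wanted = 3

selectivity = dates_wanted / distinct_dates   # 1/140 of the table matches

# Plan 1: scan the (sc_date, sc_id) index and take min(sc_id) over all
# matching entries:
matching_entries = table_rows * selectivity
print(round(matching_entries))                # 28571, i.e. the ~29000 above

# Plan 2: walk the sc_id index in order and stop at the first match;
# under the uniformity assumption a match is expected about every
# 1/selectivity entries:
expected_scan = 1 / selectivity
print(round(expected_scan))                   # 140
```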
OTOH, if it uses the sc_id index, it only expects to have to scan 140 entries until it finds a matching entry. And then it is finished. So it's 140 index entries plus 140 row accesses against 29000 index entries. To choose the second plan, the planner would have to estimate that reading a random row is more than 200 times slower than reading an index entry, which apparently it doesn't. As Tom wrote, the estimate of having to read only about 140 rows is only valid if sc_id and sc_date are uncorrelated. In reality your query has to read a lot more than 140 rows, so it is much slower. > I don't believe you need any further statistics than what is currently > available to be able to make that judgement, and that's why I believe > it's suboptimal. We all know it is suboptimal, but unfortunately, without additional statistics I don't think there is a better way. The other way around - assuming that the columns are correlated in the worst possible way - would remove viable plans in many cases. This is, I think, one of the places where hints are a good idea. The programmer sometimes knows more about the characteristics of the data than the planner can possibly know and it is a pity that there is no way for the programmer to pass that knowledge to the planner. (And yes, I know that quite often the programmer is wrong - but I do believe in giving people enough rope to hang themselves with) hp -- _ | Peter J. Holzer| I want to forget all about both belts and |_|_) || suspenders; instead, I want to buy pants | | | h...@hjp.at | that actually fit. __/ | http://www.hjp.at/ | -- http://noncombatant.org/ signature.asc Description: Digital signature
Re: [GENERAL] BRIN Usage
On 2016-02-18 13:37:37 -0500, Tom Smith wrote: > it is for reducing index size as the table become huge. > sorry for confusion, by timestamp, I meant a time series number, not the sql > timestamp type. > I need the unique on the column to ensure no duplicate, but the btree index > is getting > huge so BRIN seems to solve problem but can not ensure unique If it is getting huge, then this is because there are a large number of timestamps. If you want an index to ensure uniqueness, it will have to store every value. I don't think there's a way around that. With a BRIN index, you would only get a list of page ranges which could possibly contain the new value. All these pages would then have to be scanned sequentially to make sure it isn't already there. That could be implemented, but it would make inserts very slow - I don't think you would want that on a huge table even if postgres implemented it. hp signature.asc Description: Digital signature
Re: [GENERAL] strange sql behavior
On 2016-02-01 12:35:35 -0600, Yu Nie wrote: > Recently I am working with a large amount of taxis GIS data and had > encountered > some weird performance issues. I am hoping someone in this community can help > me figure it out. > > The taxi data were loaded in 5 minute block into a table. I have two separate > such tables, one stores a month of data with about 700 million rows, another > stores about 10 days of data with about 300 million rows. The two tables have > the exactly same schema and indexes. There are two indexes: one on taxiid > (text), and the other on the time stamp (date time). In order to process the > data, I need to get all points for a single taxis; to do that, I use something > like: > select * from table1 where taxiid = 'SZB00S41' order by time; > What puzzled me greatly is that this query runs consistently much faster for > the large table than for the small table, which seems to contradict with > intuition. [...] > Results for the small table: it took 141 seconds to finish. The planning time > is 85256.31 > > "Sort (cost=85201.05..85256.31 rows=22101 width=55) (actual time= > 141419.499..141420.025 rows=20288 loops=1)" > " Sort Key: "time"" > " Sort Method: quicksort Memory: 3622kB" > " Buffers: shared hit=92 read=19816" > " -> Bitmap Heap Scan on data2013_01w (cost=515.86..83606.27 rows=22101 > width=55) (actual time=50.762..141374.777 rows=20288 loops=1)" > " Recheck Cond: ((taxiid)::text = 'SZB00S41'::text)" > " Heap Blocks: exact=19826" > " Buffers: shared hit=92 read=19816" ^^ [...] > > Results for the large table: it took 5 seconds to finish. 
The planning time > is > 252077.10 > "Sort (cost=251913.32..252077.10 rows=65512 width=55) (actual time= > 5038.571..5039.765 rows=44204 loops=1)" > " Sort Key: "time"" > " Sort Method: quicksort Memory: 7753kB" > " Buffers: shared hit=2 read=7543" > " -> Bitmap Heap Scan on data2011_01 (cost=1520.29..246672.53 rows=65512 > width=55) (actual time=36.935..5017.463 rows=44204 loops=1)" > " Recheck Cond: ((taxiid)::text = 'SZB00S41'::text)" > " Heap Blocks: exact=7372" > " Buffers: shared hit=2 read=7543" ^ [] The obvious difference is that the query for the smaller table needs to read about 2.5 times as many blocks (for 1/3 of the records) from the disk. This suggests that the data for a single taxi is more localized in the larger table. In addition, the average time per block on the smaller table is about 7 ms, which is a typical random seek time for a disk. So the blocks are probably randomly scattered through the table. For the larger table, the average time is well below 1 ms, so there are probably many consecutive blocks to read. There are 2880 5 minute intervals in 10 days. You have about 22k records per taxi, so there are about 7.6 records for each taxi per interval. This is very close to the number of records per block in your second query (65512/7372 = 8.9). I suspect that the records in your larger table are sorted by taxiid within each interval. You can almost certainly get a similar speedup by sorting each 5 minute interval by taxi id before appending it to the table. If querying by taxiid is typical and your table is static, you should consider clustering the table by taxiid. If your table is updated every 5 minutes, you could partition it by day and cluster each partition as soon as it is not written any more. hp -- _ | Peter J. Holzer| I want to forget all about both belts and |_|_) || suspenders; instead, I want to buy pants | | | h...@hjp.at | that actually fit. 
__/ | http://www.hjp.at/ | -- http://noncombatant.org/ signature.asc Description: Digital signature
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
On 2015-12-03 10:02:18 -0500, Tom Lane wrote: > "Peter J. Holzer" <hjp-pg...@hjp.at> writes: > > Can those signals be safely ignored? Just blocking them (so that they > > are delivered after the UDF finishes) might be safer. But even that may > > be a problem: If the UDF then executes some SQL, could that rely on > > signals being delivered? I have no idea. > > The minute you start fooling with a backend's signal behavior, we're > going to politely refuse to support whatever breakage you run into. As I understood Jim he was talking about possible changes to postgresql to shield UDFs from those signals, not something the author of a UDF should do. > We aren't sending those signals just for amusement's sake. Right. That's why I was sceptical whether those signals could be ignored. I wouldn't have thought so, but Jim clearly knows a lot more about the inner workings of postgresql than I do (which is easy - I know almost nothing) and maybe he knows of a way (something like "we can ignore signals while executing the UDF and just assume that we missed at least one signal and call the magic synchronize state function afterwards") hp -- _ | Peter J. Holzer| I want to forget all about both belts and |_|_) || suspenders; instead, I want to buy pants | | | h...@hjp.at | that actually fit. __/ | http://www.hjp.at/ | -- http://noncombatant.org/ signature.asc Description: Digital signature
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
On 2015-12-02 19:07:55 -0600, Jim Nasby wrote: > On 12/2/15 9:26 AM, Peter J. Holzer wrote: > >As explained in backend/utils/misc/timeout.c, the timers are never > >cancelled: If a timeout is cancelled, postgres just sees that it has > >nothing to do and resumes whatever it is doing. > > Hrm, if those timers are really just for auth purposes then perhaps they > should be cancelled. But aside from that, there's certainly other things > that can signal a backend (including fairly normal things, like DDL). Yep. I noticed that, too. In one of my test runs I got two signals instead of the one I expected. Checking the logs I found that it seemed to be caused by another user dropping a table. > Offhand I don't think functions run in a CRITICAL block (and I don't think > it'd be a good idea for them to). So really, functions have to be handle > being interrupted. Right. I think that should be mentioned somewhere in the manual. Something like this: Note: PostgreSQL uses signals for various purposes. These signals may be delivered while a user-defined function is executed. Therefore user-defined functions must be able to handle being interrupted, in particular they must expect system calls to fail with errno=EINTR and handle that case appropriately. I'm not sure whether that's an issue with all procedural languages. If it is, it should probably go into "Chapter 39. Procedural Languages". If it is specific to plperl(u), I would put it in "42.8.2. Limitations and Missing Features". > Yeah, it'd be nice to detect that this had happened. Or maybe it's worth it > to ignore SIGALARM while a UDF is running. Can those signals be safely ignored? Just blocking them (so that they are delivered after the UDF finishes) might be safer. But even that may be a problem: If the UDF then executes some SQL, could that rely on signals being delivered? I have no idea. hp -- _ | Peter J. 
Holzer| I want to forget all about both belts and |_|_) || suspenders; instead, I want to buy pants | | | h...@hjp.at | that actually fit. __/ | http://www.hjp.at/ | -- http://noncombatant.org/ signature.asc Description: Digital signature
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
On 2015-12-01 20:55:02 +0100, Peter J. Holzer wrote: > On 2015-12-01 18:58:31 +0100, Peter J. Holzer wrote: > > I suspect such an interaction because I cannot reproduce the problem > > outside of a stored procedure. A standalone Perl script doing the same > > requests doesn't get a timeout. [...] > The strace doesn't show a reason for the SIGALRM, though. No alarm(2) or > setitimer(2) system call (I connected strace to a running postgres > process just after I got the prompt from "psql" and before I typed > "select * from mb_search('export');" (I used a different (but very > similar) stored procedure for those tests because it is much easier to > find a search which is slow enough to trigger a timeout at least > sometimes than a data request (which normally finishes in > milliseconds)). > > So I guess my next task will be to find out where that SIGALRM comes > from and/or whether I can just restart the zmq_msg_recv if it happens. Ok, I think I know where that SIGALRM comes from: It's the AuthenticationTimeout. What I'm seeing in strace (if I attach it early enough) is that during authentication the postgres worker process calls setitimer with a 60 second timeout twice. This matches the comment in backend/postmaster/postmaster.c: * Note: AuthenticationTimeout is applied here while waiting for the * startup packet, and then again in InitPostgres for the duration of any * authentication operations. So a hostile client could tie up the * process for nearly twice AuthenticationTimeout before we kick him off. As explained in backend/utils/misc/timeout.c, the timers are never cancelled: If a timeout is cancelled, postgres just sees that it has nothing to do and resumes whatever it is doing. This is also what I'm seeing: 60 seconds after start, the process receives a SIGALRM. If the process is idle or in a "normal" SQL statement at the time, that's not a problem. 
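When such a SIGALRM lands while user code is blocked in a system call, the call typically fails with EINTR, and the usual remedy is simply to restart it. A minimal sketch of that pattern (the interrupted call is simulated here, not a real ØMQ socket; note also that Python 3.5+ already retries EINTR automatically in most stdlib calls, so this is purely illustrative of what a caller in C or Perl would have to do by hand):

```python
import errno

def retry_on_eintr(call, *args):
    """Restart a call that fails with EINTR because a signal arrived."""
    while True:
        try:
            return call(*args)
        except InterruptedError:      # OSError with errno == EINTR
            continue                  # signal was handled; just try again

# Simulated blocking receive: interrupted twice, then succeeds.
attempts = []
def flaky_recv():
    attempts.append(1)
    if len(attempts) < 3:
        raise InterruptedError(errno.EINTR, "Interrupted system call")
    return "payload"

print(retry_on_eintr(flaky_recv))     # prints "payload" after two retries
```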
But if it is in one of my stored procedures which is currently calling a ØMQ function which is waiting for some I/O (zmq_msg_recv(), most likely), that gets interrupted and returns an error which my code doesn't know how to handle (yet). So the error gets back to the user. A strange interaction between postgres and ØMQ indeed. But now that I know what's causing it I can handle that. Thanks for your patience. hp -- _ | Peter J. Holzer| I want to forget all about both belts and |_|_) || suspenders; instead, I want to buy pants | | | h...@hjp.at | that actually fit. __/ | http://www.hjp.at/ | -- http://noncombatant.org/ signature.asc Description: Digital signature
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
On 2015-12-01 13:13:27 -0500, Tom Lane wrote: > "Peter J. Holzer" <hjp-pg...@hjp.at> writes: > > Postgres worker processes are single-threaded, are they? Is there > > something else which could interact badly with a moderately complex > > multithreaded I/O library used from a stored procedure? > > Yes, lots. If you cause additional threads to appear inside a backend > process, things could break arbitrarily badly. It's up to you to ensure > that none of those extra threads ever escape to execute any non-Perl > code. Actually, non-ØMQ code. Perl doesn't like to be unexpectedly multithreaded either. Yes, those threads should only ever execute code from the ØMQ library. In fact they are automatically created and destroyed by the library and there is no way to control them from Perl code (there may be a way to do that from the C API, but I don't remember seeing that in the manual). > I suspect this could easily explain the problems you're seeing. Quite. hp -- _ | Peter J. Holzer| I want to forget all about both belts and |_|_) || suspenders; instead, I want to buy pants | | | h...@hjp.at | that actually fit. __/ | http://www.hjp.at/ | -- http://noncombatant.org/ signature.asc Description: Digital signature
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
On 2015-12-01 07:16:04 -0800, Adrian Klaver wrote: > On 12/01/2015 06:51 AM, Peter J. Holzer wrote: > >A rather weird observation from the log files of our server (9.5 beta1): > > > >2015-12-01 09:23:37 CET [26265]: [42-1] user=fragner,db=wds,pid=26265 > >WARNING: Use of uninitialized value $success in concatenation (.) or string > >at /usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pm line 36. > >2015-12-01 09:23:37 CET [26265]: [43-1] user=fragner,db=wds,pid=26265 > >CONTEXT: PL/Perl function "mb_timeseriesdata_zmq" > >[lots of other stuff from different connections] > >2015-12-01 09:24:45 CET [26265]: [44-1] user=fragner,db=wds,pid=26265 ERROR: > > impossible result '' (payload=) at > >/usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pm line 36. > > WDS::Macrobond::Utils::decode_result("") called at line 30 > > main::__ANON__("gen_wqehur") called at -e line 0 > > eval {...} called at -e line 0 > > > >Two messages from the same line of the same plperlu stored procedure, 68 > >seconds apart. So what is this line 36? > > > > confess "impossible result '$success' (payload=$payload)"; > > > >What? The first message clearly comes from interpolating $success > >(which is undef at that point) into the argument. The second from > >confess itself. What could cause a plperlu procedure to freeze for 68 > >seconds between the call to confess and its output? > > > >Is it possible that only the writing of the log entry is delayed? > > > >Another weird thing: $success is undef because a ØMQ rpc call[1] timed > > And the call is? The sequence is:

    my $req_sck = zmq_socket($context, ZMQ_REQ);
    zmq_connect($req_sck, $url);
    my $qry_msg = join(...);
    zmq_send($req_sck, $qry_msg);
    my $res_msg = zmq_msg_init();
    my $rv = zmq_msg_recv($res_msg, $req_sck);
    # check rv here.
    my $data = zmq_msg_data($res_msg);   # $data is "" here
    my $result = WDS::Macrobond::Utils::decode_result($data);   # the error messages are from this function

(Yeah, ØMQ is quite low-level. 
There is a higher level Perl Module, but I'm not using it). I omitted that because I don't think it's terribly relevant here. Details of the usage of ØMQ are better discussed on the ØMQ mailing list. But there is something else which may be relevant: ØMQ uses threads internally, and I don't actually know whether zmq_msg_recv returning means that the read(2) call (or whatever) on the socket terminates. It may actually continue in another thread. But I still don't see how that could block the main thread (or wake it up again in a place which has nothing to do with ØMQ (confess is a standard Perl function to print a stack trace and die)). Or - just thinking aloud here - I fear I'm abusing you guys as support teddy bears[1] - maybe it's the other way round: confess dies, so maybe it frees some lock during cleanup which allows the message which should have been sent by zmq_send to finally go out on the wire. But that still doesn't explain the 68 seconds spent in confess ... Postgres worker processes are single-threaded, are they? Is there something else which could interact badly with a moderately complex multithreaded I/O library used from a stored procedure? I suspect such an interaction because I cannot reproduce the problem outside of a stored procedure. A standalone Perl script doing the same requests doesn't get a timeout. I guess Alvaro is right: I should strace the postgres worker process while it executes the stored procedure. The problem of course is that it happens often enough to be annoying, but rarely enough that it's not easily reproducible. > >out (after 60 seconds, which is also a mystery, because ØMQ doesn't seem > >to have a default timeout of 60 seconds, and I don't set one). But at > Network timeout? That was my first guess, but I don't see where it would come from. Or why it only is there if I call the code from a stored procedure, not from a standalone script. > >09:24:45 (i.e. the time of the error message) the answer for that RPC > >call arrived. 
> >So it kind of looks like confess waited for the message to arrive > >(which makes no sense at all) or maybe that confess waited for > >something which also blocked the sending of the request (because > >according to the server logs, the RPC request only arrived there at > >09:24:45 and was answered within 1 second), but that doesn't make any > >sense either. > > So if the request timed out how did you get a reply, a second request? Nope. I don't really "get" the reply. I just see in the logs of the other server that it sent a reply at that time. The time line is like this:

    time    postgres process    mb_dal process
    T       zmq_send()
            zmq_msg_recv()
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
On 2015-12-01 10:20:09 -0800, Adrian Klaver wrote: > On 12/01/2015 09:58 AM, Peter J. Holzer wrote: > >On 2015-12-01 07:16:04 -0800, Adrian Klaver wrote: > >>On 12/01/2015 06:51 AM, Peter J. Holzer wrote: > >>>A rather weird observation from the log files of our server (9.5 beta1): > >>> > >>>2015-12-01 09:23:37 CET [26265]: [42-1] user=fragner,db=wds,pid=26265 > >>>WARNING: Use of uninitialized value $success in concatenation (.) or > >>>string at /usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pm line 36. > >>>2015-12-01 09:23:37 CET [26265]: [43-1] user=fragner,db=wds,pid=26265 > >>>CONTEXT: PL/Perl function "mb_timeseriesdata_zmq" > >>>[lots of other stuff from different connections] > >>>2015-12-01 09:24:45 CET [26265]: [44-1] user=fragner,db=wds,pid=26265 > >>>ERROR: impossible result '' (payload=) at > >>>/usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pm line 36. > >>> WDS::Macrobond::Utils::decode_result("") called at line 30 > >>> main::__ANON__("gen_wqehur") called at -e line 0 > >>> eval {...} called at -e line 0 > >>> > >>>Two messages from the same line of the same plperlu stored procedure, 68 > >>>seconds apart. So what is this line 36? > >>> > >>> confess "impossible result '$success' (payload=$payload)"; > >>> > >>>What? The first message clearly comes from interpolating $success > >>>(which is undef at that point) into the argument. The second from > >>>confess itself. What could cause a plperlu procedure to freeze for 68 > >>>seconds between the call to confess and its output? > >>> > >>>Is it possible that only the writing of the log entry is delayed? > >>> > >>>Another weird thing: $success is undef because a ØMQ rpc call[1] timed > >> > >>And the call is? > > > >The sequence is: > > > >my $req_sck = zmq_socket($context, ZMQ_REQ); > >zmq_connect($req_sck, $url); > >my $qry_msg = join(...); > >zmq_send($req_sck, $qry_msg); > >my $res_msg = zmq_msg_init(); > >my $rv = zmq_msg_recv($res_msg, $req_sck); > ># check rv here. 
> >my $data = zmq_msg_data($res_msg); # $data is "" here > >my $result = WDS::Macrobond::Utils::decode_result($data); # the error > >messages are from this function > > > >(Yeah, ØMQ is quite low-level. There is a higher level Perl Module, but > >I'm not using it). > > > >I omitted that because I don't think it's terribly relevant here. > >Details of the usage of ØMQ are better discussed on the ØMQ mailing > >list. > > > >But there is something else which may be relevant: ØMQ uses threads > >internally, and I don't actually know whether zmq_msg_recv returning > > Except I see this here: > > http://api.zeromq.org/4-0:zmq-socket > > Thread safety > > ØMQ sockets are not thread safe. Applications MUST NOT use a socket from > multiple threads except after migrating a socket from one thread to another > with a "full fence" memory barrier. Well yes, but I don't use a ØMQ socket in multiple threads. It is created in the stored procedure and destroyed at the end (just checked the strace output: Yes it is. For a moment I wasn't sure whether lexical variables in plperlu procedures go out of scope.). It's the ØMQ library itself which creates extra threads (And it should terminate them properly and afaics from strace it does). [...] > From here: > > http://api.zeromq.org/4-0:zmq-connect > > It seems something like(I am not a Perl programmer, so approach carefully): > > my $rc = zmq_connect($req_sck, $url); > > Then you will have an error code to examine. Yes. I have been a bit sloppy with error checking. I check only the return value of the zmq_msg_recv() call which returns the empty message. It is possible that the problem actually occurs earlier and I should check those calls as well. Mea culpa. However, in at least one case the failed call was indeed zmq_msg_recv() not one of the earlier ones (see my next mail). > Have you looked at the Notes at the bottom of this page: > > http://www.postgresql.org/docs/9.4/interactive/plperl-trusted.html I have. 
I don't think that's a problem here: Debian perl is built with both multiplicity and ithreads, and I would assume that the .deb packages from postgresql.org use the shared library provided by the system. But even if that wasn't the case it should not be a problem as only plperlu stored procedures are called. hp -- _ | Peter J. Holzer| I want to forget all about both belts and |_|_) || suspenders; instead, I want to buy pants | | | h...@hjp.at | that actually fit. __/ | http://www.hjp.at/ | -- http://noncombatant.org/ signature.asc Description: Digital signature
Re: [GENERAL] plperlu stored procedure seems to freeze for a minute
On 2015-12-01 18:58:31 +0100, Peter J. Holzer wrote: > I suspect such an interaction because I cannot reproduce the problem > outside of a stored procedure. A standalone Perl script doing the same > requests doesn't get a timeout. > > I guess Alvaro is right: I should strace the postgres worker process > while it executes the stored procedure. The problem of course is that > it happens often enough to be annoying, but rarely enough that it's not > easily reproducible. I did manage to catch a timeout once with strace in the mean time, although that one was much more straightforward and less mysterious than the original case: postgres process sends message, about 10 seconds later it receives a SIGALRM which interrupts an epoll, reply hasn't yet arrived, error message to client and log file. No waits in functions which shouldn't wait or messages which arrive much later than they were (presumably) sent. The strace doesn't show a reason for the SIGALRM, though. No alarm(2) or setitimer(2) system call (I connected strace to a running postgres process just after I got the prompt from "psql" and before I typed "select * from mb_search('export');" (I used a different (but very similar) stored procedure for those tests because it is much easier to find a search which is slow enough to trigger a timeout at least sometimes than a data request (which normally finishes in milliseconds)). So I guess my next task will be to find out where that SIGALRM comes from and/or whether I can just restart the zmq_msg_recv if it happens. hp -- _ | Peter J. Holzer| I want to forget all about both belts and |_|_) || suspenders; instead, I want to buy pants | | | h...@hjp.at | that actually fit. __/ | http://www.hjp.at/ | -- http://noncombatant.org/ signature.asc Description: Digital signature
[GENERAL] plperlu stored procedure seems to freeze for a minute
A rather weird observation from the log files of our server (9.5 beta1): 2015-12-01 09:23:37 CET [26265]: [42-1] user=fragner,db=wds,pid=26265 WARNING: Use of uninitialized value $success in concatenation (.) or string at /usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pm line 36. 2015-12-01 09:23:37 CET [26265]: [43-1] user=fragner,db=wds,pid=26265 CONTEXT: PL/Perl function "mb_timeseriesdata_zmq" [lots of other stuff from different connections] 2015-12-01 09:24:45 CET [26265]: [44-1] user=fragner,db=wds,pid=26265 ERROR: impossible result '' (payload=) at /usr/local/share/perl/5.20.2/WDS/Macrobond/Utils.pm line 36. WDS::Macrobond::Utils::decode_result("") called at line 30 main::__ANON__("gen_wqehur") called at -e line 0 eval {...} called at -e line 0 Two messages from the same line of the same plperlu stored procedure, 68 seconds apart. So what is this line 36? confess "impossible result '$success' (payload=$payload)"; What? The first message clearly comes from interpolating $success (which is undef at that point) into the argument. The second from confess itself. What could cause a plperlu procedure to freeze for 68 seconds between the call to confess and its output? Is it possible that only the writing of the log entry is delayed? Another weird thing: $success is undef because a ØMQ rpc call[1] timed out (after 60 seconds, which is also a mystery, because ØMQ doesn't seem to have a default timeout of 60 seconds, and I don't set one). But at 09:24:45 (i.e. the time of the error message) the answer for that RPC call arrived. So it kind of looks like confess waited for the message to arrive (which makes no sense at all) or maybe that confess waited for something which also blocked the sending of the request (because according to the server logs, the RPC request only arrived there at 09:24:45 and was answered within 1 second), but that doesn't make any sense either. (Just noticed that 60 + 68 == 128, which is also a round number). 
hp [1] ØMQ is an IPC framework: See http://zeromq.org/ We use it to make RPC calls from stored procedures to a server process. -- _ | Peter J. Holzer| I want to forget all about both belts and |_|_) || suspenders; instead, I want to buy pants | | | h...@hjp.at | that actually fit. __/ | http://www.hjp.at/ | -- http://noncombatant.org/ signature.asc Description: Digital signature
[GENERAL] Upgrade from postgresql-9.5alpha1 to postgresql-9.5alpha2 (Debian packages)
It looks like the catalog version has changed between 9.5alpha1 and 9.5alpha2: FATAL: database files are incompatible with server DETAIL: The database cluster was initialized with CATALOG_VERSION_NO 201506282, but the server was compiled with CATALOG_VERSION_NO 201507281. HINT: It looks like you need to initdb. Normally, one would use pg_upgradecluster to do the upgrade, but pg_upgradecluster assumes that the old and new version are installed in parallel. Likewise, the low-level tool pg_upgrade needs the old bindir, if I read the man-page correctly, and of course, apt-get upgrade overwrites that, since it's just two versions of the same package (unlike a major upgrade which is a new package). So, what's the best way to do the upgrade? * Copy the bindir before the upgrade (or restore from backup) to a safe place and do pg_upgrade? * Initdb a new cluster and restore yesterdays backup? * Something else? hp -- _ | Peter J. Holzer| I want to forget all about both belts and |_|_) || suspenders; instead, I want to buy pants | | | h...@hjp.at | that actually fit. __/ | http://www.hjp.at/ | -- http://noncombatant.org/ signature.asc Description: Digital signature
Re: [GENERAL] Upgrade from postgresql-9.5alpha1 to postgresql-9.5alpha2 (Debian packages)
On 2015-08-18 20:40:10 +0900, Masahiko Sawada wrote: On Tue, Aug 18, 2015 at 6:06 PM, Peter J. Holzer hjp-pg...@hjp.at wrote: It looks like the catalog version has changed between 9.5alpha1 and 9.5alpha2: [...] So, what's the best way to do the upgrade? * Copy the bindir before the upgrade (or restore from backup) to a safe place and do pg_upgrade? * Initdb a new cluster and restore yesterdays backup? * Something else? I've not used pg_upgrade at such case, but If you have a enough time to do upgrading, I will suggest to take backup(pg_dumpall) from current cluster and then restore it to new cluster. Of course you would have to make a backup before the upgrade to restore it afterwards. I could of course have forcibly downgraded to alpha1 again and made a new backup, but since this is a test system I just decided to drop and recreate the cluster and restore yesterday's backup. (One of my colleagues won't be too pleased about that, I think) I think pg_upgrade is basically used at major version upgrading. This was basically a major version upgrade, the problem was that it wasn't reflected in the package/directory structure (normally the postgres debian packages are quite fastidious in separating everything so that you have both an old and a new installation in the places where pg_upgradecluster expects them), and that I didn't expect it (the possibility of catalog version changes from one alpha release to the next was discussed before the release of alpha1, but I somehow classified that as theoretically possible but not likely - my fault), and finally that I don't really understand the finer points of pg_upgrade (I managed to use it in a similar situation some time ago, but I had to read the source code of pg_upgradecluster (and I think I even single-stepped through it in the debugger) to figure out the parameters and unfortunately I didn't take notes). 
No big harm done (alpha software on a test system -- I expect things to blow up once in a while), but maybe the person preparing the alpha releases can figure out how to make the upgrade smoother. At least a warning in the release announcement would be nice. (It wouldn't have helped me, as I have to admit I read the announcement only after I upgraded, but it would help those who do things in the right order ;-).)

        hp

-- 
   _  | Peter J. Holzer    | I want to forget all about both belts and
|_|_) |                    | suspenders; instead, I want to buy pants
| |   | h...@hjp.at        | that actually fit.
__/   | http://www.hjp.at/ |   -- http://noncombatant.org/
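For the record, the dump-and-restore route Sawada-san suggests would look roughly like this on a Debian system (an untested sketch; the cluster name "main" and the dump path are assumptions, and pg_dropcluster/pg_createcluster are the Debian postgresql-common wrappers):

```shell
# Before the upgrade: dump globals (roles, tablespaces) plus all databases
pg_dumpall -U postgres > /var/backups/cluster-9.5alpha1.sql

# After the upgrade: throw away the now-incompatible cluster and
# recreate it with the new binaries (this re-runs initdb).
pg_dropcluster --stop 9.5 main
pg_createcluster --start 9.5 main

# Restore the dump into the fresh cluster
psql -U postgres -f /var/backups/cluster-9.5alpha1.sql postgres
```

The obvious catch, as noted above, is that the dump must be taken while the old binaries can still start the old cluster -- afterwards it's too late.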
Re: [GENERAL] Queries for unused/useless indexes
On 2015-05-22 09:41:57 -0400, Melvin Davidson wrote:
> I'd like to share those queries with the community, as I know there must
> be others out there with the same problem.
>
> /* useless_indexes.sql */
> SELECT idstat.schemaname AS schema,
>        idstat.relname AS table_name,
>        indexrelname AS index_name,
>        idstat.idx_scan AS times_used,
>        pg_size_pretty(pg_relation_size(quote_ident(idstat.schemaname) || '.' ||
>            quote_ident(idstat.relname))) AS table_size,
>        pg_size_pretty(pg_relation_size(quote_ident(idstat.schemaname) || '.' ||
>            quote_ident(indexrelname))) AS index_size,
>        n_tup_upd + n_tup_ins + n_tup_del AS num_writes,
>        indexdef AS definition
> FROM pg_stat_user_indexes AS idstat
> JOIN pg_indexes ON indexrelname = indexname
> JOIN pg_stat_user_tables AS tabstat ON idstat.relname = tabstat.relname
> WHERE idstat.idx_scan < 200
> AND indexdef !~* 'unique'
> ORDER BY idstat.schemaname, idstat.relname, indexrelname;

Thanks, that's useful. However, it doesn't quite work if there are indexes with the same name in different schemas. Better to join on the schemaname, too:

FROM pg_stat_user_indexes AS idstat
JOIN pg_indexes AS idx ON indexrelname = indexname
    AND idstat.schemaname = idx.schemaname
JOIN pg_stat_user_tables AS tabstat ON idstat.relname = tabstat.relname
    AND idstat.schemaname = tabstat.schemaname

(For some reason that makes the query a lot slower, though.)

        hp

-- 
   _  | Peter J. Holzer    | I want to forget all about both belts and
|_|_) |                    | suspenders; instead, I want to buy pants
| |   | h...@hjp.at        | that actually fit.
__/   | http://www.hjp.at/ |   -- http://noncombatant.org/
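Putting that fix back into the full query, the schema-qualified version might look like this (an untested sketch; the column list and the idx_scan < 200 threshold are unchanged from Melvin's original):

```sql
/* useless_indexes.sql, with schema-qualified joins */
SELECT idstat.schemaname AS schema,
       idstat.relname AS table_name,
       indexrelname AS index_name,
       idstat.idx_scan AS times_used,
       pg_size_pretty(pg_relation_size(quote_ident(idstat.schemaname) || '.' ||
           quote_ident(idstat.relname))) AS table_size,
       pg_size_pretty(pg_relation_size(quote_ident(idstat.schemaname) || '.' ||
           quote_ident(indexrelname))) AS index_size,
       n_tup_upd + n_tup_ins + n_tup_del AS num_writes,
       indexdef AS definition
FROM pg_stat_user_indexes AS idstat
-- join the index name AND its schema, so same-named indexes
-- in different schemas no longer cross-match
JOIN pg_indexes AS idx
    ON indexrelname = indexname
   AND idstat.schemaname = idx.schemaname
-- same for the table-level statistics
JOIN pg_stat_user_tables AS tabstat
    ON idstat.relname = tabstat.relname
   AND idstat.schemaname = tabstat.schemaname
WHERE idstat.idx_scan < 200
AND indexdef !~* 'unique'
ORDER BY idstat.schemaname, idstat.relname, indexrelname;
```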
Re: [GENERAL] Queries for unused/useless indexes
On 2015-05-25 12:25:01 -0400, Melvin Davidson wrote:
> I'm not sure why you are using pg_stat_user_indexes.

Because you did. I didn't change that.

> My original query below uses pg_stat_all_indexes and the schema names
> are joined and it does work.

I'm not sure what you mean by "original", but this:

> SELECT n.nspname AS schema,
>        i.relname AS table,
>        i.indexrelname AS index,
>        i.idx_scan,
>        i.idx_tup_read,
>        i.idx_tup_fetch,
>        pg_size_pretty(pg_relation_size(quote_ident(n.nspname) || '.' ||
>            quote_ident(i.relname))) AS table_size,
>        pg_size_pretty(pg_relation_size(quote_ident(n.nspname) || '.' ||
>            quote_ident(i.indexrelname))) AS index_size,
>        pg_get_indexdef(idx.indexrelid) AS idx_definition
> FROM pg_stat_all_indexes i
> JOIN pg_class c ON (c.oid = i.relid)
> JOIN pg_namespace n ON (n.oid = c.relnamespace)
> JOIN pg_index idx ON (idx.indexrelid = i.indexrelid)
> WHERE i.idx_scan < 200
> AND NOT idx.indisprimary
> AND NOT idx.indisunique
> ORDER BY 1, 2, 3;

is not the query you posted in your original message. Here is what you posted:

> On Mon, May 25, 2015 at 10:41 AM, Peter J. Holzer hjp-pg...@hjp.at wrote:
> > On 2015-05-22 09:41:57 -0400, Melvin Davidson wrote:
> > > I'd like to share those queries with the community, as I know there
> > > must be others out there with the same problem.
> > >
> > > /* useless_indexes.sql */
> > > SELECT idstat.schemaname AS schema,
> > >        idstat.relname AS table_name,
> > >        indexrelname AS index_name,
> > >        idstat.idx_scan AS times_used,
> > >        pg_size_pretty(pg_relation_size(quote_ident(idstat.schemaname) || '.' ||
> > >            quote_ident(idstat.relname))) AS table_size,
> > >        pg_size_pretty(pg_relation_size(quote_ident(idstat.schemaname) || '.' ||
> > >            quote_ident(indexrelname))) AS index_size,
> > >        n_tup_upd + n_tup_ins + n_tup_del AS num_writes,
> > >        indexdef AS definition
> > > FROM pg_stat_user_indexes AS idstat
> > > JOIN pg_indexes ON indexrelname = indexname
> > > JOIN pg_stat_user_tables AS tabstat ON idstat.relname = tabstat.relname
> > > WHERE idstat.idx_scan < 200
> > > AND indexdef !~* 'unique'
> > > ORDER BY idstat.schemaname, idstat.relname, indexrelname;

-- 
   _  | Peter J. Holzer    | I want to forget all about both belts and
|_|_) |                    | suspenders; instead, I want to buy pants
| |   | h...@hjp.at        | that actually fit.
__/   | http://www.hjp.at/ |   -- http://noncombatant.org/