Re: easy way to acquire height / width from images (PNG, JPEG) stored as bytea?
On 17/04/2020 13:37, Achilleas Mantzios wrote:
> Hello Dear List, we have a table holding email attachments as bytea, and we would like to filter out images of small dimensions, which are not of any value to our logic. I took a look at pg_image extension, tested it, and it proved problematic, it killed my 200+ days uptime FreeBSD box :( . I dropped the extension and uninstalled this as soon as fsck finally finished.

If running an extension crashed your server you should look at how / why, especially if it corrupted your filesystem. That shouldn't happen on a correctly configured system, so the underlying issue might cause you other problems. Crashing postgresql, sure, but not anything that impacts the rest of the server.

Cheers, Steve
Re: Using unlogged tables for web sessions
On 16/04/2020 19:39, Stephen Carboni wrote:
> Hello. I was wondering if anyone was using unlogged tables for website sessions in production. I'm interested if it breaks the prevailing opinion that you don't put sessions in PG.

I generally put sessions in postgresql, with regular tables, when I'm using persistent server-side sessions, rather than just stashing all the user data in a signed cookie or local storage. It ... works fine? It drastically simplifies app deployment to have a single point of persistent storage, and having one that you can easily interrogate by hand simplifies development and debugging. Reads are a single indexed query, writes are typically HOT, so the IO and overhead aren't drastically different from any other persistent store.

A lot of webapp development advice is based on "Ah, but how will you scale it up to Facebook levels of traffic?" and then skipping over the answer "Um, I won't ever need to. And if I did I'd have enough revenue to hire someone very, very good to rearchitect my session storage.".

Cheers, Steve
Re: Rules versus triggers
On 07/03/2020 11:56, stan wrote:
> Could someone give me a brief description of the intended functionality, and how the 2 features work, of rules versus triggers? It appears to me that they are simply 2 different ways to set up triggers, but I am certain that is just because of my lack of knowledge.

The pages https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use_rules links to will give you some idea of the differences.

Cheers, Steve
Re: Fwd: sensible configuration of max_connections
On 07/02/2020 13:18, Chris Withers wrote:
> On 07/02/2020 12:49, Chris Ellis wrote:
>> What's "too much" for max_connections? What happens when you set it too high? What factors affect that number?
>>
>> When sizing max_connections you need to trade off how many connections your application will use at peak vs how much RAM and CPU you have. Each connection is capable of allocating work_mem and has a stack etc. As such you don't want max_connections to be able to run your system out of RAM.
>
> Sure, but that's where I'm trying to find out what's sensible. The box has 196GB memory, most of that in hugepages, 18 core Intel Skylake with HT on giving 36 cores and tonnes of SSD for storage. How would I turn that spec into a sensible number for max_connections? As that number grows, what contention points in postgres will start creaking (shared memory where the IPC happens?)

The max_connections setting is an upper limit after which postgresql will reject connections. You don't really want to hit that limit, rather you want to keep the number of concurrent connections to a reasonable number (and have max_connections somewhere above that).

Each connection is a postgresql process, so active connections are competing for resources and even idle connections take up some RAM. Creating a new connection is launching a new process (and doing some setup) so it's relatively expensive. Doing some sort of connection pooling is a good idea, especially for web-apps that connect, do a few short queries and disconnect. Django is probably doing a passable job at pooling already, so you might want to see how many connections it's using under normal load. Adding a dedicated pooler in between Django and PostgreSQL would give you more flexibility and might be a good idea, but if what Django is already doing is reasonable you may not need it.

> What's a good number of active connections to aim for?

It probably depends on whether they tend to be CPU-bound or IO-bound, but I've seen the rule of thumb of "around twice the number of CPU cores" tossed around, and it's probably a decent place to start. Then run it under normal load and see how it behaves - cpu usage, RAM, IO, request latency and throughput.

Cheers, Steve
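P.S. A quick way to see how many connections you're actually holding, and what they're doing, straight from pg_stat_activity:

```sql
-- How many backends are connected, grouped by what they're doing
SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state
ORDER BY count(*) DESC;
```

Watch that under normal load for a while before deciding whether you need a dedicated pooler.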
Re: slow insert speeds with bytea
On 01/12/2019 17:59, Alex O'Ree wrote:
> Is there anything I can do to increase insert speeds for bytea? Currently running postgres 9.6.15. I have a few tables without a bytea and a few with bytea. There is a large performance difference with inserts between the two. I'm inserting a byte[] that's usually less than 1MB in content. The content itself is actually just utf8 string data. For the non-bytea table, inserts can be as high as 40k rows/sec, whereas the bytea table is closer to 4k/sec or less.

4k 1MB rows/sec would be 4GB a second. It would need to be a fairly decent IO system to manage that speed, let alone ten times that. What's the typical row size of the tables with bytea fields vs the other tables? What are your IO stats during the two sorts of insert?

Cheers, Steve
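P.S. To answer the row-size question yourself, a rough sketch like this (the table name is a placeholder) gives average on-disk bytes per row, including TOAST and indexes:

```sql
-- Average on-disk bytes per row for a hypothetical table "mytable"
SELECT pg_total_relation_size('mytable') / greatest(count(*), 1) AS avg_row_bytes
FROM mytable;
```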
Re: Remote Connection Help
On 21/11/2019 14:30, Ekaterina Amez wrote:
> On 21/11/19 at 15:21, Jason L. Amerson wrote:
>> I am at a loss for what to do. I have read article after article about how to allow remote connections on my PostgreSQL server and none of what the articles say to do worked for me. I have edited the "postgresql.conf" file and changed "listen_address = 'localhost'" to "listen_address = '*'".

I think that's "listen_addresses" on recent versions of postgresql. Did you stop and restart the service after you edited the config file? Check the logs for errors too, maybe.

>> I have even tried it commented out and uncommented and I get the same results. I also edited the "pg_hba.conf" file and added the following at the end of the file: host all all 0.0.0.0/0 md5 host all all ::/0 md5

The first line in pg_hba.conf that matches a connection will take effect; later lines won't.

> For testing connection purposes I'm used to change md5 to trust, this way you won't have troubles with passwords nor users.

Don't do this on a machine that's reachable from the open internet, ever. It's asking to get your box compromised.

>> My computer that is running the server is Ubuntu and it has a static IP. I am trying to connect remotely with computers running Windows 10 using the static IP. When I run pgAdmin from my Windows 10 machine, or use the command line to connect, I get the following error: unable to connect to server: could not connect to server: Connection refused (0x274D/10061) Is the server running on host "xx.xx.xx.xx" and accepting TCP/IP connections on port 5432? I would like to be able to connect to my Ubuntu PostgreSQL server from all Windows 10 machines as well as from a client on my Android phone while away from home. That is my goal now. I am hoping that someone can help me to get this working. This has been very frustrating.

Can you connect to your server on its external address at all? i.e. if its external IP address is 10.11.12.13, can you run "psql -h 10.11.12.13" on your ubuntu box and connect / log in? If you can then postgresql is configured correctly and you can focus on where the issue on the network is. If not, then the problem is on the local machine, either postgresql configuration or _maybe_ local network configuration silliness.

Cheers, Steve
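P.S. For reference, a minimal remote-access configuration looks something like this - an illustration, not your exact files, and you need to restart postgresql after changing listen_addresses:

```
# postgresql.conf
listen_addresses = '*'          # note the plural; restart required

# pg_hba.conf - the first matching line wins
host    all    all    0.0.0.0/0    md5
host    all    all    ::/0         md5
```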
Re: jsonb_set() strictness considered harmful to data
On 21/10/2019 17:39, Steven Pousty wrote:
> Turning a JSON null into a SQL null and thereby "deleting" the data is not the path of least surprises.

In what situation does that happen? (If it's already been mentioned I missed it - long thread, sorry.)

Cheers, Steve
Re: jsonb_set() strictness considered harmful to data
On 19/10/2019 07:52, Ariadne Conill wrote:
> I would say that any thing like
>
>   update whatever set column=jsonb_set(column, '{foo}', NULL)
>
> should throw an exception. It should do, literally, *anything* else but blank that column.

steve=# create table foo (bar jsonb not null);
CREATE TABLE
steve=# insert into foo (bar) values ('{"a":"b"}');
INSERT 0 1
steve=# update foo set bar = jsonb_set(bar, '{foo}', NULL);
ERROR:  null value in column "bar" violates not-null constraint
DETAIL:  Failing row contains (null).
steve=# update foo set bar = jsonb_set(bar, '{foo}', 'null'::jsonb);
UPDATE 1

I don't see any behaviour that's particularly surprising there? Though I understand how an app developer who's light on SQL might get it wrong - and I've made similar mistakes in schema upgrade scripts without involving jsonb.

Cheers, Steve
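P.S. If you do need to pass a possibly-NULL value to jsonb_set(), one defensive sketch (new_value here is a hypothetical parameter, not part of the session above) coalesces the SQL NULL into a JSON null before the call:

```sql
-- A SQL NULL becomes a JSON null rather than nulling out the whole column
UPDATE foo
SET bar = jsonb_set(bar, '{foo}', coalesce(to_jsonb(new_value), 'null'::jsonb));
```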
Re: PostgreSQL License
On 19/09/2019 13:48, Steve Litt wrote: My understanding is the PostgreSQL license is more like the MIT license, which actually allows one to modify the code and claim it as proprietary. You could do that, yes. :) https://wiki.postgresql.org/wiki/PostgreSQL_derived_databases Cheers, Steve
Re: Work hours?
> On Aug 27, 2019, at 11:27 PM, stan wrote:
>
> I am just starting to explore the power of PostgreSQL's time and date
> functionality. I must say they seem very powerful.
>
> I need to write a function that, given a month, and a year as input returns
> the "work hours" in that month. In other words something like
>
> (8 * the count of all days in the range Monday to Friday) within that
> calendar month.
>
> Any thoughts as to the best way to approach this?

You might find this useful: https://gist.github.com/wttw/b6f5d0d67c31d499c05f22a4f2b6f628

It's not the most efficient approach, but it's relatively simple to customize.

Cheers, Steve
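P.S. The core of that calculation - just a sketch, not the gist verbatim, with the month hard-coded and holidays ignored - is a generate_series over the month, counting ISO weekdays 1 through 5:

```sql
-- 8 hours * the number of Mondays-to-Fridays in August 2019
SELECT 8 * count(*) AS work_hours
FROM generate_series(date '2019-08-01',
                     date '2019-09-01' - 1,
                     interval '1 day') AS d
WHERE extract(isodow FROM d) < 6;
```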
Re: Importing from CSV, auto creating table?
On 21/08/2019 22:15, stan wrote:
> I have a situation where we need to import data, as an interim measure, from spreadsheets. I have read up on \copy and COPY, but I do not see that either of these can use the header from a CSV file to define a new table. Am I missing something? Also downloaded something called pgfutter, which I thought would do this, but have not had any success with it. After I (thought) I had figured out the arguments, it just seems to hang forever.

You might find https://github.com/wttw/csvpg useful. It creates tables with column names based on the CSV header, and data types intuited from the data. (The only pre-built binary there is for Mac right now; I should at least add one for Windows.)

Cheers, Steve
Re: DRY up GUI wiki pages
> On Jul 10, 2019, at 7:38 PM, Bruce Momjian wrote: > > On Wed, Jul 10, 2019 at 09:08:30AM -0600, Roger Pack wrote: >> I learned today there are "two" wiki pages for GUI clients: >> >> https://wiki.postgresql.org/wiki/PostgreSQL_Clients >> >> https://wiki.postgresql.org/wiki/Community_Guide_to_PostgreSQL_GUI_Tools >> >> I'd like to DRY them up so there aren't two lists which confuses >> newcomers. Any objections? If not I'll probably make one of those >> pages into GUI's and one into "non GUI's" or something like that. I created the first one because the second one was full of old, stale, useless things. I believe that everything valid on the second one was added to the first one at the time. Also look at https://wiki.postgresql.org/wiki/Design_Tools and https://wiki.postgresql.org/wiki/Documentation_Tools if you're looking to combine / divide things. Cheers, Steve > > Agreed, a cleanup would be nice. :-) > > -- > Bruce Momjian http://momjian.us > EnterpriseDB http://enterprisedb.com > > + As you are, so once was I. As I am, so you will be. + > + Ancient Roman grave inscription + > >
Re: Research on ?? operators
> On Jun 4, 2019, at 1:35 PM, Michael Lewis wrote:
>
>> ":foo" named placeholders
>
> If I may, is this supported natively in Postgres prepared statements?

It's not. The only placeholders we support natively are the numbered $1 style.

> Can I see an example? I do not care much for the numbers of positional
> placeholders and would love to use names instead if possible.

It'd be nice. They're supported via rewriting at the driver level in some drivers, and I've written shims to convert them in an app a few times; it makes for much more readable - and bug-resistant - code. Supporting it at the protocol level would be very nice.

Cheers, Steve
Re: Research on ?? operators
> On Jun 4, 2019, at 10:00 AM, Matteo Beccati wrote: > > Hello generals, > > I'm trying to resurrect a change to the PHP PDO driver to allow using > the "?" json operator (and other operators containing a question mark), > without it being interpreted as a placeholder for a query argument. In > order to do so, like Hibernate, I went for the double "??" escaping: > > https://wiki.php.net/rfc/pdo_escape_placeholders > > One question that I'm supposed to answer now is: is there any known > usage in the wild of some custom "??" operator that would require funny > escaping like ""? I don't know of one today, but that doesn't mean there isn't or won't be tomorrow. > I've tried to search pgxn but couldn't find any match, so I thought it > was worth to try and ask here. Doubling "?" to "??" isn't an obvious, intuitive way to do that, at least to me. Maybe it would be to people coming from Java. Perl's DBD::Pg deals with it in two ways. One is to allow escaping the ? with a backslash - so "?" is a placeholder, while "\?" is passed as "?" to the database. That's more consistent with other languages, and I think you're far less likely to ever see a backslash in a custom operator than "??". The other is that it supports the postgresql standard placeholders - $1, $2 etc. - which avoid the issue and are more flexible. It also has a configuration option to completely ignore "?" in queries, so "$1" is a placeholder and "?" is passed through to the database. (Though I like the ":foo" or "@foo" style named placeholders best) Cheers, Steve
Re: schema change tracking
> On May 16, 2019, at 5:41 PM, Benedict Holland > wrote: > > Hi All, > > I am fairly this question has many answers but here it goes: > > I need a tool that can track schema changes in a postgesql database, write > scripts to alter the tables, and store those changes in git. Are there tools > that exist that can do this? If you're looking for something that'll reverse engineer schema change scripts from a database someone has made ad-hoc changes to ... try and avoid doing that if at all possible. Use the change scripts in git as the source of truth. If you do that there are many answers, starting at "it's fairly easy to do yourself, with a simple schema version and upgrade / downgrade scripts". Depending on what language you're writing your app in there may be libraries that can help. But if you're looking for something standalone, look at https://sqitch.org Cheers, Steve
Re: SQL queries not matching on certain fields
> On Apr 3, 2019, at 2:06 PM, Felix Ableitner wrote: > > Hello, > > I'm having a very strange problem with the Postgres database for my website. > Some SQL queries are not matching on certain fields. I am running these > commands via the psql command. > > Here is a query that works as expected: > > # SELECT id, "preferredUsername" FROM actor WHERE "preferredUsername"='emma'; > id | preferredUsername > ---+--- > 48952 | emma > 58672 | emma > (2 rows) > > The following query should work as well, because the username exists. But in > fact, it consistently returns nothing: > > # SELECT id, "preferredUsername" FROM actor WHERE > "preferredUsername"='mailab'; > id | preferredUsername > +--- > > (0 rows) > > There are some workarounds which fix the WHERE statement, all of the > following work as expected: > > SELECT id, "preferredUsername" FROM actor WHERE > trim("preferredUsername")='mailab'; > > > > SELECT id, "preferredUsername" FROM actor WHERE "preferredUsername" ILIKE > 'mailab'; > > SELECT id, "preferredUsername" FROM actor WHERE > md5("preferredUsername")=md5('mailab'); > > > Now you might think that there is something wrong with the encoding, or the > field contains some extra whitespace. But SHOW SERVER_ENCODING and SHOW > SERVER_ENCODING show UTF8. Length and m5 sum are also exactly as expected. > And I checked the individual bytes with get_byte(), all of them are in the > range 97-122. > > About the setup, this is Postgres 10 running in Docker, on an Ubuntu VPS (see > below for all versions etc). I had this problem before on the same setup, so > I did an export to text file with pg_dump, and imported into a completely new > database with psql. That fixed the problem for a few days, but it came back > soon after. > > The problem only seems to affect one or two specific columns, and only a few > specific rows in those columns. Most other rows work normally. Affected > columns also randomly start working again after a few days, and other columns > get affected. 
> I haven't noticed any kind of pattern.
>
> You can find the table definition here: https://gitlab.com/snippets/1840320

You can use "explain" to see what plan is being used for the query, but I'm betting that it's using an index on preferredUsername. Your workarounds won't use that index, they'll scan the table. If that index is corrupted it could cause the symptoms you're seeing. You can use "reindex" to rebuild the index from scratch and see if it fixes things, but corrupted indexes aren't normal, and the issue seems to be recurring. On physical hardware I'd be wondering about filesystem corruption and taking a good look at my system logs. On someone else's VPS you don't have the same visibility, but I'd still check the system logs for issues.

Cheers, Steve

> Version info:
>
> Postgres Docker Image: postgres:10.7-alpine
> Docker version: 18.09.2
> OS: Ubuntu 18.04.2
>
> Please tell me if you have any idea how to fix or debug this. I already asked multiple people, and no one has a clue what is going on.
>
> Best,
> Felix Ableitner
Re: Case Insensitive
> On Mar 28, 2019, at 9:08 AM, Ron wrote: > > On 3/28/19 3:33 AM, Steve Atkins wrote: >> >>> On Mar 28, 2019, at 8:29 AM, Ron wrote: >>> >>> On 3/28/19 3:23 AM, Sameer Kumar wrote: >>> [snip] >>>> You can write a query with upper function: >>>> >>>> select * from emp where upper(ename)=upper('aaa'); >>> That's a guaranteed table scan. >> Unless you have an index on upper(ename). > > Are you sure? I thought the lpart had to be immutable for the query > optimizer to decide to use an index (and upper(ename) is mutable). Yeah. Case insensitive searches like this are pretty much the first example given for why you might want to use an expression index. The expression in an expression index has to be immutable, but upper() is - it will always give the same output for a given input. (For values of "always" that probably depend on not performing major surgery on collations, but that falls into the "lie to the planner, get rotten results" category). Check "\df+ upper" Cheers, Steve
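P.S. As a concrete sketch of the expression index in question:

```sql
-- The index stores upper(ename), so the upper() comparison is indexable
CREATE INDEX emp_ename_upper_idx ON emp (upper(ename));

-- This predicate matches the indexed expression and can use the index
SELECT * FROM emp WHERE upper(ename) = upper('aaa');
```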
Re: Case Insensitive
> On Mar 28, 2019, at 8:29 AM, Ron wrote: > > On 3/28/19 3:23 AM, Sameer Kumar wrote: > [snip] >> You can write a query with upper function: >> >> select * from emp where upper(ename)=upper('aaa'); > > That's a guaranteed table scan. Unless you have an index on upper(ename). Cheers, Steve
Re: Forks of pgadmin3?
> On Mar 22, 2019, at 10:56 AM, Christian Henz > wrote: > > I know I'm late to the party, but we're only now migrating from > Postgres 9.x, realizing that pgadmin3 does not support Postgres 11. > > I have checked out pgadmin4, but I don't like it at all. My colleagues > feel the same way, and some web searching suggests that we are not > alone. > > So I wonder if there are any active forks of pgadmin3? There's the BigSQL fork, which had at least some minimal support for 10. I've no idea whether it's had / needs anything for 11. > > I found some on Github with some significant changes that I assume > were done by people working for VK, the Russian social network. These > appear to be personal hacks though (monosyllabic commit messages, build > scripts added with hard coded local paths etc.). > > There are also the Debian packages that have patches adding Postgres > 10 support among other things. Not sure if there would be interest > there in continuing to support newer Postgres versions. > > Are there other, more organized efforts to continue pgadmin3? > > Are there technical reasons why such a continuation would not make > sense? > It's significant work, and it'd be expended maintaining a fairly mediocre GUI client. You might see if you like OmniDB, or one of the other GUI clients, perhaps? https://wiki.postgresql.org/wiki/PostgreSQL_Clients Cheers, Steve > Cheers, > Christian > > -- > Christian Henz > Software Developer, software & vision Sarrazin GmbH & Co. KG > >
Re: Replication
> On Feb 26, 2019, at 9:41 AM, Sonam Sharma wrote: > > Hi, > > Can we do master to master replication in Postgres. Look at https://bucardo.org/Bucardo/ , BDR (e.g. https://www.2ndquadrant.com/en/resources/postgres-bdr-2ndquadrant/ ), RubyRep or any of the other multimaster solutions at https://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling#Replication . You probably don't actually need bidirectional master-master replication, though, and might want to look hard at other ways to do what you want. Cheers, Steve
Re: the installation of pgadmin4 makes me weep in frustration
> On Jan 14, 2019, at 7:58 AM, robert wrote: > > Hi There > > > > first, thanks a lot for the great an beautiful software we get from > PostgreSQL and all people around it. > > But I wonder how it comes, that installing pgadmin4 is so incredibly hard? > > And no documentation. > > I would like to install pgadmin4 to my ubuntu 18.1 laptop. > > Where do I find doku on how to install pgadmin4. Preferably I would like to > install it using pip? pgadmin 4 is nothing to do with the PostgreSQL project itself, it's just a third party client. There are many other third-party clients listed here - https://wiki.postgresql.org/wiki/PostgreSQL_Clients - most of them probably better than pgadmin4. Cheers, Steve
Re: incomplete startup packet messages in logs
> On Nov 1, 2018, at 1:27 PM, Vijaykumar Jain wrote: > > Hi Team, > > I know this is has been answered a lot on the internet wrt ignoring, > but i am unable to figure out why I get these messages in logs > "incomplete startup packet" Check which of your monitoring systems is configured to check postgresql liveness 15 times an hour. If you can't find it, and you care about the log entries, start sniffing packets to see where the traffic is coming from. > > tail -1000 /var/log/postgresql/postgresql-10-main.log| grep > 'incomplete startup packet' > > 2018-11-01 13:04:18 UTC LOG: incomplete startup packet > > 2018-11-01 13:08:18 UTC LOG: incomplete startup packet > > 2018-11-01 13:12:18 UTC LOG: incomplete startup packet > > > Note: exactly at 4 min interval. Cheers, Steve
Re: Pg_logical without subscription. Can we log the operations ?
> On Oct 9, 2018, at 11:45 PM, Dilshan wrote: > > Hi Team, > I am working on a requirement, where I need to log each and every operation > on the master db. So I was thinking if I could do with pg_logical by setting > my master db as publisher and setting a new db as subscriber and dropping the > subscription there after. I am wondering, how to get the operations that a > subscription would receive just into logs. Could you guide me on that? > Otherwise is there a possibility to receive all the operation without > dropping subscription and logging the details and deleting the subscription > tables to save space. I am planning to have logs rotated and purging logs > every month. Could you please guide me about the possibility of this approach? You can read the logical replication stream with clients other than postgresql, and you can use plugins to format it in different ways. https://wiki.postgresql.org/wiki/Logical_Decoding_Plugins has some of the third-party plugins to format the changes. There are libraries for most languages to consume the logical decoding stream, or the included client "pg_recvlogical" can be used to write it to disk. pg_recvlogical + wal2json might be the simplest way to do a basic audit trail. Cheers, Steve
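P.S. A hedged sketch of the pg_recvlogical + wal2json approach - the slot and file names are made up, and it assumes wal_level = logical and the wal2json plugin installed on the server:

```shell
# Create a logical replication slot that formats changes as JSON
pg_recvlogical -d mydb --slot=audit_slot --create-slot --plugin=wal2json

# Stream the change data to a file, ready for your rotation / purging scheme
pg_recvlogical -d mydb --slot=audit_slot --start -f audit.json
```

Remember that an unconsumed replication slot holds WAL on the server, so drop the slot if you stop reading from it.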
Re: Code of Conduct plan
> On Sep 17, 2018, at 4:57 PM, Steve Litt wrote: > > On Mon, 17 Sep 2018 08:27:48 -0700 > "Joshua D. Drake" wrote: > >> >> At this point it is important to accept that the CoC is happening. We >> aren't going to stop that. The goal now is to insure a CoC that is >> equitable for all community members and that has appropriate >> accountability. At hand it appears that major concern is the CoC >> trying to be authoritative outside of community channels. As well as >> wording that is a bit far reaching. Specifically I think people's >> main concern is these two sentences: >> >> "To that end, we have established this Code of Conduct for community >> interaction and participation in the project’s work and the community >> at large. This Code is meant to cover all interaction between >> community members, whether or not it takes place within >> postgresql.org infrastructure, so long as there is not another Code >> of Conduct that takes precedence (such as a conference's Code of >> Conduct)." >> >> If we can constructively provide feedback about those two sentences, >> great (or constructive feedback on other areas of the CoC). If we >> can't then this thread needs to stop. It has become unproductive. >> >> My feedback is that those two sentences provide an overarching >> authority that .Org does not have the right to enforce and that it is >> also largely redundant because we allow that the idea that if another >> CoC exists, then ours doesn't apply. Well every single major >> collaboration channel we would be concerned with (including something >> like Blogger) has its own CoC within its Terms of use. That >> effectively neuters the PostgreSQL CoC within places like Slack, >> Facebook, Twitter etc... > > The perfect is the enemy of the good. Whatever CoC is decided upon, it > will be updated later. If it's easier, for now, to pass it with > enforcement WITHIN the Postgres community, why not do that? 
> If, later on, we get instances of people retaliating, in other venues, for positions taken in Postgres, that can be handled when it comes up.

I'll note that a fairly common situation with mailing lists I've seen is people taking an on-list disagreement off-list and being offensive there. I've not had that happen to me personally on the pgsql-* lists, but I have had it happen on other technical mailing lists. That harassment would be "outside of community channels". A CoC that doesn't cover that situation (or its equivalent on IRC) isn't going to be particularly easy to apply.

Whether the CoC can be applied or not isn't necessarily the most important thing about it - it's more a statement of beliefs - but if the situation comes up where someone is behaving unacceptably via IRC or email and "we" say that we aren't interested in helping, or our hands are tied, because "off-list" communication isn't covered by the CoC, that's likely to lead to a loud and public mess.

Cheers, Steve
Re: Question on postgresql.conf
> On Jul 31, 2018, at 10:59 AM, Dimitri Maziuk wrote: > > On 07/31/2018 12:33 PM, George Neuner wrote: > >> Is there some reason that postgresql.conf cannot be a link to your >> file? > > It's six of one, half a dozen of the other. The big problem is when the > network share is unavailable at host boot and all of your databases are > belong to bitbucket. If you want to do it right, you need chef/puppet as > already suggested. > > One useful thing linux (at least) has now is conf.d directories whose > contents are auto-included after the main config: this way you can keep > the vendor-provided /etc/blah.conf and have all your host-specific > settings in /etc/blah.d/myhost.conf, all living together happily ever > after. You still want the latter to be a local file that's guaranteed > available when blah service starts up. (Postgres devs: hint! hint!) You'd do that with "include_dir 'conf.d'" or similar, I think, which PostgreSQL has in all current versions. Cheers, Steve
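P.S. That is, something like this in postgresql.conf (the directory name is whatever you choose):

```
# postgresql.conf
include_dir = 'conf.d'   # reads every *.conf file in the directory, in
                         # filename order; later settings override earlier ones
```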
Re: Load data from a csv file without using COPY
> On Jun 19, 2018, at 10:14 PM, Ravi Krishna wrote: > >> >> If performance is relevant then your app should probably be using COPY >> protocol, not line by line inserts. It's >> supported by most postgresql access libraries. If your app does that then >> using "\copy" from psql would be >> an appropriate benchmark. > > Actually the reluctance to not use COPY is to make the benchmark same across > two different RDBMS in > two diff env. That's something I'd only do if I intended to rig a benchmark between a RDBMS with good bulk import and one without. If that's not your goal, your approach doesn't seem to make sense and is unlikely to provide performance metrics that are useful or related to your app performance, unless you intend to hamstring your app in exactly the same way you're running the benchmark. Maybe use your app, or write ten minutes worth of code that'll interact with the database in much the same way as your app will? Cheers, Steve
Re: Load data from a csv file without using COPY
> On Jun 19, 2018, at 9:16 PM, Ravi Krishna wrote: > > In order to test a real life scenario (and use it for benchmarking) I want to > load large number of data from csv files. > The requirement is that the load should happen like an application writing to > the database ( that is, no COPY command). > Is there a tool which can do the job. Basically parse the csv file and > insert it to the database row by row. If performance is relevant then your app should probably be using COPY protocol, not line by line inserts. It's supported by most postgresql access libraries. If your app does that then using "\copy" from psql would be an appropriate benchmark. Cheers, Steve
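P.S. For comparison, the \copy side of that benchmark might look like this (database, table and file names are placeholders):

```
psql -d mydb -c "\copy mytable FROM 'data.csv' WITH (FORMAT csv, HEADER)"
```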
Re: Performance problem postgresql 9.5
> On Jun 8, 2018, at 1:09 PM, Alvaro Herrera wrote:
>
> On 2018-Jun-08, Miguel Angel Sanchez Sandoval wrote:
>
>> Hi guys, migrated from 8.4 to 9.5, all OK except that 2-3 days pass and the database experiences slowness. I execute the linux top command and it shows me a postgres user process executing a strange command (2yhdgrfrt63788) that consumes a lot of CPU. I see the active queries and encounter select fun ('./ 2yhdgrfrt63788'). This did not happen in version 8.4, any help would be appreciated.
>
> Hmm, has your database been compromised? You may have an intruder there -- beware.

Definitely. The machine is compromised and doing Bad Things. Image it if possible; save the compromise payload you know about if not. Treat it as compromised and unsafe to attach to a network until you completely wipe and reinstall it.

It's probably a compromise via postgresql open to the network with insecure settings. I've seen several of those reported recently, and this one is saving its payload to the postgresql data directory - somewhere no other user or app will have access to, but which a compromised postgresql can easily write to. Check the pg_hba.conf and packet filter / firewall settings and see what the issue may be.

Do the same checks on all your other postgresql servers, test and production. If there's a configuration mistake that let one server be compromised it may well be there on others too.

Unless you are positive the server was not attacked, don't trust it unless you can be absolutely certain it is clean. Best bet is to backup any critical data (and check it for trustworthiness), wipe and rebuild.

Cheers, Steve
Re: Query hitting empty tables taking 48 minutes
> On Jun 7, 2018, at 12:11 PM, Rob Sargent wrote: > > What's the url doing in "blob_id = ds3.blob.id
Re: Code of Conduct plan
> On Jun 5, 2018, at 9:51 AM, James Keener wrote: > > > [T]he > main goal is to ensure that if someone is being harassed by a community > member, they have an appropriate avenue to safely report it and ensure > the CoC committee will review > > To be honest, this is a bigger problem. Why would someone not feel > comfortable contacting the core team? Why would they feel better contacting > the CoC board who is probably mostly core team or otherwise self-selected > community members who have a strong belief in the CoC (and I don't mean that > kindly)? The whole point of having a CoC is to advertise that we, as an organization, don't tolerate harassment and offensive behaviour. It also advertises that "we" will deal with it, if reported, and provides a clear, appropriate point of contact to do so. It also states roughly what process will be taken at that point. Also, an alternative perspective, what makes you think every member of the core team would be comfortable being contacted? Handling allegations of, for example, drunken tech bros sexually harassing people isn't comfortable, is time consuming and does require a particular set of soft skills - skills that do not correlate with software architecture chops. Cheers, Steve
Re: Code of Conduct plan
> On Jun 4, 2018, at 3:30 PM, Joshua D. Drake wrote: > > On 06/04/2018 01:46 PM, Tom Lane wrote: >> "Joshua D. Drake" writes: >>> On 06/03/2018 11:29 AM, Tom Lane wrote: We are now asking for a final round of community comments. >> Actually, it's intentional that we are not saying that. The idea is >> that any interaction between PG community members is subject to the CoC, >> whether it takes place in postgresql.org infrastructure or not, so long as >> there is not another CoC that takes precedence (such as a conference's >> CoC). The reason for this is an unfortunate situation that took place in >> the FreeBSD community awhile back [1], wherein one community member was >> abusing another via Twitter, and their existing CoC failed to cover that >> because it had been explicitly written to cover only community-run forums. >> So we're trying to learn from that mistake, and make sure that if such a >> situation ever came up here, the CoC committee would have authority to >> act. > > O.k. I can see that. The problem I am trying to prevent is contributor X > being disciplined for behavior that has nothing to do with PostgreSQL.Org. I > am not sure what the exact good solution is for that but it is none of our > business if contributor X gets into a fight (online or not) with anyone who > is not within the PostgreSQL.Org community. That can be a problem when people who are known by some to be toxic join a community, and those who have previous experience with them leave. That can leave them as a "missing stair" or, worse, if they continue to be horrible but within moderation guidelines they can provoke responses from other participants that can cause them to be moderated or be chastised and then leave. In some cases that has caused the entire culture to drift, and pretty much destroyed the community. (Community management is hard. The more you formalize some of it the more you have to formalize all of it and do so near-perfectly. 
Developers, who tend to prefer hard black/white, true/false rules rather than leaving some decisions to personal judgement, can be some of the worst people at doing community management, and some of the easiest to manipulate.) Cheers, Steve
Re: posgresql.log
> On May 21, 2018, at 3:21 PM, Steve Crawford wrote: > > > > If this is a test server and you can take it offline for forensics I would do > so, especially if it could provide a path to other internal or critical > resources. If you can image it for safekeeping and forensics, even better. +1 It's compromised. Image it if possible; save the compromise payload you know about if not. Treat it as compromised and unsafe to attach to a network until you completely wipe and reinstall it. > > That appears to be output from wget but the intrusion, if any, could be > through any number of vectors (web, ssh, local attack, etc.) not directly > related to PostgreSQL. Check in your other logs starting with a search for > anything related to that IP address. It's probably a compromise via postgresql open to the network with insecure settings. I've seen several of those reported recently, and this one is saving its payload to the postgresql data directory - somewhere no other user or app will have access to, but which a compromised postgresql can easily write to. Check the pg_hba.conf and packet filter / firewall settings and see what the issue may be. Do the same checks on all your other postgresql servers, test and production. If there's a configuration mistake that let one server be compromised it may well be there on others too. > > Verify the usual. Patches up to date, ports appropriately firewalled off, no > default passwords, etc. > > IP comes back to vultr.com which is a cloud company (i.e. could be anyone) > but if it is an attack perhaps contact their abuse department. The C&C server there is already down; it can't hurt to notify them, but I doubt Choopa would be particularly interested beyond that point unless a subpoena or search warrant were involved. > Unless you are positive the server was not attacked, don't trust it unless > you can be absolutely certain it is clean. Best bet is to backup any critical > data (and check it for trustworthiness), wipe and rebuild. 
+1. > > Only you (well, OK, maybe them, now) know what data was on this server but > depending on its importance, internal policies, legal requirements and > agreements with third-parties you may have notification requirements and > could need to engage forensics experts. Cheers, Steve
Re: how to securely delete the storage freed when a table is dropped?
> On Apr 13, 2018, at 10:48 AM, Jonathan Morgan wrote: > > For a system with information stored in a PostgreSQL 9.5 database, in which > data stored in a table that is deleted must be securely deleted (like shred > does to files), and where the system is persistent even though any particular > table likely won't be (so can't just shred the disks at "completion"), I'm > trying to figure out my options for securely deleting the underlying data > files when a table is dropped. > > As background, I'm not a DBA, but I am an experienced implementor in many > languages, contexts, and databases. I've looked online and haven't been able > to find a way to ask PostgreSQL to do the equivalent of shredding its > underlying files before releasing them to the OS when a table is DROPped. Is > there a built-in way to ask PostgreSQL to do this? (I might just not have > searched for the right thing - my apologies if I missed something) > > A partial answer we're looking at is shredding the underlying data files for > a given relation and its indexes manually before dropping the tables, but > this isn't so elegant, and I'm not sure it is getting all the information > from the tables that we need to delete. > > We also are looking at strategies for shredding free space on our data disk - > either running a utility to do that, or periodically replicating the data > volume, swapping in the results of the copy, then shredding the entire volume > that was the source so its "free" space is securely overwritten in the > process. > > Are we missing something? Are there other options we haven't found? If we > have to clean up manually, are there other places we need to go to shred data > than the relation files for a given table, and all its related indexes, in > the database's folder? Any help or advice will be greatly appreciated. Just "securely" deleting the files won't help much, as you'll leave data in spare space on the filesystem, in filesystem journals and so on. 
Maybe put the transient tables and indexes in their own tablespace on their own filesystem, periodically move them to another tablespace and wipe the first one's filesystem (either physically or forgetting the key for an encrypted FS)? That'd leave you with just the WAL data to deal with. Seems like a slightly odd requirement, though. What's your threat model? Cheers, Steve
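A sketch of that tablespace-rotation idea (names and paths here are invented for illustration; each LOCATION would be a small dedicated filesystem):

```sql
CREATE TABLESPACE scratch_a LOCATION '/mnt/scratch_a/pgdata';
CREATE TABLESPACE scratch_b LOCATION '/mnt/scratch_b/pgdata';

-- Keep the transient data confined to one tablespace.
CREATE TABLE transient_data (id bigint, payload bytea) TABLESPACE scratch_a;

-- Periodically migrate it, then wipe the vacated filesystem.
ALTER TABLE transient_data SET TABLESPACE scratch_b;
DROP TABLESPACE scratch_a;  -- must be empty before it can be dropped
-- ...now shred /mnt/scratch_a (or discard its encryption key) and recreate it.
```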
Re: best way to write large data-streams quickly?
> On Apr 9, 2018, at 8:49 AM, Mark Moellering wrote: > > Everyone, > > We are trying to architect a new system, which will have to take several > large datastreams (total of ~200,000 parsed files per second) and place them > in a database. I am trying to figure out the best way to import that sort of > data into Postgres. > > I keep thinking I can't be the first to have this problem and there are > common solutions but I can't find any. Does anyone know of some sort of method, > third party program, etc, that can accept data from a number of different > sources, and push it into Postgres as fast as possible? Take a look at http://ossc-db.github.io/pg_bulkload/index.html. Check the benchmarks for different situations compared to COPY. Depending on what you're doing, using custom code to parse your data and then doing multiple binary COPYs in parallel may be better. Cheers, Steve
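For scale, COPY is the baseline that everything else gets measured against; a minimal sketch (table and file names are made up):

```sql
-- Server-side COPY from a file readable by the postgres user;
-- clients normally stream through COPY ... FROM STDIN via their driver instead.
COPY staging_events (event_time, source, payload)
FROM '/var/tmp/events.csv'
WITH (FORMAT csv);
```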
Re: Prepared statements
> On Mar 21, 2018, at 2:09 PM, Tim Cross wrote: > > > a simple question I wasn't able to get a clear answer on > > It is general best practice to use prepared statements and parameters > rather than concatenated strings to build sql statements as mitigation > against SQL injection. However, in some databases I've used, there is > also a performance advantage. For example, the planner may be able to > more easily recognise a statement and reuse an existing plan rather than > re-planning the query. > > I wasn't sure what the situation is with postgres - is there a > performance benefit in using prepared statements over a query string > where the values are just concatenated into the string? There are two separate things. Parameterized queries are queries made by your code such that the values are passed in alongside SQL that has placeholders such as $1, $2, ... They're what help save you from SQL injection. A prepared statement is a reference to a query that has previously been passed to the database, and likely pre-interpreted and planned, that's ready to accept parameters and run. Using a prepared statement saves the planner from having to decide on a plan to run the query, which saves you planning time. But it does that by preparing a generic plan that'll work for any bound parameter. The planner might be able to come up with a specific plan based on the particular values passed in that is better than the generic plan, so a naive implementation of prepared statements might lead to the execution of the query being slower in some cases, as it uses a generic plan when a specific one might be better. Postgresql avoids the worst cases of that by only switching to a generic plan for a prepared statement after it's re-planned it a few times with specific values, and the specific plans have been costed more expensive than the generic one (or something like that). 
The generic plan is also frozen into the prepared statement, so if the data statistics vary significantly during the lifetime of the prepared statement the plan may no longer be a particularly good one. Prepared statements are certainly useful, but choosing whether to use them or not isn't quite as simple as "it'll avoid the planning overhead". Parameterized queries are almost always a good idea. Cheers, Steve
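To make the parameterized-query half concrete, a small sketch using Python's stdlib sqlite3 driver (sqlite uses `?` placeholders where PostgreSQL drivers use `$1` or `%s`, but the principle is identical: the values travel separately from the SQL text, so they can never be parsed as SQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

# Values are bound to placeholders, not concatenated into the SQL string.
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

# A classic injection payload is harmless as a bound parameter:
# it's stored as plain data, never interpreted as SQL.
evil = "x'; DROP TABLE users; --"
conn.execute("INSERT INTO users (name) VALUES (?)", (evil,))

rows = conn.execute("SELECT name FROM users ORDER BY name").fetchall()
print(rows)  # two rows, and the table is still intact
```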
Re: Best options for new PG instance
> On Mar 5, 2018, at 8:53 AM, David Gauthier wrote: > > Hi: > > I'm going to be requesting a PG DB instance (v9.6.7) from an IT dept in a > large corp setting. I was wondering if anyone could comment on the pros/cons > of getting this put on a virtual machine vs hard metal ? Locally mounted > disk vs nfs ? I've been running postgresql instances on ESXi VMs for years with no issues. I've not benchmarked them, but performance has been good enough despite their running on fairly wimpy hardware. Performance relative to bare metal is probably going to be dominated by disk IO, and depending on how you're hosting VMs that can be anywhere between pretty good and terrible - in a large corporation I'd expect it to be pretty good. Just don't skimp on RAM - having your hot data in the filesystem cache is always good and can make high latency storage tolerable. If performance isn't critical then a VM is great. If it is, you'll want to plan and maybe benchmark a bit to decide whether bare metal is going to be significantly better for what you're doing. I wouldn't let NFS anywhere near it. I'd ideally want something that looks to the VM like a locally mounted disk, whether that be really local or served from a SAN or iSCSI or ... https://www.slideshare.net/jkshah/best-practices-of-running-postgresql-in-virtual-environments has some hints on VM-specific things to consider. Cheers, Steve
Re: Is there a continuous backup for pg ?
> On Mar 2, 2018, at 11:05 AM, Gary M wrote: > > Hi, > > I have an unusual requirement for schema based, live backup of a 24/7 > database processing 100K inserts/updates per hour. The data store is around > 100TB. > > The requirement is supporting an incremental backup of 10 minute windows. > Replication is not considered backup from malicious action. > > Are there any best practices or solutions that can meet these requirements ? Sounds almost like you're looking for point-in-time recovery, which will let you restore an entire cluster to any time in the past (if you can afford the storage), using physical replication. https://www.postgresql.org/docs/current/static/continuous-archiving.html There are several sets of third-party tools that'll help with the setup, monitoring and other tooling. Look for omnipitr or pitrtools. (I'd guess you could build something similar that would let you do logical recovery by recording changesets from a logical replication connection, but I don't know if anyone has put that together.) Cheers, Steve
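The basic WAL archiving setup behind PITR is a few lines in postgresql.conf (the archive path is a placeholder; `%p` and `%f` are the substitutions the archive_command contract defines):

```
# postgresql.conf - continuous archiving for point-in-time recovery
wal_level = replica
archive_mode = on
archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f'
```

Restoring to a 10-minute-old state is then a matter of a base backup plus replaying archived WAL with a recovery target time.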
Re: Enforce primary key on every table during dev?
(Melvin's mail doesn't quote properly; I've tried to fix it but may have broken something) > On Mar 1, 2018, at 8:50 AM, Melvin Davidson wrote: > > > On Thu, Mar 1, 2018 at 11:43 AM, Alban Hertroys wrote: > > >> On 1 March 2018 at 17:22, Steven Lembark wrote: >>> If you have a design with un-identified data it means that you haven't >>> normalized it properly: something is missing from the table with >>> un-identifiable rows. >> >> While that holds true for a relational model, in reporting for >> example, it is common practice to denormalize data without a >> requirement to be able to identify a single record. The use case for >> such tables is providing quick aggregates on the data. Often this >> deals with derived data. It's not that uncommon to not have a primary >> or even a uniquely identifiable key on such tables. >> >> I do not disagree that having a primary key on a table is a bad thing, >> but I do disagree that a primary key is a requirement for all tables. >> >> More generally: For every rule there are exceptions. Even for this one. > > You may perceive that to be "common practice", but in reality it is not, and > in fact a bad one. As was previously stated, PostgreSQL is a _relational_ > database, > and breaking that premise will eventually land you in very big trouble. There > is no solid reason not to have a primary key for every table. Sure there is. It's an additional index and significant additional insert / update overhead. If you're never going to retrieve single rows, nor join in such a way that uniqueness on this side is required there's no need for a unique identifier. It's a rare case that you won't want a primary key, and I'll often add a surrogate one for convenience even when it's not actually needed, but there are cases where it's appropriate not to have one, even in OLTP work. Log tables, for example. 
"Every table should have a primary key, whether natural or surrogate" is a great guideline, and everyone should follow it until they understand when they shouldn't. More generally: For every rule there are exceptions. Even for this one. Cheers, Steve
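As a concrete (invented) example of the log-table exception:

```sql
-- Append-only log: no natural key, rows are only ever scanned in bulk,
-- so the per-insert cost of maintaining a PK index buys nothing here.
CREATE TABLE request_log (
    logged_at   timestamptz NOT NULL DEFAULT now(),
    client_addr inet,
    detail      text
);

-- If time-range scans are needed, a cheap BRIN index beats a PK btree:
CREATE INDEX request_log_logged_at ON request_log USING brin (logged_at);
```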
Re: persistent read cache
> On Feb 11, 2018, at 5:14 PM, Sand Stone wrote: > > > Hi. I wonder if there is such a thing or extension in the PG world. > > Here is my use case. I am using PG (PG10 to be more specific) in a > cloud VM environment. The tables are stored in RAID0 managed SSD > backed attached storage. Depending on the VM I am using, I usually > have 256GB local SSD unused. > > I wonder if PG could leverage this local SSD as a read (page/block) > cache, to complement/extend the DRAM used by shared_buffers today. It seems like something that PostgreSQL could take advantage of, but it's probably the wrong layer to implement it. If your VM infrastructure doesn't have any way to use it directly, maybe you could do it at the drive / filesystem level with something like bcache, lvmcache or enhanceio? Adding that sort of complexity to something that needs solid data integrity makes me nervous, but those solutions have been in the field for years. Cheers, Steve
Re: Notify client when a table was full
> On Jan 21, 2018, at 6:44 PM, Vincenzo Romano wrote: > > 2018-01-21 19:31 GMT+01:00 Francisco Olarte : >> On Sun, Jan 21, 2018 at 1:27 PM, Michael Paquier >> wrote: >>> On Fri, Jan 19, 2018 at 03:40:01PM +, Raymond O'Donnell wrote: >> ... How do you define "full"? > > The only possible and meaningful case, IMHO, as stated by David > earlier, is "file system full". If your filesystem is full you're pretty much off the air. It's something that should never happen on a production system. So ... any automation around "the filesystem is full" is going to be much the same as "the server is dead". You're unlikely to be able to do anything useful from the app at that point, let alone from a trigger function. If the answer involves handling the case where the file system is full we're not answering a useful question, and the original poster probably needs to clarify. > Which is communicated by Postgres with the "Class 53 — Insufficient > Resources" error codes. > Please refer to official documentation like: > > https://www.postgresql.org/docs/10/static/errcodes-appendix.html > > For specific programming languages more details need to be checked. Cheers, Steve
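Those Class 53 codes all share the SQLSTATE prefix "53", so client code can catch the whole class with a prefix check rather than enumerating individual codes. A minimal sketch in Python (how your driver exposes the SQLSTATE varies; psycopg2, for instance, puts it on the exception's pgcode attribute):

```python
# SQLSTATE Class 53 is "Insufficient Resources": 53000 insufficient_resources,
# 53100 disk_full, 53200 out_of_memory, 53300 too_many_connections.

def is_insufficient_resources(sqlstate):
    """True for any Class 53 SQLSTATE; the class is the first two characters."""
    return sqlstate is not None and sqlstate.startswith("53")

print(is_insufficient_resources("53100"))  # True  (disk_full)
print(is_insufficient_resources("42601"))  # False (syntax_error)
```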
Re: Connection type
> On Jan 18, 2018, at 4:06 PM, Enrico Pirozzi wrote: > > Hi, > > is there a way to know what kind of connection a client is doing? > (ODBC,JDBC..etc) > > I saw the pg_stat_activity view, > > > but in the application name field there is no information about > what kind of connection a client is doing. No. They're all exactly the same as far as postgresql is concerned, all speaking the same postgresql native protocol, or close enough. The ODBC / JDBC / libpq difference is purely on the client side. A client can voluntarily set the application_name, e.g. as part of its connection string, to identify itself to the server, if you want to be able to identify which sort of client is connected easily. Cheers, Steve
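For example, with libpq-style clients the name can go straight into the connection string (the name here is arbitrary):

```
postgresql://appuser@dbhost/appdb?application_name=billing-batch
```

It then shows up in the application_name column of pg_stat_activity for that session.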
Re: Mailing list archiver
> On Jan 3, 2018, at 6:00 AM, Jordan Deitch wrote: > > Thanks for the feedback! I will continue to address these issues, and I > intend on adding #postgresql IRC logs as well :-) Please don't make a public archive of #postgresql without discussion. I believe the general feeling of those who use the channel is that a public archive of it is not wanted. Cheers, Steve
Re: Does PostgreSQL check database integrity at startup?
> On Dec 29, 2017, at 6:14 PM, Melvin Davidson wrote: > > Edson's original request was for a query that shows the FILENAMEs for the > table. Which the query you provide does not do. > As for "good" query, that is entirely an opinion. But a well-informed one. Your query may work well enough for you, but when you're suggesting others rely on it you should expect more informed users to point out that it has some critical flaws - especially ones that might not be obvious to new users - lest others naively rely on it. > The query WILL show all files associated > with ALL tables. No, it won't. You're filtering out a bunch of things via the name of the table. That doesn't do the right thing. You're heading in the right direction, but the query you gave gets some things badly wrong. Listen to the criticism from others and you'll improve your knowledge and skills. There's absolutely nothing wrong with posting information that's not quite right, nor with getting feedback from others on what's wrong with it. Copping an attitude in response to that feedback is where things go downhill. Cheers, Steve
Re: postgresql-10 for ubuntu-17.10 (artful)?
> On Dec 26, 2017, at 6:21 PM, Stuart McGraw wrote: > > Is there a repository for Postgresql-10 available at > http://apt.postgresql.org/pub/repos/apt > for Ubuntu-17.10 (artful)? When I look at the dist/ > subdirectory there, there seem to be repos for all the > other Ubuntu releases including an upcoming one (bionic) > but not artful. > > Am I looking in the wrong place? (I am new to Ubuntu > and Debian packaging.) > > I think that LTS releases (14.04, 16.04, 18.04) are the main releases targeted, with intermediate releases only added if they're incompatible with the previous LTS release. See the FAQ at https://wiki.postgresql.org/wiki/Apt/FAQ - I think you should be able to use the 17.04 package on 17.10. Cheers, Steve
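If you do point 17.10 at an existing suite, the sources.list entry just names that suite's codename explicitly instead of artful (check the wiki FAQ for which suites actually exist; xenial-pgdg for 16.04 is shown here as a known-good example):

```
# /etc/apt/sources.list.d/pgdg.list
deb http://apt.postgresql.org/pub/repos/apt/ xenial-pgdg main
```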
Re: To all who wish to unsubscribe
> On Nov 21, 2017, at 10:39 AM, Andrew Sullivan <a...@crankycanuck.ca> wrote: > > On Mon, Nov 20, 2017 at 02:46:08PM -0800, Steve Atkins wrote: >> That's poor practice, for several reasons - replay attacks with added content >> and it being an extremely rare practice that's likely to trigger bugs in DKIM >> validation are two. The latter is the much bigger deal. >> >> It also doesn't help much for most MIME encoded mail (including base64 >> encoded plain text, like the mail I'm replying to). >> >> Pretending those paragraphs aren't there is the right thing to do. > > Yes. Also the DMARC and forthcoming ARC mechanisms -- super important > for people behind gmail and yahoo and so on -- make that feature not > really work, AFAICT. It doesn't break DMARC or ARC as they only care if the mail is authenticated via SPF or DKIM (with an aligned, blah, blah, blah) so if the DKIM passes, even with an l= flag, it's OK. But ... > I think that part of DKIM is busted, and the > authors of it I've talked to seem to agree. Pretty much, yes. Certainly for mail where you don't have tight control over how it's generated. Removing Subject tagging and footers of the messages isn't an accidental side effect of the migration, it's (I assume) a primary goal of it. If that weren't done, more and more people at large consumer mailbox providers would increasingly have problems sending mail successfully to the lists. People's mail filters will just have to adjust. Blame Yahoo, not PGDG. Cheers, Steve
Re: To all who wish to unsubscribe
> On Nov 20, 2017, at 2:18 PM, Piotr Stefaniak wrote: > > On 2017-11-20 21:03, Tom Lane wrote: >> "Joshua D. Drake" writes: >>> On 11/20/2017 11:40 AM, Magnus Hagander wrote: >>> One thing I would note is that there is no longer a footer that >>> tells people what to do if they want to unsubscribe. Perhaps one >>> thing that could be done is a header (for a temporary time period) >>> that says: >>> The mailing list software of Postgresql.org has changed. Please see >>> this page on instructions on how to manage your subscription and >>> filters. And then after the temporary time period that becomes a >>> footer? >> >> Unfortunately, the removal of the footer is a feature not a bug. In >> order to be DKIM-compatible and thus help avoid becoming classified >> as spammers, we can't mangle message content anymore, just like we >> can't mangle the Subject: line. > I don't miss the footers, but see RFC 6376, "5.3.1. Body Length Limits". > Two fragments quoted are copied below: That's poor practice, for several reasons - replay attacks with added content and it being an extremely rare practice that's likely to trigger bugs in DKIM validation are two. The latter is the much bigger deal. It also doesn't help much for most MIME encoded mail (including base64 encoded plain text, like the mail I'm replying to). Pretending those paragraphs aren't there is the right thing to do. Cheers, Steve > >> A body length count MAY be specified to limit the signature >> calculation to an initial prefix of the body text, measured in >> octets. If the body length count is not specified, the entire message >> body is signed. > >> INFORMATIVE RATIONALE: This capability is provided because it is very >> common for mailing lists to add trailers to messages (e.g., >> instructions on how to get off the list). 
Until those messages are >> also signed, the body length count is a useful tool for the Verifier >> since it can, as a matter of policy, accept messages having valid >> signatures with extraneous data.