Efficiently searching for the most recent rows where a column matches any result from a different query
Hello, I have the following schema: CREATE TABLE users ( id BIGSERIAL PRIMARY KEY, name TEXT NOT NULL UNIQUE ); CREATE TABLE friends ( user_id BIGINT NOT NULL REFERENCES users, friend_user_id BIGINT NOT NULL REFERENCES users, UNIQUE (user_id, friend_user_id) ); CREATE TABLE posts ( id BIGSERIAL PRIMARY KEY, user_id BIGINT NOT NULL REFERENCES users, content TEXT NOT NULL ); CREATE INDEX posts_user_id_id_index ON posts(user_id, id); Each user can unilaterally follow any number of friends. The posts table has a large number of rows and is rapidly growing. My goal is to retrieve the 10 most recent posts of a user's friends. This query gives the correct result, but is inefficient: SELECT posts.id, users.name, posts.content FROM posts JOIN users ON posts.user_id = users.id WHERE posts.user_id IN (SELECT friend_user_id FROM friends WHERE user_id = 1) ORDER BY posts.id DESC LIMIT 10; If the user's friends have recently posted, the query is still reasonably fast (https://explain.depesz.com/s/6ykR). But if the user's friends haven't recently posted or the user has no friends, it quickly deteriorates (https://explain.depesz.com/s/OnoG). If I match only a single post author (e.g. WHERE posts.user_id = 5), Postgres uses the index posts_user_id_id_index. But if I use IN, the index doesn't appear to be used at all. How can I get these results more efficiently? I've uploaded the schema and the queries I've tried to dbfiddle at http://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=cf1489b7f6d53c3fe0b55ed7ccbad1f0. The output of "SELECT version()" is "PostgreSQL 9.6.5 on x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit" for me. Thank you in advance for any insights, pointers or suggestions you are able to give me. Regards,Milo
Re: Efficiently searching for the most recent rows where a column matches any result from a different query
Hello: EXPLAIN (ANALYZE, BUFFERS) select * from ( SELECT posts.id, users.name, posts.content FROM posts JOIN users ON posts.user_id = users.id WHERE posts.user_id IN (SELECT friend_user_id FROM friends WHERE user_id = 1) ORDER BY posts.id DESC ) as a ORDER BY a.id DESC LIMIT 10; -- EXPLAIN (ANALYZE, BUFFERS) select * from ( SELECT posts.id, users.name, posts.content FROM posts JOIN users ON posts.user_id = users.id WHERE posts.user_id IN (SELECT friend_user_id FROM friends WHERE user_id = 2) ORDER BY posts.id DESC ) as a ORDER BY a.id DESC LIMIT 10; 2018-02-13 8:28 GMT-05:00 : > Hello, > > I have the following schema: > > CREATE TABLE users ( > id BIGSERIAL PRIMARY KEY, > name TEXT NOT NULL UNIQUE > ); > > CREATE TABLE friends ( > user_idBIGINT NOT NULL REFERENCES users, > friend_user_id BIGINT NOT NULL REFERENCES users, > UNIQUE (user_id, friend_user_id) > ); > > CREATE TABLE posts ( > id BIGSERIAL PRIMARY KEY, > user_id BIGINTNOT NULL REFERENCES users, > content TEXT NOT NULL > ); > CREATE INDEX posts_user_id_id_index ON posts(user_id, id); > > Each user can unilaterally follow any number of friends. The posts table > has a large number of rows and is rapidly growing. > > My goal is to retrieve the 10 most recent posts of a user's friends. This > query gives the correct result, but is inefficient: > > SELECT posts.id, users.name, posts.content > FROM posts JOIN users ON posts.user_id = users.id > WHERE posts.user_id IN (SELECT friend_user_id FROM friends WHERE > user_id = 1) > ORDER BY posts.id DESC LIMIT 10; > > If the user's friends have recently posted, the query is still reasonably > fast (https://explain.depesz.com/s/6ykR). But if the user's friends > haven't recently posted or the user has no friends, it quickly deteriorates > (https://explain.depesz.com/s/OnoG). > > If I match only a single post author (e.g. WHERE posts.user_id = 5), > Postgres uses the index posts_user_id_id_index. But if I use IN, the index > doesn't appear to be used at all. > > How can I get these results more efficiently? > > I've uploaded the schema and the queries I've tried to dbfiddle at > http://dbfiddle.uk/?rdbms=postgres_9.6&fiddle= > cf1489b7f6d53c3fe0b55ed7ccbad1f0. The output of "SELECT version()" is > "PostgreSQL 9.6.5 on x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10) > 4.9.2, 64-bit" for me. > > Thank you in advance for any insights, pointers or suggestions you are > able to give me. > > Regards, > Milo > -- Cordialmente, Ing. Hellmuth I. Vargas S. Esp. Telemática y Negocios por Internet Oracle Database 10g Administrator Certified Associate EnterpriseDB Certified PostgreSQL 9.3 Associate
Re: Details after Load Peak was: OT: Performance of VM
Am 06.02.2018 um 15:31 schrieb Thomas Güttler: > > > Am 05.02.2018 um 14:26 schrieb Andreas Kretschmer: >> >> >> Am 05.02.2018 um 14:14 schrieb Thomas Güttler: >>> What do you suggest to get some reliable figures? >> >> sar is often recommended, see >> https://blog.2ndquadrant.com/in-the-defense-of-sar/. >> >> Can you exclude other reasons like vacuum / vacuum freeze? > > In the current case it was a problem in the hypervisor. > > But I want to be prepared for the next time. > > The tool sar looks good. This way I can generate a chart where I can see > peaks. Nice. > > But one thing is still unclear. Imagine I see a peak in the chart. > The peak > was some hours ago. AFAIK sar has only the aggregated numbers. > > But I need to know details if I want to answer the question "Why?". The > peak > has gone and ps/top/iotop don't help me anymore. > > Any idea? I love atop (atoptool.nl) for exactly that kind of situation. It will save a snapshot every 10 minutes by default, which you can then simply "scroll" back to. Helped me pinpointing nightly issues countless times. Only really available for Linux though (in case you're on *BSD). Best regards, -- Gunnar "Nick" Bluth RHCE/SCLA Mobil +49 172 8853339 Email: gunnar.bl...@pro-open.de _ In 1984 mainstream users were choosing VMS over UNIX. Ten years later they are choosing Windows over UNIX. What part of that message aren't you getting? - Tom Payne signature.asc Description: OpenPGP digital signature
Re: Details after Load Peak was: OT: Performance of VM
+1 for atop. Be sure to adjust the sampling interval so it suits your needs. It'll tell you what caused the spike. Alternatively you could probably use sysdig, but I expect that'd result in a fair performance hit if your system is already struggling. Micky On 14 February 2018 at 08:15, Gunnar "Nick" Bluth wrote: > Am 06.02.2018 um 15:31 schrieb Thomas Güttler: > > > > > > Am 05.02.2018 um 14:26 schrieb Andreas Kretschmer: > >> > >> > >> Am 05.02.2018 um 14:14 schrieb Thomas Güttler: > >>> What do you suggest to get some reliable figures? > >> > >> sar is often recommended, see > >> https://blog.2ndquadrant.com/in-the-defense-of-sar/. > >> > >> Can you exclude other reasons like vacuum / vacuum freeze? > > > > In the current case it was a problem in the hypervisor. > > > > But I want to be prepared for the next time. > > > > The tool sar looks good. This way I can generate a chart where I can see > > peaks. Nice. > > > > But one thing is still unclear. Imagine I see a peak in the chart. > > The peak > > was some hours ago. AFAIK sar has only the aggregated numbers. > > > > But I need to know details if I want to answer the question "Why?". The > > peak > > has gone and ps/top/iotop don't help me anymore. > > > > Any idea? > > I love atop (atoptool.nl) for exactly that kind of situation. It will > save a snapshot every 10 minutes by default, which you can then simply > "scroll" back to. Helped me pinpointing nightly issues countless times. > > Only really available for Linux though (in case you're on *BSD). > > Best regards, > -- > Gunnar "Nick" Bluth > RHCE/SCLA > > Mobil +49 172 8853339 > Email: gunnar.bl...@pro-open.de > _ > In 1984 mainstream users were choosing VMS over UNIX. > Ten years later they are choosing Windows over UNIX. > What part of that message aren't you getting? - Tom Payne > > >
Re: OT: Performance of VM
On 11/02/18 00:20, Robert Klemme wrote: On Mon, Feb 5, 2018 at 5:22 PM, Andrew Kerber wrote: Have them check the memory and CPU allocation of the hypervisor, make sure its not overallocated. Make sure the partitions for stroage are aligned (see here: https://blogs.vmware.com/vsphere/2011/08/guest-os-partition-alignment.html) . Install tuned, and enable the throughput performance profile. Oracle has a problem with transparent hugepages, postgres may well have the same problem, so consider disabling transparent hugepages. There is no reason why performance on a VM would be worse than performance on a physical server. Not theoretically. But in practice if you have anything run in a VM like in this case you do not know what else is working on that box. Analyzing these issues can be really cumbersome and tricky. This is why I am generally skeptical of running a resource intensive application like a RDBMS in a VM. To get halfway predictable results you want at least a minimum of resources (CPU, memory, IO bandwidth) reserved for that VM. Anecdote: we once had a customer run our application in a VM (which is supported) and complain about slowness. Eventually we found out that they over committed memory - not in sum for all VMs which is common, but this single VM had been configured to have more memory than was physically available in the machine. Agreed. If you can get the IO layer to have some type of guaranteed performance (e.g AWS Provisioned IOPS), then that is a big help. However (as you say above) debugging memory and cpu contention (from within the guest) is tricky indeed. Anecdote: concluded VM needed more cpu, so went to 8 to 16 - performance got significantly *worse*. We prevailed on the devops guys (this was *not* AWS) to migrate the VM is a less busy host. Everything was fine thereafter. regards Mark