date:20180213

Efficiently searching for the most recent rows where a column matches any result from a different query

2018-02-13 Thread mkslaf

Hello,
I have the following schema:
    CREATE TABLE users (        id   BIGSERIAL PRIMARY KEY,        name TEXT    
  NOT NULL UNIQUE    );        CREATE TABLE friends (        user_id        
BIGINT NOT NULL REFERENCES users,        friend_user_id BIGINT NOT NULL 
REFERENCES users,        UNIQUE (user_id, friend_user_id)    );        CREATE 
TABLE posts (        id      BIGSERIAL PRIMARY KEY,        user_id BIGINT    
NOT NULL REFERENCES users,        content TEXT      NOT NULL    );    CREATE 
INDEX posts_user_id_id_index ON posts(user_id, id);
Each user can unilaterally follow any number of friends. The posts table has a 
large number of rows and is rapidly growing.
My goal is to retrieve the 10 most recent posts of a user's friends. This query 
gives the correct result, but is inefficient:
    SELECT posts.id, users.name, posts.content    FROM posts JOIN users ON 
posts.user_id = users.id    WHERE posts.user_id IN (SELECT friend_user_id FROM 
friends WHERE user_id = 1)    ORDER BY posts.id DESC LIMIT 10;
If the user's friends have recently posted, the query is still reasonably fast 
(https://explain.depesz.com/s/6ykR). But if the user's friends haven't recently 
posted or the user has no friends, it quickly deteriorates 
(https://explain.depesz.com/s/OnoG).
If I match only a single post author (e.g. WHERE posts.user_id = 5), Postgres 
uses the index posts_user_id_id_index. But if I use IN, the index doesn't 
appear to be used at all.
How can I get these results more efficiently?
I've uploaded the schema and the queries I've tried to dbfiddle at 
http://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=cf1489b7f6d53c3fe0b55ed7ccbad1f0. 
The output of "SELECT version()" is "PostgreSQL 9.6.5 on x86_64-pc-linux-gnu, 
compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit" for me.
Thank you in advance for any insights, pointers or suggestions you are able to 
give me.
Regards,Milo

Re: Efficiently searching for the most recent rows where a column matches any result from a different query

2018-02-13 Thread Hellmuth Vargas

Hello:


EXPLAIN (ANALYZE, BUFFERS)
select * from (
SELECT posts.id, users.name, posts.content
FROM posts JOIN users ON posts.user_id = users.id
WHERE posts.user_id IN (SELECT friend_user_id FROM friends WHERE user_id =
1)

ORDER BY posts.id DESC
) as a
ORDER BY a.id DESC
LIMIT 10;

--


EXPLAIN (ANALYZE, BUFFERS)
select * from (
SELECT posts.id, users.name, posts.content
FROM posts JOIN users ON posts.user_id = users.id
WHERE posts.user_id IN (SELECT friend_user_id FROM friends WHERE user_id =
2)

ORDER BY posts.id DESC
) as a
ORDER BY a.id DESC
LIMIT 10;

2018-02-13 8:28 GMT-05:00 :

> Hello,
>
> I have the following schema:
>
> CREATE TABLE users (
> id   BIGSERIAL PRIMARY KEY,
> name TEXT  NOT NULL UNIQUE
> );
>
> CREATE TABLE friends (
> user_idBIGINT NOT NULL REFERENCES users,
> friend_user_id BIGINT NOT NULL REFERENCES users,
> UNIQUE (user_id, friend_user_id)
> );
>
> CREATE TABLE posts (
> id  BIGSERIAL PRIMARY KEY,
> user_id BIGINTNOT NULL REFERENCES users,
> content TEXT  NOT NULL
> );
> CREATE INDEX posts_user_id_id_index ON posts(user_id, id);
>
> Each user can unilaterally follow any number of friends. The posts table
> has a large number of rows and is rapidly growing.
>
> My goal is to retrieve the 10 most recent posts of a user's friends. This
> query gives the correct result, but is inefficient:
>
> SELECT posts.id, users.name, posts.content
> FROM posts JOIN users ON posts.user_id = users.id
> WHERE posts.user_id IN (SELECT friend_user_id FROM friends WHERE
> user_id = 1)
> ORDER BY posts.id DESC LIMIT 10;
>
> If the user's friends have recently posted, the query is still reasonably
> fast (https://explain.depesz.com/s/6ykR). But if the user's friends
> haven't recently posted or the user has no friends, it quickly deteriorates
> (https://explain.depesz.com/s/OnoG).
>
> If I match only a single post author (e.g. WHERE posts.user_id = 5),
> Postgres uses the index posts_user_id_id_index. But if I use IN, the index
> doesn't appear to be used at all.
>
> How can I get these results more efficiently?
>
> I've uploaded the schema and the queries I've tried to dbfiddle at
> http://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=
> cf1489b7f6d53c3fe0b55ed7ccbad1f0. The output of "SELECT version()" is
> "PostgreSQL 9.6.5 on x86_64-pc-linux-gnu, compiled by gcc (Debian 4.9.2-10)
> 4.9.2, 64-bit" for me.
>
> Thank you in advance for any insights, pointers or suggestions you are
> able to give me.
>
> Regards,
> Milo
>



-- 
Cordialmente,

Ing. Hellmuth I. Vargas S.
Esp. Telemática y Negocios por Internet
Oracle Database 10g Administrator Certified Associate
EnterpriseDB Certified PostgreSQL 9.3 Associate

Re: Details after Load Peak was: OT: Performance of VM

2018-02-13 Thread Gunnar "Nick" Bluth

Am 06.02.2018 um 15:31 schrieb Thomas Güttler:
> 
> 
> Am 05.02.2018 um 14:26 schrieb Andreas Kretschmer:
>>
>>
>> Am 05.02.2018 um 14:14 schrieb Thomas Güttler:
>>> What do you suggest to get some reliable figures? 
>>
>> sar is often recommended, see
>> https://blog.2ndquadrant.com/in-the-defense-of-sar/.
>>
>> Can you exclude other reasons like vacuum / vacuum freeze?
> 
> In the current case it was a problem in the hypervisor.
> 
> But I want to be prepared for the next time.
> 
> The tool sar looks good. This way I can generate a chart where I can see
> peaks. Nice.
> 
>  But one thing is still unclear. Imagine I see a peak in the chart.
> The peak
> was some hours ago. AFAIK sar has only the aggregated numbers.
> 
> But I need to know details if I want to answer the question "Why?". The
> peak
> has gone and ps/top/iotop don't help me anymore.
> 
> Any idea?

I love atop (atoptool.nl) for exactly that kind of situation. It will
save a snapshot every 10 minutes by default, which you can then simply
"scroll" back to. Helped me pinpointing nightly issues countless times.

Only really available for Linux though (in case you're on *BSD).

Best regards,
-- 
Gunnar "Nick" Bluth
RHCE/SCLA

Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
_
In 1984 mainstream users were choosing VMS over UNIX.
Ten years later they are choosing Windows over UNIX.
What part of that message aren't you getting? - Tom Payne




signature.asc
Description: OpenPGP digital signature

Re: Details after Load Peak was: OT: Performance of VM

2018-02-13 Thread Micky Gough

+1 for atop. Be sure to adjust the sampling interval so it suits your
needs. It'll tell you what caused the spike.

Alternatively you could probably use sysdig, but I expect that'd result in
a fair performance hit if your system is already struggling.

Micky

On 14 February 2018 at 08:15, Gunnar "Nick" Bluth 
wrote:

> Am 06.02.2018 um 15:31 schrieb Thomas Güttler:
> >
> >
> > Am 05.02.2018 um 14:26 schrieb Andreas Kretschmer:
> >>
> >>
> >> Am 05.02.2018 um 14:14 schrieb Thomas Güttler:
> >>> What do you suggest to get some reliable figures?
> >>
> >> sar is often recommended, see
> >> https://blog.2ndquadrant.com/in-the-defense-of-sar/.
> >>
> >> Can you exclude other reasons like vacuum / vacuum freeze?
> >
> > In the current case it was a problem in the hypervisor.
> >
> > But I want to be prepared for the next time.
> >
> > The tool sar looks good. This way I can generate a chart where I can see
> > peaks. Nice.
> >
> >  But one thing is still unclear. Imagine I see a peak in the chart.
> > The peak
> > was some hours ago. AFAIK sar has only the aggregated numbers.
> >
> > But I need to know details if I want to answer the question "Why?". The
> > peak
> > has gone and ps/top/iotop don't help me anymore.
> >
> > Any idea?
>
> I love atop (atoptool.nl) for exactly that kind of situation. It will
> save a snapshot every 10 minutes by default, which you can then simply
> "scroll" back to. Helped me pinpointing nightly issues countless times.
>
> Only really available for Linux though (in case you're on *BSD).
>
> Best regards,
> --
> Gunnar "Nick" Bluth
> RHCE/SCLA
>
> Mobil +49 172 8853339
> Email: gunnar.bl...@pro-open.de
> _
> In 1984 mainstream users were choosing VMS over UNIX.
> Ten years later they are choosing Windows over UNIX.
> What part of that message aren't you getting? - Tom Payne
>
>
>

Re: OT: Performance of VM

2018-02-13 Thread Mark Kirkwood




On 11/02/18 00:20, Robert Klemme wrote:

On Mon, Feb 5, 2018 at 5:22 PM, Andrew Kerber  wrote:

Have them check the memory and CPU allocation of the hypervisor, make sure
its not overallocated. Make sure the partitions for stroage are aligned (see
here:
https://blogs.vmware.com/vsphere/2011/08/guest-os-partition-alignment.html)
. Install tuned, and enable the throughput performance profile. Oracle has a
problem with transparent hugepages, postgres may well have the same problem,
so consider disabling transparent hugepages.  There is no reason why
performance on a VM would be worse than performance on a physical server.

Not theoretically. But in practice if you have anything run in a VM
like in this case you do not know what else is working on that box.
Analyzing these issues can be really cumbersome and tricky. This is
why I am generally skeptical of running a resource intensive
application like a RDBMS in a VM. To get halfway predictable results
you want at least a minimum of resources (CPU, memory, IO bandwidth)
reserved for that VM.

Anecdote: we once had a customer run our application in a VM (which is
supported) and complain about slowness. Eventually we found out that
they over committed memory - not in sum for all VMs which is common,
but this single VM had been configured to have more memory than was
physically available in the machine.



Agreed. If you can get the IO layer to have some type of guaranteed 
performance (e.g AWS Provisioned IOPS), then that is a big help. However 
(as you say above) debugging memory and cpu contention (from within the 
guest) is tricky indeed.


Anecdote: concluded VM needed more cpu, so went to 8 to 16 - performance 
got significantly *worse*. We prevailed on the devops guys (this was 
*not* AWS) to migrate the VM is a less busy host. Everything was fine 
thereafter.


regards
Mark

Efficiently searching for the most recent rows where a column matches any result from a different query

Re: Efficiently searching for the most recent rows where a column matches any result from a different query

Re: Details after Load Peak was: OT: Performance of VM

Re: Details after Load Peak was: OT: Performance of VM

Re: OT: Performance of VM

5 matches

Site Navigation

Mail list logo

Footer information