Re: [GENERAL] Understanding "seq scans"

2015-10-13 Thread Lele Gaifax
Merlin Moncure  writes:

> Your data and indexes are organized such that an index is only marginally
> helpful, or so the planner thinks.

Yes, I just wanted to get rid of that suspicious "seq scan", which was
effectively caused by a bad index, as Alvaro pointed out.

Thank you,
ciao, lele.
-- 
nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
l...@metapensiero.it  | -- Fortunato Depero, 1929.





Re: [GENERAL] Understanding "seq scans"

2015-10-13 Thread Lele Gaifax
Alvaro Herrera  writes:

> So 10% of your rows in the master_l10n table start with "quattro"?
> That's pretty odd, isn't it?  How did you manufacture these data?

Well, not a real scenario for sure, but definitely not odd: I just needed an
"extremely" big dataset to test out several different strategies, both on
table layout and indexes. The tables are populated by mechanically translating
the integer primary key into the corresponding "in words" string (1 -> "one")...

See https://gitlab.com/lelix/hstore4l10n for details.

> I wonder if the l10n_text_index index should be on (lang, lower(text)).

Oh wow, thank you for this enlightenment! Sure, that's the problem indeed!
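
A minimal sketch of the index Alvaro suggests, assuming the master_l10n layout discussed in the thread (the index name below is made up):

CREATE INDEX l10n_lang_lower_text_idx ON master_l10n (lang, lower(text));

-- A query of this shape can then use both index columns.  In a non-C locale,
-- LIKE prefix matching against the second column may additionally need the
-- text_pattern_ops operator class.
SELECT * FROM master_l10n
 WHERE lang = 'it'
   AND lower(text) LIKE 'quattro%';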

> How often are you going to look for translated text without specifying a
> language?

Never. The most frequently used criterion is «LIKE '%word%'» in the context of
a user session, and thus with a "preferred language".

Thank you again,
ciao, lele.
-- 
nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
l...@metapensiero.it  | -- Fortunato Depero, 1929.





[GENERAL] hanged session on index creation

2015-10-13 Thread Mikhail
 Hi,
I'm running a supposedly easy operation - renaming an index - and my session hangs
on it. pg_stat_activity shows that the process is active and is waiting for a lock:

=# select application_name, waiting, state, query from pg_stat_activity where pid = 15179;
 application_name │ waiting │ state  │ query
──────────────────┼─────────┼────────┼────────────────────────────────────────────────────────────────────────────────────────
 psql             │ t       │ active │ ALTER INDEX ebsm.fki_eb_order_details_acquiring_order_details_guid RENAME TO idx_tst;

Locks for that database:

=# select locktype, relation, c.relname, virtualtransaction, pid, mode, granted, fastpath,
       (SELECT MIN(l1.pid)
          FROM pg_locks l1
         WHERE GRANTED
           AND (relation IN (SELECT relation FROM pg_locks l2
                             WHERE l2.pid = l.pid AND NOT granted)
                OR transactionid IN (SELECT transactionid FROM pg_locks l3
                                     WHERE l3.pid = l.pid AND NOT granted))) AS blockedby
   from pg_locks l
   JOIN pg_class c ON c.oid = l.relation
   where l.database in (select oid from pg_database where datname = 'mytstdb');
 locktype ¦ relation ¦ relname                                            ¦ virtualtransaction ¦ pid   ¦ mode                ¦ granted ¦ fastpath ¦ blockedby
----------+----------+----------------------------------------------------+--------------------+-------+---------------------+---------+----------+-----------
 relation ¦ 3455     ¦ pg_class_tblspc_relfilenode_index                  ¦ 3/482              ¦ 24388 ¦ AccessShareLock     ¦ t       ¦ t        ¦ NULL
 relation ¦ 2663     ¦ pg_class_relname_nsp_index                         ¦ 3/482              ¦ 24388 ¦ AccessShareLock     ¦ t       ¦ t        ¦ NULL
 relation ¦ 2662     ¦ pg_class_oid_index                                 ¦ 3/482              ¦ 24388 ¦ AccessShareLock     ¦ t       ¦ t        ¦ NULL
 relation ¦ 1259     ¦ pg_class                                           ¦ 3/482              ¦ 24388 ¦ AccessShareLock     ¦ t       ¦ t        ¦ NULL
 relation ¦ 11187    ¦ pg_locks                                           ¦ 3/482              ¦ 24388 ¦ AccessShareLock     ¦ t       ¦ t        ¦ NULL
 relation ¦ 28686    ¦ eb_order_details                                   ¦ -1/36974           ¦ NULL  ¦ AccessShareLock     ¦ t       ¦ f        ¦ NULL
 relation ¦ 28756    ¦ fki_eb_order_details_acquiring_order_details_guid  ¦ -1/36974           ¦ NULL  ¦ AccessShareLock     ¦ t       ¦ f        ¦ NULL
 relation ¦ 28689    ¦ pk_eb_order_details                                ¦ -1/36974           ¦ NULL  ¦ AccessShareLock     ¦ t       ¦ f        ¦ NULL
 relation ¦ 28756    ¦ fki_eb_order_details_acquiring_order_details_guid  ¦ 11/2284            ¦ 15179 ¦ AccessExclusiveLock ¦ f       ¦ f        ¦ NULL
(9 rows)

So it seems that no other session blocks mine. (In case it's relevant: the fields page, tuple,
virtualxid, transactionid, classid, objid and objsubid all have NULL values.)
There is no activity on CPU or disk, and no activity inside the process either:

# strace -p 15179
Process 15179 attached - interrupt to quit
recvfrom(9, "Q\0\0\0\200ALTER INDEX ebsm.fki_eb_ord"..., 8192, 0, NULL, NULL) = 129
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={1, 0}}, NULL) = 0
semop(16973888, {{2, -1, 0}}, 1) = -1 EINTR (Interrupted system call)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0x1c4f7260806a7) = -1 EINTR (Interrupted system call)
semop(16973888, {{2, -1, 0}}, 1

and no further action within hours. perf and top show the same: no action is being performed by the process.

The situation is reproducible: I can interrupt the process, run the index renaming
again, and I get the same situation described above.

Are there any ideas what is happening inside the database?

Regards, Mikhail



[GENERAL] hanged session on index creation

2015-10-13 Thread Mikhail
Sorry for my previous message; it went to the wrong thread. Creating this as a new one.

I'm running a supposedly easy operation - renaming an index - and my session hangs
on it. pg_stat_activity shows that the process is active and is waiting for a lock:

=# select application_name, waiting, state, query from pg_stat_activity where pid = 15179;
 application_name │ waiting │ state  │ query
──────────────────┼─────────┼────────┼────────────────────────────────────────────────────────────────────────────────────────
 psql             │ t       │ active │ ALTER INDEX ebsm.fki_eb_order_details_acquiring_order_details_guid RENAME TO idx_tst;

Locks for that database:

=# select locktype, relation, c.relname, virtualtransaction, pid, mode, granted, fastpath,
       (SELECT MIN(l1.pid)
          FROM pg_locks l1
         WHERE GRANTED
           AND (relation IN (SELECT relation FROM pg_locks l2
                             WHERE l2.pid = l.pid AND NOT granted)
                OR transactionid IN (SELECT transactionid FROM pg_locks l3
                                     WHERE l3.pid = l.pid AND NOT granted))) AS blockedby
   from pg_locks l
   JOIN pg_class c ON c.oid = l.relation
   where l.database in (select oid from pg_database where datname = 'mytstdb');
 locktype ¦ relation ¦ relname                                            ¦ virtualtransaction ¦ pid   ¦ mode                ¦ granted ¦ fastpath ¦ blockedby
----------+----------+----------------------------------------------------+--------------------+-------+---------------------+---------+----------+-----------
 relation ¦ 3455     ¦ pg_class_tblspc_relfilenode_index                  ¦ 3/482              ¦ 24388 ¦ AccessShareLock     ¦ t       ¦ t        ¦ NULL
 relation ¦ 2663     ¦ pg_class_relname_nsp_index                         ¦ 3/482              ¦ 24388 ¦ AccessShareLock     ¦ t       ¦ t        ¦ NULL
 relation ¦ 2662     ¦ pg_class_oid_index                                 ¦ 3/482              ¦ 24388 ¦ AccessShareLock     ¦ t       ¦ t        ¦ NULL
 relation ¦ 1259     ¦ pg_class                                           ¦ 3/482              ¦ 24388 ¦ AccessShareLock     ¦ t       ¦ t        ¦ NULL
 relation ¦ 11187    ¦ pg_locks                                           ¦ 3/482              ¦ 24388 ¦ AccessShareLock     ¦ t       ¦ t        ¦ NULL
 relation ¦ 28686    ¦ eb_order_details                                   ¦ -1/36974           ¦ NULL  ¦ AccessShareLock     ¦ t       ¦ f        ¦ NULL
 relation ¦ 28756    ¦ fki_eb_order_details_acquiring_order_details_guid  ¦ -1/36974           ¦ NULL  ¦ AccessShareLock     ¦ t       ¦ f        ¦ NULL
 relation ¦ 28689    ¦ pk_eb_order_details                                ¦ -1/36974           ¦ NULL  ¦ AccessShareLock     ¦ t       ¦ f        ¦ NULL
 relation ¦ 28756    ¦ fki_eb_order_details_acquiring_order_details_guid  ¦ 11/2284            ¦ 15179 ¦ AccessExclusiveLock ¦ f       ¦ f        ¦ NULL
(9 rows)

So it seems that no other session blocks mine. (In case it's relevant: the fields page, tuple,
virtualxid, transactionid, classid, objid and objsubid all have NULL values.)
There is no activity on CPU or disk, and no activity inside the process either:

# strace -p 15179
Process 15179 attached - interrupt to quit
recvfrom(9, "Q\0\0\0\200ALTER INDEX ebsm.fki_eb_ord"..., 8192, 0, NULL, NULL) = 129
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={1, 0}}, NULL) = 0
semop(16973888, {{2, -1, 0}}, 1) = -1 EINTR (Interrupted system call)
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigreturn(0x1c4f7260806a7) = -1 EINTR (Interrupted system call)
semop(16973888, {{2, -1, 0}}, 1

and no further action within hours. perf and top show the same: no action is being performed by the process.

The situation is reproducible: I can interrupt the process, run the index renaming
again, and I get the same situation described above.

Renaming and even creating other indexes in mytstdb take milliseconds.

Are there any ideas what is happening inside the database?


Re: [GENERAL] hanged session on index creation

2015-10-13 Thread Tom Lane
Mikhail writes:
> I'm running supposedly an easy operation - renaming index and my session 
> hangs on it. pg_stat_activity shows that process is active and is waiting for 
> lock:

> locktype ¦ relation ¦ relname                                            ¦ virtualtransaction ¦ pid   ¦ mode                ¦ granted ¦ fastpath ¦ blockedby
> ----------+----------+----------------------------------------------------+--------------------+-------+---------------------+---------+----------+-----------
> relation ¦ 28756    ¦ fki_eb_order_details_acquiring_order_details_guid  ¦ -1/36974           ¦ NULL  ¦ AccessShareLock     ¦ t       ¦ f        ¦ NULL
> relation ¦ 28756    ¦ fki_eb_order_details_acquiring_order_details_guid  ¦ 11/2284            ¦ 15179 ¦ AccessExclusiveLock ¦ f       ¦ f        ¦ NULL

> So it seems that no other session blocks mine.

Huh?  Whatever's holding that AccessShareLock on
fki_eb_order_details_acquiring_order_details_guid is what's blocking your
RENAME.  The pid = NULL bit is interesting; I think that must mean it's
a prepared transaction.  So check for uncommitted prepared transactions...

So one thing wrong with that "blockedby" subselect is that you can't
tell "not blocked" from "blocked by prepared transaction".  Also worth
noting is that that test for matching lock targets is seriously sloppy.
Better versions can be found in our mail list archives --- I seem to
remember concluding that a ROW(...) IS NOT DISTINCT FROM ROW(...) test
that lists locktype and all the other lock-target-defining fields is the
best way to write it.

regards, tom lane
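
For reference, a sketch of the shape Tom describes, using only the standard pg_locks columns (the polished versions live in the list archives); it also keeps prepared-transaction holders visible, since their pid is NULL:

-- Match waiting lock requests to granted locks on the same target; IS NOT
-- DISTINCT FROM makes NULL target fields compare as equal.  Not every holder
-- listed necessarily conflicts with the waiter's requested mode.
SELECT w.pid                AS waiting_pid,
       h.pid                AS holding_pid,        -- NULL for a prepared transaction
       h.virtualtransaction AS holding_vxid,
       w.mode               AS wanted_mode,
       h.mode               AS held_mode
  FROM pg_locks w
  JOIN pg_locks h
    ON ROW(w.locktype, w.database, w.relation, w.page, w.tuple,
           w.virtualxid, w.transactionid, w.classid, w.objid, w.objsubid)
       IS NOT DISTINCT FROM
       ROW(h.locktype, h.database, h.relation, h.page, h.tuple,
           h.virtualxid, h.transactionid, h.classid, h.objid, h.objsubid)
 WHERE NOT w.granted
   AND h.granted
   AND w.virtualtransaction <> h.virtualtransaction;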




Re: [GENERAL] hanged session on index creation

2015-10-13 Thread Mikhail
 Thanks for the advice about the queries; I guess you're talking about those from
https://wiki.postgresql.org/wiki/Lock_Monitoring . Previously, I was using the
"blockedby" template from pgAdmin's "Server Status".

And you are absolutely right to suspect distributed transactions. The most
interesting part is that this database was restored from a backup using PITR set
to 2015-10-11 12:00, and the distributed transaction was prepared at '2015-10-11
12:00:00.130069'. From all that I draw the following conclusions:
1. prepared transactions are also restored when restoring from a backup;
2. they are restored together with their locks on database objects;
3. if you restore using PITR, you should roll back all prepared transactions after
the database starts.
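
A minimal sketch of that last step, using only the standard catalog view (the gid below is a placeholder, not a real value):

-- List prepared (two-phase) transactions that survived the restore:
SELECT gid, prepared, owner, database FROM pg_prepared_xacts;

-- Roll back each leftover one by its gid:
ROLLBACK PREPARED 'leftover_gid_goes_here';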

Thank you, Tom.

Regards, Mikhail.




>Tuesday, 13 October 2015, 9:29 -04:00 from Tom Lane :
>
>Mikhail < bem...@mail.ru > writes:
>> I'm running supposedly an easy operation - renaming index and my session 
>> hangs on it. pg_stat_activity shows that process is active and is waiting 
>> for lock:
>
>> locktype ¦ relation ¦ relname                                            ¦ virtualtransaction ¦ pid   ¦ mode                ¦ granted ¦ fastpath ¦ blockedby
>> ----------+----------+----------------------------------------------------+--------------------+-------+---------------------+---------+----------+-----------
>> relation ¦ 28756    ¦ fki_eb_order_details_acquiring_order_details_guid  ¦ -1/36974           ¦ NULL  ¦ AccessShareLock     ¦ t       ¦ f        ¦ NULL
>> relation ¦ 28756    ¦ fki_eb_order_details_acquiring_order_details_guid  ¦ 11/2284            ¦ 15179 ¦ AccessExclusiveLock ¦ f       ¦ f        ¦ NULL
>
>> So it seems that no other session blocks mine.
>
>Huh?  Whatever's holding that AccessShareLock on
>fki_eb_order_details_acquiring_order_details_guid is what's blocking your
>RENAME.  The pid = NULL bit is interesting; I think that must mean it's
>a prepared transaction.  So check for uncommitted prepared transactions...
>
>So one thing wrong with that "blockedby" subselect is that you can't
>tell "not blocked" from "blocked by prepared transaction".  Also worth
>noting is that that test for matching lock targets is seriously sloppy.
>Better versions can be found in our mail list archives --- I seem to
>remember concluding that a ROW(...) IS NOT DISTINCT FROM ROW(...) test
>that lists locktype and all the other lock-target-defining fields is the
>best way to write it.
>
>regards, tom lane





Re: [GENERAL] ID column naming convention

2015-10-13 Thread Rob Sargent

On 10/13/2015 11:36 AM, droberts wrote:

Hi, is there a problem with calling IDs differently when used as an FK vs. a table ID?
For example


mydimtable ()
  ID
  name
  description


myfacttable ()
   my_dim_id   # FK to ID above
   total_sales


I 'think' if I don't enforce foreign key constraints, then this practice
prevents tools from being able to generate ERD diagrams right?





Pretty sure _all_ ER diagramming tools rely on the DDL of column 
constraints, not the names.  And personally I would name the column 
mydim_id, i.e. don't inject the extra underscore, which is not in your 
table name. (And I assume the table names are fake: you don't really 
want "table" in a table name.)


Re: [GENERAL] Understanding "seq scans"

2015-10-13 Thread Alvaro Herrera
Lele Gaifax wrote:
> Alvaro Herrera  writes:
> 
> > So 10% of your rows in the master_l10n table start with "quattro"?
> > That's pretty odd, isn't it?  How did you manufacture these data?
> 
> Well, not a real scenario for sure, but definitely not odd: I just needed an
> "extremely" big dataset to test out several different strategies, both on
> table layout and indexes. The tables are populated by mechanically translating
> the integer primary key into the corresponding "in words" string (1 -> 
> "one")...

I imagined it would be something like that.  It's not the most useful
set of test data, precisely because it doesn't accurately reflect what
you're going to have in practice.  I suggest you enter some actual text,
even if it's just text from Don Camillo or whatever.

> > How often are you going to look for translated text without specifying a
> > language?
> 
> Never. The most frequently used criteria is «LIKE '%word%'» in the context of
> a user session, and thus with a "preferred language".

Be very careful with a % at the left.  The index is not going to work at
all there.  It is not the same as looking for stuff without a % at the
left. 
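
To illustrate with the (lang, lower(text)) index discussed earlier in the thread (the column names are the ones assumed there):

-- A left-anchored pattern can use an index range scan on the second column:
SELECT * FROM master_l10n WHERE lang = 'it' AND lower(text) LIKE 'quattro%';

-- A pattern with % on the left cannot; every row for the chosen language is checked:
SELECT * FROM master_l10n WHERE lang = 'it' AND lower(text) LIKE '%quattro%';

If unanchored patterns are the common case, a trigram index (the pg_trgm extension with gin_trgm_ops) is the usual workaround, though that goes beyond what was discussed here.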

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[GENERAL] ID column naming convention

2015-10-13 Thread droberts
Hi, is there a problem with calling IDs differently when used as an FK vs. a table ID?
For example


mydimtable ()
 ID
 name
 description


myfacttable ()
  my_dim_id   # FK to ID above
  total_sales


I 'think' if I don't enforce foreign key constraints, then this practice
prevents tools from being able to generate ERD diagrams, right?







[GENERAL] FOSDEM 2016, 30th and 31st January 2016, Brussels: Call for Participation

2015-10-13 Thread Stefan Keller
Hi,

The sixteenth edition of FOSDEM will take place on Saturday 30th and
Sunday 31st January 2016 at the usual location, the ULB Campus
Solbosch in Brussels.

There's an upcoming deadline for the first batch of main track proposals:
16 October 2015.
And there is, for example, a proposal for a Geospatial devroom:
https://titanpad.com/VCAR6DZfHG

Website:
https://fosdem.org/2016/news/2015-09-24-call-for-participation/

:Stefan




Re: [GENERAL] ID column naming convention

2015-10-13 Thread Gavin Flower

On 14/10/15 06:36, droberts wrote:

Hi, is there a problem with calling IDs differently when used as an FK vs. a table ID?
For example


mydimtable ()
  ID
  name
  description


myfacttable ()
   my_dim_id   # FK to ID above
   total_sales


I 'think' if I don't enforce foreign key constraints, then this practice
prevents tools from being able to generate ERD diagrams right?





My practice is to name the PRIMARY KEY as id, and foreign keys with the 
original table name plus the suffix _id.


Leaving the table name off the primary key name and just using id 
makes it more obvious that it is a primary key (plus it seems redundant 
to prefix the primary key name with its own table name!).


CREATE TABLE house
(
    id      int PRIMARY KEY,
    address text
);

CREATE TABLE room
(
    id       int PRIMARY KEY,
    house_id int REFERENCES house(id),
    name     text
);


There are exceptions like:

CREATE TABLE human
(
    id        int PRIMARY KEY,
    mother_id int REFERENCES human (id),
    father_id int REFERENCES human (id),
    name      text
);

Cheers,
Gavin




Re: [GENERAL] Understanding "seq scans"

2015-10-13 Thread Lele Gaifax
Alvaro Herrera  writes:

> I imagined it would be something like that.  It's not the most useful
> set of test data, precisely because it doesn't accurately reflect what
> you're going to have in practice.  I suggest you enter some actual text,
> even if it's just text from Don Camillo or whatever.

Sure, of course. The goal was more to exercise the hstore type, which I had
never used before, and to find a good index strategy.

>> Never. The most frequently used criteria is «LIKE '%word%'» in the context of
>> a user session, and thus with a "preferred language".
>
> Be very careful with a % at the left.  The index is not going to work at
> all there.  It is not the same as looking for stuff without a % at the
> left. 

Right, I know. I'm indeed surprised how fast it already is to scan the whole
table, in particular in the JSONB and HSTORE cases, where I didn't create an
index on the field!

Just for fun, I will try to learn about full text search, another thing I have
never used on PG, and whether it works against an HSTORE field... I don't know yet
if I will need that machinery (the current DB is still well under a million
records), though.
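
For what it's worth, a sketch of what that could look like, assuming an hstore column named translations keyed by language code (the names are assumptions, not the actual schema):

-- Requires the hstore extension; the two-argument to_tsvector with an explicit
-- configuration is immutable and therefore indexable.
CREATE INDEX l10n_fts_it_idx ON master_l10n
    USING gin (to_tsvector('italian', translations -> 'it'));

SELECT *
  FROM master_l10n
 WHERE to_tsvector('italian', translations -> 'it')
       @@ to_tsquery('italian', 'quattro');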

Thank you,
ciao, lele.
-- 
nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
l...@metapensiero.it  | -- Fortunato Depero, 1929.





Re: [GENERAL] Merge join vs merge semi join against primary key

2015-10-13 Thread Tom Lane
Sean Rhea  writes:
> No, the customers table is not 100% the same. This is a live production
> system, so the data is (unfortunately) changing under us a bit here. That
> said, there are still some strange things going on. I just reran
> everything. The query plan time hasn't changed, but as Jeremy, Igor, and
> David all pointed out, there's something funky going on with the apparent
> size of the customers table. These queries were all run within 5 minutes of
> each other:

> production=> explain analyze SELECT ac.* FROM balances ac JOIN customers o
> ON (o.id = ac.customer_id AND o.group_id = 45);
>                                                QUERY PLAN
> ----------------------------------------------------------------------------------------------------------
>  Merge Join  (cost=2475.89..20223.08 rows=7 width=80) (actual time=157.437..243670.853 rows=7318 loops=1)
>    Merge Cond: (ac.customer_id = o.id)
>    ->  Index Scan using balances_customer_id_index on balances ac  (cost=0.00..727.42 rows=16876 width=80) (actual time=0.489..30.573 rows=16876 loops=1)
>    ->  Index Scan using customers_pkey on customers o  (cost=0.00..65080.01 rows=184 width=8) (actual time=127.266..243582.767 rows=*7672* loops=1)
>          Filter: (group_id = 45)
>          Rows Removed by Filter: *212699113*
>  Total runtime: 243674.288 ms
> (7 rows)

> production=> select count(*) from customers where group_id = 45;
>  count
> ---
>430
> (1 row)

What you're looking at there is rows being read repeatedly as a
consequence of the mergejoin applying mark/restore operations to rescan
portions of its righthand input.  This will happen whenever there are
duplicate keys in the lefthand input.

I think the planner does take the possibility of rescans into account
in its cost estimates, but perhaps it's not weighing it heavily
enough.  It would be interesting to see what you get as a second-choice
plan if you set enable_mergejoin = off.

regards, tom lane




Re: [GENERAL] Merge join vs merge semi join against primary key

2015-10-13 Thread Sean Rhea
All,

No, the customers table is not 100% the same. This is a live production
system, so the data is (unfortunately) changing under us a bit here. That
said, there are still some strange things going on. I just reran
everything. The query plan time hasn't changed, but as Jeremy, Igor, and
David all pointed out, there's something funky going on with the apparent
size of the customers table. These queries were all run within 5 minutes of
each other:

production=> explain analyze SELECT ac.* FROM balances ac JOIN customers o
ON (o.id = ac.customer_id AND o.group_id = 45);
                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Merge Join  (cost=2475.89..20223.08 rows=7 width=80) (actual time=157.437..243670.853 rows=7318 loops=1)
   Merge Cond: (ac.customer_id = o.id)
   ->  Index Scan using balances_customer_id_index on balances ac  (cost=0.00..727.42 rows=16876 width=80) (actual time=0.489..30.573 rows=16876 loops=1)
   ->  Index Scan using customers_pkey on customers o  (cost=0.00..65080.01 rows=184 width=8) (actual time=127.266..243582.767 rows=*7672* loops=1)
         Filter: (group_id = 45)
         Rows Removed by Filter: *212699113*
 Total runtime: 243674.288 ms
(7 rows)

production=> select count(*) from customers where group_id = 45;
 count
---
   430
(1 row)

production=> select count(*) from customers;
 count

 476645
(1 row)

Is it possible for explain analyze to somehow produce bad stats? I can't
figure out where that 212699113 number is coming from at all.

Sean





On Mon, Oct 12, 2015 at 5:43 PM, David Rowley 
wrote:

> On 10 October 2015 at 08:52, Sean Rhea  wrote:
>
>>
>> 1. When I join two tables with "WHERE id IN (...)" versus with an explicit
>> join, and the join column for the inner table is a primary key, I would
>> expect
>> the same behavior in both cases, but the optimizer is choosing a merge
>> join in
>> one case and a merge semi join in the other. There's at most one customer
>> with a given id. Why not do a semi join?
>>
>>
> Unfortunately the 9.2 planner does not make any checks to verify that
> customers.id is unique to perform a semi join. There is a pending patch
> in the 9.6 cycle to add this optimisation.
>
>
>> production=> select count(*) from customers;
>>  count
>> 
>>  473733
>> (1 row)
>>
>> ...
>
>
>>->  Index Scan using customers_pkey on customers o
>>  (cost=0.00..63836.61 rows=836 width=8) (actual time=159.840..201915.765
>> rows=7672 loops=1)
>>  Filter: (group_id = 45)
>>  Rows Removed by Filter: 212699113
>>
>>
> Rows Removed by Filter: 212699113 seems to indicate that your 473733 row
> count for "customers" is incorrect.
>
> If you're doing lots of filtering on group_id, then perhaps you should
> think about adding an index on customers (group_id,id)
>
>
> --
>  David Rowley   http://www.2ndQuadrant.com/
> 
>  PostgreSQL Development, 24x7 Support, Training & Services
>
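
As DDL, David's suggestion would be something along these lines (the index name is made up; CONCURRENTLY avoids blocking writes on a live system):

CREATE INDEX CONCURRENTLY customers_group_id_id_idx
    ON customers (group_id, id);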


Re: [GENERAL] Merge join vs merge semi join against primary key

2015-10-13 Thread Sean Rhea
Tom,

Just to clarify, is the lefthand input customers or balances?

And turning off merge joins "fixes" everything, including the runtime:

production=> set enable_mergejoin = off;
SET
production=> explain analyze SELECT ac.* FROM balances ac JOIN customers o
ON (o.id = ac.customer_id AND o.group_id = 45);
                                            QUERY PLAN
------------------------------------------------------------------------------------------------------
 Hash Join  (cost=20288.24..20722.35 rows=7 width=80) (actual
time=133.202..140.876 rows=7318 loops=1)
   Hash Cond: (ac.customer_id = o.id)
   ->  Seq Scan on balances ac  (cost=0.00..370.76 rows=16876 width=80)
(actual time=0.015..5.853 rows=16876 loops=1)
   ->  Hash  (cost=20285.94..20285.94 rows=184 width=8) (actual
time=126.768..126.768 rows=430 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 12kB
 ->  Seq Scan on customers o  (cost=0.00..20285.94 rows=184
width=8) (actual time=16.901..126.606 rows=430 loops=1)
   Filter: (group_id = 45)
   Rows Removed by Filter: 476221
 Total runtime: 142.089 ms
(9 rows)

Sean




On Tue, Oct 13, 2015 at 11:20 AM, Tom Lane  wrote:

> Sean Rhea  writes:
> > No, the customers table is not 100% the same. This is a live production
> > system, so the data is (unfortunately) changing under us a bit here. That
> > said, there are still some strange things going on. I just reran
> > everything. The query plan time hasn't changed, but as Jeremy, Igor, and
> > David all pointed out, there's something funky going on with the apparent
> > size of the customers table. These queries were all run within 5 minutes
> of
> > each other:
>
> > production=> explain analyze SELECT ac.* FROM balances ac JOIN customers o
> > ON (o.id = ac.customer_id AND o.group_id = 45);
> >                                                QUERY PLAN
> > ----------------------------------------------------------------------------------------------------------
> >  Merge Join  (cost=2475.89..20223.08 rows=7 width=80) (actual time=157.437..243670.853 rows=7318 loops=1)
> >    Merge Cond: (ac.customer_id = o.id)
> >    ->  Index Scan using balances_customer_id_index on balances ac  (cost=0.00..727.42 rows=16876 width=80) (actual time=0.489..30.573 rows=16876 loops=1)
> >    ->  Index Scan using customers_pkey on customers o  (cost=0.00..65080.01 rows=184 width=8) (actual time=127.266..243582.767 rows=*7672* loops=1)
> >          Filter: (group_id = 45)
> >          Rows Removed by Filter: *212699113*
> >  Total runtime: 243674.288 ms
> > (7 rows)
>
> > production=> select count(*) from customers where group_id = 45;
> >  count
> > ---
> >430
> > (1 row)
>
> What you're looking at there is rows being read repeatedly as a
> consequence of the mergejoin applying mark/restore operations to rescan
> portions of its righthand input.  This will happen whenever there are
> duplicate keys in the lefthand input.
>
> I think the planner does take the possibility of rescans into account
> in its cost estimates, but perhaps it's not weighing it heavily
> enough.  It would be interesting to see what you get as a second-choice
> plan if you set enable_mergejoin = off.
>
> regards, tom lane
>


Re: [GENERAL] ID column naming convention

2015-10-13 Thread David G. Johnston
On Tue, Oct 13, 2015 at 1:36 PM, droberts 
wrote:

> I 'think' if I don't enforce foreign key constraints, then this practice
> prevents tools from being able to generate ERD diagrams right?
>

Yes, if you don't name them the same and don't set up an actual FK, then
there is no reason to assume that two random fields on separate tables are
in any way related.

I suspect Rob is right that even naming them the same isn't enough - but
that would depend upon the tool.  Given the, IMO misguided, prevalence of
naming every PK field "id", about the only meaningful logic such a tool
could employ would be to look for fields of the form "table_id" and, if
"table" has an "id" field, assume that they are indeed related.  Since your
example uses "my_dim" for a field that points to the "mydimtable" table, even
this logic would be insufficient to guess the link you know exists but don't
encode into the schema.

Personally, I don't care whether it is a PK or FK - ID fields are database
unique and when present always refer to the same entity.  In this case I
would call the field "mydim_id" on both tables.  I would expect to have to
define UNIQUE (PRIMARY KEY) and FOREIGN KEY constraints if I wanted tools
to understand the relationship between the two tables even though I
standardized the name of the ID field.
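
A sketch of that convention applied to the tables from the original post, with the relationship still declared explicitly so ERD tools can pick it up:

CREATE TABLE mydimtable (
    mydim_id    integer PRIMARY KEY,
    name        text,
    description text
);

CREATE TABLE myfacttable (
    mydim_id    integer NOT NULL REFERENCES mydimtable (mydim_id),
    total_sales numeric
);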

There are implications to choosing any particular naming convention.  I'm
not familiar with any that are so severe that I would call them problems.

David J.


[GENERAL] Archiving while idle every archive_timeout with wal_level hot_standby

2015-10-13 Thread Laurence Rowe
I'm seeing Postgres 9.4.4 archive while idle every archive_timeout when I
set ``wal_level hot_standby``:

$ initdb testpg
> $ cat << 'EOF' >> testpg/postgresql.conf
> > wal_level = hot_standby
> > archive_mode = on
> > archive_timeout = 10
> > checkpoint_timeout = 1h
> > archive_command = 'echo $(date) archive %p'
> > log_checkpoints = on
> > EOF
> $ postgres -D testpg
> LOG:  database system was shut down at 2015-10-13 11:58:45 PDT
> LOG:  MultiXact member wraparound protections are now enabled
> LOG:  autovacuum launcher started
> LOG:  database system is ready to accept connections
> Tue Oct 13 12:00:47 PDT 2015 archive pg_xlog/00010001
> Tue Oct 13 12:00:57 PDT 2015 archive pg_xlog/00010002
> LOG:  checkpoint starting: xlog
> Tue Oct 13 12:01:07 PDT 2015 archive pg_xlog/00010003
> LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log
> file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s,
> total=0.026 s; sync files=0, longest=0.000 s, average=0.000 s
> Tue Oct 13 12:01:17 PDT 2015 archive pg_xlog/00010004
> Tue Oct 13 12:01:27 PDT 2015 archive pg_xlog/00010005
> LOG:  checkpoint starting: xlog
> Tue Oct 13 12:01:38 PDT 2015 archive pg_xlog/00010006
> LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log
> file(s) added, 0 removed, 3 recycled; write=0.000 s, sync=0.000 s,
> total=0.027 s; sync files=0, longest=0.000 s, average=0.000 s
> Tue Oct 13 12:01:48 PDT 2015 archive pg_xlog/00010007
> Tue Oct 13 12:01:58 PDT 2015 archive pg_xlog/00010008
> LOG:  checkpoint starting: xlog
> LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log
> file(s) added, 0 removed, 3 recycled; write=0.000 s, sync=0.000 s,
> total=0.001 s; sync files=0, longest=0.000 s, average=0.000 s



At ``wal_level archive`` I only see archiving every checkpoint_timeout (see
http://www.postgresql.org/message-id/1407389876762-5813999.p...@n5.nabble.com
):

$ initdb testpg
> $ cat << 'EOF' >> testpg/postgresql.conf
> > wal_level = archive
> > archive_mode = on
> > archive_timeout = 10
> > checkpoint_timeout = 60
> > archive_command = 'echo $(date) archive %p'
> > log_checkpoints = on
> > EOF
> # lrowe@Laurences-iMac ~/scratch
> $ postgres -D testpg
> LOG:  database system was shut down at 2015-10-13 12:25:38 PDT
> LOG:  MultiXact member wraparound protections are now enabled
> LOG:  autovacuum launcher started
> LOG:  database system is ready to accept connections
> Tue Oct 13 12:25:49 PDT 2015 archive pg_xlog/00010001
> LOG:  checkpoint starting: time
> LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log
> file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s,
> total=0.029 s; sync files=0, longest=0.000 s, average=0.000 s
> Tue Oct 13 12:26:39 PDT 2015 archive pg_xlog/00010002
> LOG:  checkpoint starting: time
> LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 transaction log
> file(s) added, 0 removed, 1 recycled; write=0.000 s, sync=0.000 s,
> total=0.030 s; sync files=0, longest=0.000 s, average=0.000 s
> Tue Oct 13 12:27:39 PDT 2015 archive pg_xlog/00010003



Is there some way to avoid the extra archiving while idle at ``wal_level
hot_standby``?


Laurence


Re: [GENERAL] Drop or alter column under load give ERROR #42804 structure of query does not match function result type:

2015-10-13 Thread Adrian Klaver

On 10/12/2015 07:53 PM, Victor Blomqvist wrote:

Do you have some advice on how to design my functions to work around this
problem?

If I understand your conversation correctly, the problem is returning the
row type users from the function. If so, I can think of two workarounds
(both quite inconvenient and complex):

1. Use RETURNS TABLE(...) together with not selecting * in the functions.
2. Use RETURNS  also without select * in the functions.


Might want to investigate the record return type:

http://www.postgresql.org/docs/9.4/interactive/plpgsql-overview.html#PLPGSQL-ARGS-RESULTS

40.1.2. Supported Argument and Result Data Types

"It is also possible to declare a PL/pgSQL function as returning record, 
which means that the result is a row type whose columns are determined 
by specification in the calling query, as discussed in Section 7.2.1.4."


The section that explains difference between declared type record and 
returned type record:


http://www.postgresql.org/docs/9.4/interactive/plpgsql-declarations.html#PLPGSQL-DECLARATION-RECORDS


How to use a returned record in query:

http://www.postgresql.org/docs/9.4/interactive/queries-table-expressions.html#QUERIES-TABLEFUNCTIONS

See bottom of section.


Basically all the above leaves it up to the calling query to 'shape' the 
output. Not sure if that will work for you.
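
As one concrete illustration of workaround 1 above, a sketch of the function with its own declared output columns, so that dropping an unrelated column from users no longer changes the function's result type (the column types follow the users example later in this thread):

CREATE FUNCTION select_users(id_ integer)
RETURNS TABLE (id integer, name varchar) AS $$
BEGIN
    -- Select only the declared columns, qualified to avoid clashing with the
    -- OUT parameters id and name:
    RETURN QUERY SELECT u.id, u.name FROM users u WHERE u.id = id_;
END;
$$ LANGUAGE plpgsql;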




What do other people do in this situation? For our system the lowest
load is in the late night, 04 - 06, which might have sufficiently low
load to avoid the issue, but I would much prefer to run schema changes
when there are people in the office.

/Victor

On Mon, Oct 12, 2015 at 10:15 PM, Adrian Klaver wrote:

On 10/12/2015 06:53 AM, Tom Lane wrote:

Andres Freund writes:

On 2015-10-09 14:32:44 +0800, Victor Blomqvist wrote:

CREATE FUNCTION select_users(id_ integer) RETURNS SETOF users AS
$$
BEGIN
    RETURN QUERY SELECT * FROM users WHERE id = id_;
END;
$$ LANGUAGE plpgsql;


My guess is that the problem here is that table level locking prevents
modification of the "users" type when the table is used, but there's no
locking preventing the columns from being dropped while the function is
used. So what happens is: 1) the function is parsed & planned, 2) DROP
COLUMN is executed, 3) the contained statement is executed, and 4) a
mismatch between the contained statement and the function definition is
detected.


The query plan as such does get refreshed, I believe.  The problem is that
plpgsql has no provision for the definition of a named composite type to
change after a function's been parsed.  This applies to variables of named
composite types for sure, and based on this example I think it must apply
to the function result type as well, though I'm too lazy to go check the
code right now.


That makes sense. The problem is that I cannot square that with
Albe's example, which I tested also:

"
Session 1:

test=> CREATE TABLE users (id integer PRIMARY KEY, name varchar NOT
NULL, to_be_removed integer NOT NULL);
CREATE TABLE
test=> CREATE FUNCTION select_users(id_ integer) RETURNS SETOF users AS
$$BEGIN RETURN QUERY SELECT * FROM users WHERE id = id_;
END;$$ LANGUAGE plpgsql;
CREATE FUNCTION

Session 2:

test=> SELECT id, name FROM select_users(18);
 id | name
----+------
(0 rows)

Ok, now the plan is cached.

Now in Session 1:

test=> ALTER TABLE users DROP COLUMN to_be_removed;
ALTER TABLE

Session2:

test=> SELECT id, name FROM select_users(18);
 id | name
----+------
(0 rows)

No error.  This is 9.4.4.
"


We have had past discussions about fixing this.  I believe it would
require getting rid of use of plpgsql's "row" infrastructure for named
composites, at least in most cases, and going over to the "record"
infrastructure instead.  In the past the conversations have stalled as
soon as somebody complained that that would probably make some operations
slower.  I don't entirely understand that objection, since (a) some other
operations would probably get faster, and (b) performance does not trump
correctness.  But that's where the discussion stands at the moment.

 regards, tom lane




--
Adrian Klaver
adrian.kla...@aklaver.com 





--
Adrian Klaver
adrian.kla...@aklaver.com



Re: [GENERAL] ID column naming convention

2015-10-13 Thread John R Pierce

On 10/13/2015 3:27 PM, droberts wrote:

Thanks.   My only question is how do you create a schema diagram (ERD) then?
The tool won't know what the relationships are unless maybe you put foreign
key constraints on.  BTW does anyone recommend a tool to do that?  I've been
playing with DbVisualizer.



I don't know of any ERD tool that will recognize foreign key references 
without FK constraint definitions, regardless of what things are named.




--
john r pierce, recycling bits in santa cruz





[GENERAL] Creating Report for PieChart

2015-10-13 Thread Alex Magnum
Hello,
I need to process some statistics for a pie chart (json) where I only want
to show a maximum of 8 slices. If I have more data points, like in the table
below, I need to combine the rest into a slice called "Others". If there are
8 or fewer, I use them as is.

I am currently doing this with a plperl function, which works well, but I was
just wondering out of curiosity whether it could be done within an SQL query.

Has anyone done something similar who could point me in the right
direction?


SELECT count(*),country_name FROM stats_archive WHERE id=400 GROUP BY
country_name ORDER BY COUNT DESC;
 count |   country_name
-------+-------------------
   302 | Malaysia
    65 | Singapore
    57 | Thailand
    26 | Indonesia
    15 | France
    14 | United States
    14 | India
    13 | Philippines
    12 | Vietnam
    10 | Republic of Korea
    10 | Canada
     7 | Australia
     6 | Brazil
     6 | Czech Republic
     5 | Switzerland
     4 | Saudi Arabia
     3 | Ireland
     3 | Japan
     3 | Sweden
     3 | South Africa
     3 | Belarus
     3 | Colombia
     3 | United Kingdom
     1 | Peru


   country_name  | count | perc
-+---+---
 Malaysia|   302 |  51.4
 Singapore   |65 |  11.0
 Thailand|57 |   9.7
 Indonesia   |26 |   4.4
 France  |15 |   2.6
 United States   |14 |   2.4
 India   |14 |   2.4
 Others  |95 |  16.1
 Total   |   588 |   100

Thanks a lot for any suggestions
Alex


Re: [GENERAL] ID column naming convention

2015-10-13 Thread droberts
Gavin Flower-2 wrote
> On 14/10/15 06:36, droberts wrote:
>> Hi, is there a problem calling ID's different when used as a FK vs table
>> ID?
>> For example
>>
>>
>> mydimtable ()
>>   ID
>>   name
>>   description
>>
>>
>> myfacttable ()
>>my_dim_id   # FK to ID above
>>total_sales
>>
>>
>> I 'think' if I don't enforce foreign key constraints, then this practice
>> prevents tools from being able to generate ERD diagrams right?
>>
>>
>>
>>
>>
> My practice is to name the PRIMARY KEY as id, and foreign keys with the 
> original table name plus the suffix _id.
> 
> Leaving the table name off the primary key name and just using id 
> makes it more obvious that it is a primary key (plus it seems redundant 
> to prefix the primary key name with its own table name!).
> 
> CREATE TABLE house
> (
>  id  int PRIMARY KEY,
>  address text
> );
> 
> CREATE TABLE room
> (
>  id   int PRIMARY KEY,
>  house_id int REFERENCES house(id),
>  name text
> );
> 
> 
> There are exceptions like:
> 
> CREATE TABLE human
> (
>  idint PRIMARY KEY,
>  mother_id int REFERENCES human (id),
>  father_id int REFERENCES human (id),
>  name  text
> );
> 
> Cheers,
> Gavin
> 
> 

Thanks.   My only question is how do you create a schema diagram (ERD) then? 
The tool won't know what the relationships are unless maybe you put foreign
key constraints on.  BTW does anyone recommend a tool to do that?  I've been
playing with DbVisualizer.







Re: [GENERAL] Creating Report for PieChart

2015-10-13 Thread Bruce Momjian
On Wed, Oct 14, 2015 at 01:56:11AM +0200, Alex Magnum wrote:
> Hello,
> I need to process some statistics for a pie chart (json) where I only want to
> show a max of 8 slices. If I have more data points like in below table I need
> to combine all to a slice called others. If there are less or equal 8 i use
> them as is.
> 
> I am currently doing this with a plperl function which works well but was just
> wondering out of curiosity if that could be done withing an sql query.
> 
> Anyone having done something similar who could point me in the right 
> direction?

I think you want the HAVING clause, e.g. HAVING COUNT(*) > 8.
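
A literal reading of that suggestion against the original query would look like the sketch below; note that it filters on group size rather than picking the eight largest groups:

SELECT country_name, count(*)
  FROM stats_archive
 WHERE id = 400
 GROUP BY country_name
HAVING count(*) > 8
 ORDER BY count(*) DESC;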

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +




Re: [GENERAL] Creating Report for PieChart

2015-10-13 Thread Scott Mead


> On Oct 13, 2015, at 19:56, Alex Magnum  wrote:
> 
> Hello,
> I need to process some statistics for a pie chart (json) where I only want to 
> show a max of 8 slices. If I have more data points like in below table I need 
> to combine all to a slice called others. If there are less or equal 8 i use 
> them as is.
> 
> I am currently doing this with a plperl function which works well but was 
> just wondering out of curiosity if that could be done withing an sql query.
> 
> Anyone having done something similar who could point me in the right 
> direction? 
> 
> 
> SELECT count(*),country_name FROM stats_archive WHERE id=400 GROUP BY 
> country_name ORDER BY COUNT DESC;
>  count |   country_name
> ---+---
>302 | Malaysia
> 65 | Singapore
> 57 | Thailand
> 26 | Indonesia
> 15 | France
> 14 | United States
> 14 | India
> 13 | Philippines
> 12 | Vietnam
> 10 | Republic of Korea
> 10 | Canada
>  7 | Australia
>  6 | Brazil
>  6 | Czech Republic
>  5 | Switzerland
>  4 | Saudi Arabia
>  3 | Ireland
>  3 | Japan
>  3 | Sweden
>  3 | South Africa
>  3 | Belarus
>  3 | Colombia
>  3 | United Kingdom
>  1 | Peru
> 
> 
>country_name  | count | perc  
> -+---+---
>  Malaysia|   302 |  51.4 
>  Singapore   |65 |  11.0 
>  Thailand|57 |   9.7 
>  Indonesia   |26 |   4.4 
>  France  |15 |   2.6 
>  United States   |14 |   2.4 
>  India   |14 |   2.4 
>  Others  |95 |  16.1 
>  Total   |   588 |   100 
> 
> Thanks a lot for any suggestions
I would use rank to get a rank number for each record. 


> Alex


Re: [GENERAL] Creating Report for PieChart

2015-10-13 Thread Scott Mead
On Tue, Oct 13, 2015 at 9:14 PM, Scott Mead  wrote:

>
>
> On Oct 13, 2015, at 19:56, Alex Magnum  wrote:
>
> Hello,
> I need to process some statistics for a pie chart (json) where I only want
> to show a max of 8 slices. If I have more data points like in below table I
> need to combine all to a slice called others. If there are less or equal 8
> i use them as is.
>
> I am currently doing this with a plperl function which works well but was
> just wondering out of curiosity if that could be done withing an sql query.
>
> Anyone having done something similar who could point me in the right
> direction?
>
>
> SELECT count(*),country_name FROM stats_archive WHERE id=400 GROUP BY
> country_name ORDER BY COUNT DESC;
>  count |   country_name
> ---+---
>302 | Malaysia
> 65 | Singapore
> 57 | Thailand
> 26 | Indonesia
> 15 | France
> 14 | United States
> 14 | India
> 13 | Philippines
> 12 | Vietnam
> 10 | Republic of Korea
> 10 | Canada
>  7 | Australia
>  6 | Brazil
>  6 | Czech Republic
>  5 | Switzerland
>  4 | Saudi Arabia
>  3 | Ireland
>  3 | Japan
>  3 | Sweden
>  3 | South Africa
>  3 | Belarus
>  3 | Colombia
>  3 | United Kingdom
>  1 | Peru
>
>
>country_name  | count | perc
> -+---+---
>  Malaysia|   302 |  51.4
>  Singapore   |65 |  11.0
>  Thailand|57 |   9.7
>  Indonesia   |26 |   4.4
>  France  |15 |   2.6
>  United States   |14 |   2.4
>  India   |14 |   2.4
>  Others  |95 |  16.1
>  Total   |   588 |   100
>
> Thanks a lot for any suggestions
>
> I would use rank to get a rank number for each record.
>

  Sorry, sent the last one from my phone; here's an example:



Use 'rank' to generate the rank order of the entry.

postgres=# select country, count(1) num_entries,
rank() over (order by count(1) DESC)
from test GROUP by country ORDER BY num_entries DESC;
 country | num_entries | rank
-+-+--
 US  |  20 |1
 CA  |  15 |2
 SP  |   8 |3
 IT  |   7 |4
(4 rows)

There's probably an easier way to do this without a sub-select, but it
works.

postgres=# SELECT country, num_entries, rank
   FROM (select country, count(1) num_entries,
   rank() over (order by count(1) DESC)
   FROM test GROUP by country
) foo WHERE rank < 4;

 country | num_entries | rank
-+-+--
 US  |  20 |1
 CA  |  15 |2
 SP  |   8 |3
(3 rows)

postgres=#

--
Scott Mead
OpenSCG
www.openscg.com

>
>
> Alex
>
>


Re: [GENERAL] Creating Report for PieChart

2015-10-13 Thread dinesh kumar
+Adding to Scott

On Tue, Oct 13, 2015 at 6:25 PM, Scott Mead  wrote:

>
>
>
> On Tue, Oct 13, 2015 at 9:14 PM, Scott Mead  wrote:
>
>>
>>
>> On Oct 13, 2015, at 19:56, Alex Magnum  wrote:
>>
>> Hello,
>> I need to process some statistics for a pie chart (json) where I only
>> want to show a max of 8 slices. If I have more data points like in below
>> table I need to combine all to a slice called others. If there are less or
>> equal 8 i use them as is.
>>
>> I am currently doing this with a plperl function which works well but was
>> just wondering out of curiosity if that could be done withing an sql query.
>>
>> Anyone having done something similar who could point me in the right
>> direction?
>>
>>
>> SELECT count(*),country_name FROM stats_archive WHERE id=400 GROUP BY
>> country_name ORDER BY COUNT DESC;
>>  count |   country_name
>> ---+---
>>302 | Malaysia
>> 65 | Singapore
>> 57 | Thailand
>> 26 | Indonesia
>> 15 | France
>> 14 | United States
>> 14 | India
>> 13 | Philippines
>> 12 | Vietnam
>> 10 | Republic of Korea
>> 10 | Canada
>>  7 | Australia
>>  6 | Brazil
>>  6 | Czech Republic
>>  5 | Switzerland
>>  4 | Saudi Arabia
>>  3 | Ireland
>>  3 | Japan
>>  3 | Sweden
>>  3 | South Africa
>>  3 | Belarus
>>  3 | Colombia
>>  3 | United Kingdom
>>  1 | Peru
>>
>>
>>country_name  | count | perc
>> -+---+---
>>  Malaysia|   302 |  51.4
>>  Singapore   |65 |  11.0
>>  Thailand|57 |   9.7
>>  Indonesia   |26 |   4.4
>>  France  |15 |   2.6
>>  United States   |14 |   2.4
>>  India   |14 |   2.4
>>  Others  |95 |  16.1
>>  Total   |   588 |   100
>>
>> Thanks a lot for any suggestions
>>
>> I would use rank to get a rank number for each record.
>>
>
>   Sorry, Sent the last one from my phone, here's an example:
>
>
>
> Use 'rank' to generate the rank order of the entry.
>
> postgres=# select country, count(1) num_entries,
> rank() over (order by count(1) DESC)
> from test GROUP by country ORDER BY num_entries DESC;
>  country | num_entries | rank
> -+-+--
>  US  |  20 |1
>  CA  |  15 |2
>  SP  |   8 |3
>  IT  |   7 |4
> (4 rows)
>
> There's probably an easier way to do this without a sub-select, but, it
> works.
>
> postgres=# SELECT country, num_entries, rank
>FROM (select country, count(1) num_entries,
>rank() over (order by count(1) DESC)
>FROM test GROUP by country
> ) foo WHERE rank < 4;
>
>  country | num_entries | rank
> -+-+--
>  US  |  20 |1
>  CA  |  15 |2
>  SP  |   8 |3
> (3 rows)
>
> postgres=#
>
>

Not sure which PG version you are using, but if you are on 9.4, you may use
filters as below.

postgres=# SELECT * FROM stats_archive ;
 cname
---
 I
 I
 U
 J
 K
(5 rows)

postgres=# WITH total AS
(
SELECT COUNT(*) cnt, cname, ROW_NUMBER() OVER() FROM
stats_archive GROUP BY 2
)
SELECT 'others' as cname, sum(cnt) filter (where row_number >2) FROM total
UNION
SELECT cname, cnt FROM total WHERE row_number<=2;
 cname  | sum
+-
 J  |   1
 I  |   2
 others |   2
(3 rows)
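
Putting the two ideas together, a sketch of a single query with the shape asked for in the original post (named slices, an Others row, and a Total row); it omits the percentage column, and ties at the eighth place can yield more than eight named slices because rank() is used:

WITH ranked AS (
    SELECT country_name,
           count(*) AS cnt,
           rank() OVER (ORDER BY count(*) DESC) AS rnk
      FROM stats_archive
     WHERE id = 400
     GROUP BY country_name
)
SELECT country_name, cnt FROM ranked WHERE rnk <= 8
UNION ALL
SELECT 'Others', sum(cnt) FROM ranked WHERE rnk > 8 HAVING count(*) > 0
UNION ALL
SELECT 'Total', sum(cnt) FROM ranked;

A final ORDER BY over the union (for example on a CASE expression) can arrange the named slices before the Others and Total rows.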

--
> Scott Mead
> OpenSCG
> www.openscg.com
>
>>
>>
>> Alex
>>
>>
>


-- 

Regards,
Dinesh
manojadinesh.blogspot.com