Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-12-01 Thread George Neuner
On Mon, 30 Nov 2015 23:07:36 -0500, "Steve Petrie, P.Eng."
 wrote:

>Instead of using a DELETE command to destroy the row (and a
>resource-intensive AUTOVACUUM / VACUUM process to reclaim the storage
>space), why not instead, simply mark that session management row as "free"
>(with an UPDATE command) ??

Alban beat me to the answer  8-)  
But to expand a little:

Your plan won't work because Postgresql does not update in place - it
inserts a new changed row and marks the old as deleted.  It does not
physically overwrite the old row until the table is vacuumed.
[If even then - a row which lies beyond the logical end-of-table when
vacuum is finished won't be overwritten until its space is recycled.]

This behavior, known as MVCC (multiversion concurrency control),
is integral to transaction isolation: selects which are running
concurrently with the update may already have seen the old row and
must continue to see it until they complete, even if the update
completes first.

Postgresql doesn't support "dirty read" isolation.  A row can't be
physically dropped or its space overwritten while any transaction that
can "see" it is still running.


For more:
https://devcenter.heroku.com/articles/postgresql-concurrency
https://momjian.us/main/writings/pgsql/mvcc.pdf

Actually lots of great stuff in the presentation section on Bruce
Momjian's site:  https://momjian.us


George





Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-12-01 Thread Alban Hertroys
> Before I start in to implement a DELETE / AUTOVACUUM / VACUUM approach, to
> recycling disk space used for a session management table, I would like to
> propose, for consideration by this forum, an idea for a different approach.
>
> A row in a session management table, represents a significant "sunk cost" in
> both computing time used to create the row, and in disk space allocated.
> Postgres has to use a lot of resources to create that row in the first
> place.
>
> When the session that originally caused that row to be allocated, eventually
> expires -- why delete the associated session management row ??
>
> Instead of using a DELETE command to destroy the row (and a
> resource-intensive AUTOVACUUM / VACUUM process to reclaim the storage
> space), why not instead, simply mark that session management row as "free"
> (with an UPDATE command) ??

An UPDATE is a combination of an INSERT and a DELETE command.

However, rows marked as deleted will be reused at some point after
autovacuum (or manual VACUUM) has made sure they are no longer in use
by any DB session.

So your approach can still work, as long as you vacuum that table
frequently enough. The actual solution isn't the UPDATE instead of the
DELETE though, but rather the partial index and an increased
autovacuum frequency.
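
For illustration, a sketch of what that could look like on the session table
from earlier in the thread (treating session_www_state = 0 as "free" is an
assumption for the example, as are the autovacuum thresholds):

    CREATE INDEX ON its_eto.eto_sql_tb_session_www (session_www_serno)
        WHERE session_www_state = 0;       -- index only the "free" rows

    ALTER TABLE its_eto.eto_sql_tb_session_www
        SET (autovacuum_vacuum_scale_factor = 0.01,  -- vacuum after ~1% dead rows
             autovacuum_vacuum_threshold    = 100);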

Alban.
-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.




Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-11-30 Thread Steve Petrie, P.Eng.

Thanks to George and Adrian for their latest responses regarding a DELETE /
AUTOVACUUM / VACUUM approach to recycling disk space allocated for rows in a
postgres table that is used to manage sessions (e.g. HTTP sessions with
visitor web browsers).

My comments are below. In them, I propose an alternative strategy, using a
partial index instead of a DELETE / AUTOVACUUM / VACUUM approach.

The comments are a bit lengthy, but I hope forum members will consider them,
and maybe help lead to a breakthrough in using a postgres table for
high-performance session management.


"George Neuner"  wrote in message
news:ukdm5b1ni7lv393coa71vf8d0i1mi0p...@4ax.com...

On Sun, 29 Nov 2015 05:02:58 -0500, "Steve Petrie, P.Eng."
 wrote:


I should have clarified, that I use the term "fuzzy" to refer to the
probability mechanism, that hooks a small burst of session row deletion
activity, to each one of a randomly-selected portion (presently 1/5) of the
HTTP requests that cause a new session row to be INSERTed into the session
table.

This means that on average, only every 5th HTTP request that creates a new
session row, will also incur the session deletion workload. When the session
row deletion process occurs, its (aggressive) limit for deletion workload is
2X as many expired rows as needed on average, to keep up with the rate of
session row creation (so the 2X DELETE limit presently == 10 rows).

The idea is to make the process of DELETing expired session rows,
automatically scale its activity, to closely and aggressively match the rate
of session row creation.


There's nothing really wrong with that, but I wouldn't do it that way
... I would bulk delete old records from a separate scheduled task.

Another way to do it would be to have each new session delete exactly
one old session.  1:1 scales perfectly and spreads the delete load
evenly across all users.

Not that deleting a handful of records is a lengthy process, but it
seems unfair to burden some users with it but not others.  I would
burden (or not) everyone equally.



A heavy burst of new sessions being created will
work proportionately more aggressively to DELETE expired session rows. This
way, the (HTTP request-driven) PHP app will be self-tuning its own session
table space recycling. And there is no process (e.g. cron-driven),
external to the PHP app itself, that is doing session row deletion.

Based on what I've learned from this forum (but before I have studied
AUTOVACUUM in any detail) my thinking is to include an AUTOVACUUM command
(outside of any SQL transaction block) in the HTTP request-driven PHP app,
immediately following any time the PHP app completes a session row DELETE
command.

Or maybe the AUTOVACUUM request should occur less frequently?


Reducing the frequency will result in a larger table space on disk.
Insertions are made at the end of the table so the table keeps growing
in size regardless of deletions until (some kind of) vacuum is run.

Autovacuum doesn't shrink the table space on disk; it marks the space from
dead rows as reusable, so new rows can fill it instead of extending the table.

If you want to tightly control the growth of the table space, you need
to run autovacuum _more_ often, not less.



Before I start in to implement a DELETE / AUTOVACUUM / VACUUM approach, to
recycling disk space used for a session management table, I would like to
propose, for consideration by this forum, an idea for a different approach.

A row in a session management table, represents a significant "sunk cost" in
both computing time used to create the row, and in disk space allocated.
Postgres has to use a lot of resources to create that row in the first
place.

When the session that originally caused that row to be allocated, eventually
expires -- why delete the associated session management row ??

Instead of using a DELETE command to destroy the row (and a
resource-intensive AUTOVACUUM / VACUUM process to reclaim the storage
space), why not instead, simply mark that session management row as "free"
(with an UPDATE command) ??

In brief -- Why not just leave the expired session management row allocated
in place, already "hot" and ready to be reused for a new session ??

But -- when the app needs to initiate a new session, it must be able quickly
to: 1. know if there are any "free" rows available for re-use, and if so,
2. select and allocate a specific "free" row for re-use, and 3. if no "free"
row is available, INSERT a new session row.

* * *
* * *

Would the postgres Partial Index facility be useful here?

Here are quotes from the excellent postgres documentation (9.3.5):

"11.8. Partial Indexes

A partial index is an index built over a subset of a table; the subset is
defined by a conditional expression (called the predicate of the partial
index). The index contains entries only for those table rows that satisfy
the predicate. ... This reduces the size of the index, which will speed up
those queries that do use the index. It will also speed up many table update
operations because the index does not need to be updated in all cases."
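
For illustration, a sketch of how the app could claim a "free" row for re-use
(treating session_www_state = 0 as "free" is an assumption for the example,
and $1 stands for the new session key the PHP app would supply as a query
parameter):

    UPDATE its_eto.eto_sql_tb_session_www
       SET session_www_state    = 1,
           session_www_code     = $1,
           session_www_init_utc = now() AT TIME ZONE 'UTC'
     WHERE session_www_serno = (SELECT session_www_serno
                                  FROM its_eto.eto_sql_tb_session_www
                                 WHERE session_www_state = 0
                                 LIMIT 1)
     RETURNING session_www_serno;   -- no row returned => INSERT a new session row

A partial index like the one quoted above (WHERE session_www_state = 0) would
keep the inner SELECT cheap, and concurrent requests would also need a row
lock (e.g. FOR UPDATE on the inner SELECT) to avoid claiming the same row.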

Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-11-29 Thread George Neuner
On Sun, 29 Nov 2015 05:02:58 -0500, "Steve Petrie, P.Eng."
 wrote:

>I should have clarified, that I use the term "fuzzy" to refer to the
>probability mechanism, that hooks a small burst of session row deletion
>activity, to each one of a randomly-selected portion (presently 1/5) of the
>HTTP requests that cause a new session row to be INSERTed into the session
>table.
>
>This means that on average, only every 5th HTTP request that creates a new
>session row, will also incur the session deletion workload. When the session
>row deletion process occurs, its (aggressive) limit for deletion workload is
>2X as many expired rows as needed on average, to keep up with the rate of
>session row creation (so the 2X DELETE limit presently == 10 rows).
>
>The idea is to make the process of DELETing expired session rows,
>automatically scale its activity, to closely and aggressively match the rate
>of session row creation. 

There's nothing really wrong with that, but I wouldn't do it that way
... I would bulk delete old records from a separate scheduled task.

Another way to do it would be to have each new session delete exactly
one old session.  1:1 scales perfectly and spreads the delete load
evenly across all users.
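
For illustration, one way to sketch "each new session deletes exactly one
expired session" against the session table from this thread (the expiry test
is an assumption about how expiry is decided, and DELETE has no LIMIT, so it
goes through a subquery):

    DELETE FROM its_eto.eto_sql_tb_session_www
     WHERE ctid IN (SELECT ctid
                      FROM its_eto.eto_sql_tb_session_www
                     WHERE session_www_expiry_utc < now() AT TIME ZONE 'UTC'
                     LIMIT 1);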

Not that deleting a handful of records is a lengthy process, but it
seems unfair to burden some users with it but not others.  I would
burden (or not) everyone equally.


>A heavy burst of new sessions being created will
>work proportionately more aggressively to DELETE expired session rows. This
>way, the (HTTP request-driven) PHP app will be self-tuning its own session
>table space recycling. And there is no process (e.g. cron-driven),
>external to the PHP app itself, that is doing session row deletion.
>
>Based on what I've learned from this forum (but before I have studied
>AUTOVACUUM in any detail) my thinking is to include an AUTOVACUUM command
>(outside of any SQL transaction block) in the HTTP request-driven PHP app,
>immediately following any time the PHP app completes a session row DELETE
>command.
>
>Or maybe the AUTOVACUUM request should occur less frequently?

Reducing the frequency will result in a larger table space on disk.
Insertions are made at the end of the table so the table keeps growing
in size regardless of deletions until (some kind of) vacuum is run.

Autovacuum doesn't shrink the table space on disk; it marks the space from
dead rows as reusable, so new rows can fill it instead of extending the table.

If you want to tightly control the growth of the table space, you need
to run autovacuum _more_ often, not less.

George





Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-11-29 Thread Magnus Hagander
On Nov 29, 2015 18:34, "George Neuner"  wrote:
>
> On Sun, 29 Nov 2015 05:04:42 -0500, "Steve Petrie, P.Eng."
>  wrote:
>
> >"George Neuner"  wrote in message
> >news:kaed5btl92qr4v8ndevlgtv0f28qaae...@4ax.com...
> >
> >> My vote for an email client would be Thunderbird.  It runs on XP or
> >> higher and you can import Outlook's PST files so as to keep your mail
> >> archives.  Importing PST files directly requires Outlook be available
> >> on the same system [there is also a less friendly way to do it via EML
> >> files exported from Outlook where Outlook is not on the same system].
> >>
> >
> >It's a common misconception that MS Outlook Express is compatible with MS
> >Outlook. But in fact the two products are architecturally unrelated.
>
> My understanding was that OE was based on the old (Win9x) Outlook.  I
> know it isn't the same as the "enterprise" version.

This is fantastically off topic but no, it was not. OE was based on the old
"Internet Mail and News".  The actual Outlook product has always been
separate.

/Magnus


Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-11-29 Thread George Neuner
On Sun, 29 Nov 2015 05:04:42 -0500, "Steve Petrie, P.Eng."
 wrote:

>"George Neuner"  wrote in message
>news:kaed5btl92qr4v8ndevlgtv0f28qaae...@4ax.com...
>
>> My vote for an email client would be Thunderbird.  It runs on XP or
>> higher and you can import Outlook's PST files so as to keep your mail
>> archives.  Importing PST files directly requires Outlook be available
>> on the same system [there is also a less friendly way to do it via EML
>> files exported from Outlook where Outlook is not on the same system].
>>
>
>It's a common misconception that MS Outlook Express is compatible with MS
>Outlook. But in fact the two products are architecturally unrelated.

My understanding was that OE was based on the old (Win9x) Outlook.  I
know it isn't the same as the "enterprise" version.

I wasn't aware that OE used a different file format.  But, AFAIK, it
does still export EML files, so you can move your mailboxes into
Thunderbird (or whatever).


>I am considering Thunderbird as an MS OE replacement, but my understanding
>is that Mozilla has abandoned all but security-related support for
>Thunderbird. I have been kicking the (email client functionality) tires of
>SeaMonkey under my Win XP. I believe that much of SeaMonkey is built on a
>Mozilla code base.

Yes and no.  Mozilla has Thunderbird on a slow development track.  It
does occasionally get new features, but mostly now by having some very
popular extensions become built in.

Seamonkey was a fork from a discontinued Mozilla application suite. It
is not a Mozilla project, although it does incorporate Mozilla code
from Firefox and Thunderbird.

The problem I have with Seamonkey is that it tries to be all things to
all web users.  "Jack of all trades, master of none" is a truism.
YMMV, but I would rather have very reliable purpose specific tools
than an integrated suite which may do more but be less reliable
overall. 

I'm not knocking Seamonkey per se - it seems to be quite well done -
I'm just making a general observation re: integrated application
suites.  Netscape failed in part because it bit off too much, trying
to do mail and news on top of the browser [and not doing them well - I
loved the Netscape browser, but its mail and news interface was just
bad].  Mozilla discontinued its web application suite because too few
people wanted it.  

George





Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-11-29 Thread Adrian Klaver

On 11/29/2015 01:59 AM, Steve Petrie, P.Eng. wrote:

Thanks to Jeff for the helpful response. My remarks are below.

- Original Message - From: "Jeff Janes" 
To: "Steve Petrie, P.Eng." 
Cc: "Tim Uckun" ; "Merlin Moncure"
;
"John R Pierce" ; "PostgreSQL General"

Sent: Thursday, November 26, 2015 2:07 AM
Subject: Re: [GENERAL] using a postgres table as a multi-writer
multi-updater queue



On Wed, Nov 25, 2015 at 8:39 PM, Steve Petrie, P.Eng.
 wrote:



You don't ever want to delete from such a table so you need to set up
something which allows you to truncate the tables when you no longer
need
them.

...

Truncation is far more efficient than deletion + vacuuming.  If you
are running on the edge of your hardware's capabilities, this
efficiency is important.  But if you are not on the edge, then it is
not worth worrying about.


This is what I was hoping to learn -- because I doubt my app workload will
ever approach the edge of hardware capability.


Just make sure your autovacuum settings are
at least as aggressive as the default settings.



I'll keep that in mind. And by a happy coincidence, in another recent forum
thread (26 Nov 2015, subject "Re: [GENERAL] Re: [webmaster] How to
commit/vacuum a batch of delete statements in a postgresql function") there
is advice from Adrian Klaver about the need to execute VACUUM outside of
a transaction block.


Just to be clear, my reference was to someone running VACUUM manually.
Jeff's reference was to the autovacuum daemon, which runs VACUUM and
ANALYZE as background processes:


http://www.postgresql.org/docs/9.4/interactive/routine-vacuuming.html#AUTOVACUUM




My plan was always, to avoid eventual exhaustion of the SERIAL sequence
number integer value series, by swapping in during the periodic app
shutdown, a freshly truncated postgres  table.


I'd make the serial column a bigserial, and then forget about it.


I considered using bigint, but decided against doing so for three reasons.
1. int rep is already going to be way more precision than is needed, 2.
avoid the extra resource consumption incurred by bigint as compared to int,
and 3. keep strictly 32-bit compatible with my 32-bit Win XP / PHP
development system. In fact the PHP app is programmed to terminate
abnormally on detecting a postgres SERIAL sequence number that exceeds the
maximum positive value of a strictly 32-bit signed integer.



Cheers,

Jeff


Steve Petrie, P.Eng.

ITS-ETO Consortium
Oakville, Ontario, Canada
apet...@aspetrie.net






--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-11-29 Thread Adrian Klaver

On 11/29/2015 02:04 AM, Steve Petrie, P.Eng. wrote:

Thanks to George for the Thunderbird email client recommendation. My remarks
are below.

"George Neuner"  wrote in message
news:kaed5btl92qr4v8ndevlgtv0f28qaae...@4ax.com...

On Wed, 25 Nov 2015 23:39:39 -0500, "Steve Petrie, P.Eng."
 wrote:


My stupid email client software (Microsoft Outlook Express on Win XP)
refuses to respect its own "Reply" option settings for inline reply
text.
I've been looking for a replacement email client but so far without
success.)


Without further comment about a 15 year old, unsupported OS ...



(Suddenly, my MS Outlook Express is letting me do inline comments.)

George, your self-restraint is admirable :) And my plan is to move from Win
XP to Linux in the (somewhat near) future.

Before I lose all credibility with this excellent forum -- be assured that
the PHP website app (not yet online) I'm migrating to postgres from mysql,
will ABSOLUTELY NOT be running in production under any kind of Microsoft
server software. Right now I'm planning to use DragonFlyBSD, with its robust
HAMMER filesystem, as the server o/s.



My vote for an email client would be Thunderbird.  It runs on XP or
higher and you can import Outlook's PST files so as to keep your mail
archives.  Importing PST files directly requires Outlook be available
on the same system [there is also a less friendly way to do it via EML
files exported from Outlook where Outlook is not on the same system].



It's a common misconception that MS Outlook Express is compatible with MS
Outlook. But in fact the two products are architecturally unrelated. Much
the same way that Java and Javascript are unrelated.

MS OE does not use .PST files, but there are open source utilities that
will
extract the contents of MS OE mail folders for migration to alternate email
clients.

I am considering Thunderbird as an MS OE replacement, but my understanding
is that Mozilla has abandoned all but security-related support for
Thunderbird. I have been kicking the (email client functionality) tires of
SeaMonkey under my Win XP. I believe that much of SeaMonkey is built on a
Mozilla code base.


My take is SeaMonkey is Thunderbird with a bunch of unneeded features. I 
use Thunderbird on a variety of machines and it works just fine. To me 
an email client is a solved problem and security fixes are all that are 
required; still, Mozilla is making changes. For the latest, version 38:


https://support.mozilla.org/en-US/kb/new-thunderbird-38





Hope this helps,
George








--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-11-29 Thread Steve Petrie, P.Eng.

Thanks to George for the Thunderbird email client recommendation. My remarks
are below.

"George Neuner"  wrote in message
news:kaed5btl92qr4v8ndevlgtv0f28qaae...@4ax.com...

On Wed, 25 Nov 2015 23:39:39 -0500, "Steve Petrie, P.Eng."
 wrote:


My stupid email client software (Microsoft Outlook Express on Win XP)
refuses to respect its own "Reply" option settings for inline reply text.
I've been looking for a replacement email client but so far without
success.)


Without further comment about a 15 year old, unsupported OS ...



(Suddenly, my MS Outlook Express is letting me do inline comments.)

George, your self-restraint is admirable :) And my plan is to move from Win
XP to Linux in the (somewhat near) future.

Before I lose all credibility with this excellent forum -- be assured that
the PHP website app (not yet online) I'm migrating to postgres from mysql,
will ABSOLUTELY NOT be running in production under any kind of Microsoft
server software. Right now I'm planning to use DragonFlyBSD, with its robust
HAMMER filesystem, as the server o/s.



My vote for an email client would be Thunderbird.  It runs on XP or
higher and you can import Outlook's PST files so as to keep your mail
archives.  Importing PST files directly requires Outlook be available
on the same system [there is also a less friendly way to do it via EML
files exported from Outlook where Outlook is not on the same system].



It's a common misconception that MS Outlook Express is compatible with MS
Outlook. But in fact the two products are architecturally unrelated. Much
the same way that Java and Javascript are unrelated.

MS OE does not use .PST files, but there are open source utilities that will
extract the contents of MS OE mail folders for migration to alternate email
clients.

I am considering Thunderbird as an MS OE replacement, but my understanding
is that Mozilla has abandoned all but security-related support for
Thunderbird. I have been kicking the (email client functionality) tires of
SeaMonkey under my Win XP. I believe that much of SeaMonkey is built on a
Mozilla code base.




Hope this helps,
George





--
Steve Petrie, P.Eng.

ITS-ETO Consortium
Oakville, Ontario, Canada
(905) 847-3253
apet...@aspetrie.net





Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-11-29 Thread Steve Petrie, P.Eng.

Thanks to George for the helpful comments.  My remarks are below.

"George Neuner"  wrote in message
news:gvad5bllba9slstdhkn6ql2jbplgd78...@4ax.com...

On Wed, 25 Nov 2015 23:39:39 -0500, "Steve Petrie, P.Eng."
 wrote:






My plan was always, to avoid eventual exhaustion of the SERIAL
sequence number integer value series, by swapping in during the
periodic app shutdown, a freshly truncated postgres
 table.


Is there a logical problem with letting the sequence wrap around?



I can't think of any reason why letting the sequence wrap would be a logical
problem, but for operational simplicity I prefer to reset the sequence
number back to one, during each brief daily postgres server shutdown window.

I don't relish the challenge of mentally grappling with gigantic session row
sequence numbers, when I'm investigating some future operational anomaly.



I'm not sure I understand the reason for "fuzzy" deletion.  There are
a number of timestamps in your data ... is it not possible to delete
deterministically based on one of them?



You are correct. The timestamps are there in the session row, and the PHP
app's session row deletion process does use them to select rows for
deletion. There is also a decrementing integer "hit count" limit column
(presently initialized == 25) that kills a session when its "budget" of HTTP
requests is used up.

I should have clarified, that I use the term "fuzzy" to refer to the
probability mechanism, that hooks a small burst of session row deletion
activity, to each one of a randomly-selected portion (presently 1/5) of the
HTTP requests that cause a new session row to be INSERTed into the session
table.

This means that on average, only every 5th HTTP request that creates a new
session row, will also incur the session deletion workload. When the session
row deletion process occurs, its (aggressive) limit for deletion workload is
2X as many expired rows as needed on average, to keep up with the rate of
session row creation (so the 2X DELETE limit presently == 10 rows).

The idea is to make the process of DELETing expired session rows,
automatically scale its activity, to closely and aggressively match the rate
of session row creation. A heavy burst of new sessions being created will
work proportionately more aggressively to DELETE expired session rows. This
way, the (HTTP request-driven) PHP app will be self-tuning its own session
table space recycling. And there is no process (e.g. cron-driven),
external to the PHP app itself, that is doing session row deletion.

Based on what I've learned from this forum (but before I have studied
AUTOVACUUM in any detail) my thinking is to include an AUTOVACUUM command
(outside of any SQL transaction block) in the HTTP request-driven PHP app,
immediately following any time the PHP app completes a session row DELETE
command.

Or maybe the AUTOVACUUM request should occur less frequently?



Hope this helps,
George





Steve Petrie, P.Eng.

ITS-ETO Consortium
Oakville, Ontario, Canada
apet...@aspetrie.net





Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-11-29 Thread Steve Petrie, P.Eng.

Thanks to Jeff for the helpful response. My remarks are below.

- Original Message - 
From: "Jeff Janes" 

To: "Steve Petrie, P.Eng." 
Cc: "Tim Uckun" ; "Merlin Moncure" ;
"John R Pierce" ; "PostgreSQL General"

Sent: Thursday, November 26, 2015 2:07 AM
Subject: Re: [GENERAL] using a postgres table as a multi-writer
multi-updater queue



On Wed, Nov 25, 2015 at 8:39 PM, Steve Petrie, P.Eng.
 wrote:



You don't ever want to delete from such a table so you need to set up
something which allows you to truncate the tables when you no longer
need
them.

...

Truncation is far more efficient than deletion + vacuuming.  If you
are running on the edge of your hardware's capabilities, this
efficiency is important.  But if you are not on the edge, then it is
not worth worrying about.


This is what I was hoping to learn -- because I doubt my app workload will
ever approach the edge of hardware capability.


Just make sure your autovacuum settings are
at least as aggressive as the default settings.



I'll keep that in mind. And by a happy coincidence, in another recent forum
thread (26 Nov 2015, subject "Re: [GENERAL] Re: [webmaster] How to
commit/vacuum a batch of delete statements in a postgresql function") there
is advice from Adrian Klaver about the need to execute VACUUM outside of
a transaction block.


My plan was always, to avoid eventual exhaustion of the SERIAL sequence
number integer value series, by swapping in during the periodic app
shutdown, a freshly truncated postgres  table.


I'd make the serial column a bigserial, and then forget about it.


I considered using bigint, but decided against doing so for three reasons.
1. int rep is already going to be way more precision than is needed, 2.
avoid the extra resource consumption incurred by bigint as compared to int,
and 3. keep strictly 32-bit compatible with my 32-bit Win XP / PHP
development system. In fact the PHP app is programmed to terminate
abnormally on detecting a postgres SERIAL sequence number that exceeds the
maximum positive value of a strictly 32-bit signed integer.



Cheers,

Jeff


Steve Petrie, P.Eng.

ITS-ETO Consortium
Oakville, Ontario, Canada
apet...@aspetrie.net





Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-11-25 Thread George Neuner
On Wed, 25 Nov 2015 23:39:39 -0500, "Steve Petrie, P.Eng."
 wrote:

>My stupid email client software (Microsoft Outlook Express on Win XP) 
>refuses to respect its own "Reply" option settings for inline reply text.
>I've been looking for a replacement email client but so far without success.)

Without further comment about a 15 year old, unsupported OS ...

My vote for an email client would be Thunderbird.  It runs on XP or
higher and you can import Outlook's PST files so as to keep your mail
archives.  Importing PST files directly requires Outlook be available
on the same system [there is also a less friendly way to do it via EML
files exported from Outlook where Outlook is not on the same system].

Thunderbird directly supports net news, so you don't have to get news
mixed with your mail (unless you want to).


Hope this helps,
George





Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-11-25 Thread George Neuner
On Wed, 25 Nov 2015 23:39:39 -0500, "Steve Petrie, P.Eng."
 wrote:


>Now, I'm reading in this forum that in fact, postgres does not efficiently
>automatically recycle storage space released by row DELETion.

Yes and no.  

Space resulting from deleted rows is not given back to the OS unless
you perform a VACUUM FULL.  Autovacuum marks the space from dead rows
as reusable within the table, so new insertions can fill it.

If the table is heavily used, you will reuse the free space quickly
anyway.  The problem you may run into is needing to autovacuum too
often to control the size of the table.

Space from truncated or dropped tables is immediately given back to
the OS.

The issue with heavy deletion usually is performance.  Deleting scans
the rows and fires any relevant triggers ... truncating or dropping
the table does not.  


>My plan was always, to avoid eventual exhaustion of the SERIAL
>sequence number integer value series, by swapping in during the 
>periodic app shutdown, a freshly truncated postgres 
>  table.

Is there a logical problem with letting the sequence wrap around?  
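
For reference: by default a sequence errors out when it reaches its maximum
value rather than silently wrapping, so wrap-around would have to be opted
into -- e.g., assuming the default sequence name Postgres generates for the
SERIAL column:

    ALTER SEQUENCE its_eto.eto_sql_tb_session_www_session_www_serno_seq CYCLE;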


>So my question to this postgres forum is -- should I just remove from 
>the online app the "fuzzy" probability mechanism, that DELETEs expired
>rows from the session table -- because the postgres server is not going 
>to dynamically recycle the released storage space anyway?

I'm not sure I understand the reason for "fuzzy" deletion.  There are
a number of timestamps in your data ... is it not possible to delete
deterministically based on one of them?


Hope this helps,
George 





Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-11-25 Thread Jeff Janes
On Wed, Nov 25, 2015 at 8:39 PM, Steve Petrie, P.Eng.
 wrote:
>
>> You don't ever want to delete from such a table so you need to set up
>> something which allows you to truncate the tables when you no longer need
>> them.
>
> I am migrating a web PHP application (called ITS-ETO) from mysql to
> postgres. The app INSERTs a row into a postgres table
>  to manage each session with a web browser. Here is
> the DDL for the session table:
>
> CREATE TABLE its_eto.eto_sql_tb_session_www
> (
>   session_www_code   char(32)  NOT NULL UNIQUE PRIMARY KEY,
>
>   session_www_type   int   NOT NULL,
>   session_www_state  int   NOT NULL,
>   session_verify_codechar(7)   NOT NULL,
>
>   session_www_serno  SERIALNOT NULL UNIQUE,
>
>   session_target_serno   int   NULL,
>   session_target_datavarchar(1000) NULL,
>
>   session_www_init_utc   timestamp NOT NULL,
>   session_www_last_utc   timestamp NOT NULL,
>   session_www_expiry_utc timestamp NOT NULL,
>   session_www_delete_utc timestamp NOT NULL,
>   session_www_hit_count  int   NOT NULL,
>   session_www_act_seqno  int   NULL
>
> );
>
> CREATE INDEX ON its_eto.eto_sql_tb_session_www (session_www_serno);
>
> Using a "fuzzy" probability mechanism, some randomly-selected fraction of
> the HTTP requests that initiate a new session, also SELECT and DELETE
> expired rows from the session table. I naively assumed that the database
> server would automatically recycle the storage space dynamically released in
> this way.
>
> Now, I'm reading in this forum that in fact, postgres does not efficiently
> automatically recycle storage space released by row DELETion.

> My application is quite simple and will be supporting a modest workload,
> using a small amount of storage space, compared to the massive transaction
> rates and gigantic space usages, I'm reading about in this forum.


Truncation is far more efficient than deletion + vacuuming.  If you
are running on the edge of your hardware's capabilities, this
efficiency is important.  But if you are not on the edge, then it is
not worth worrying about.  Just make sure your autovacuum settings are
at least as aggressive as the default settings.


>
> I do have the luxury of being able to shut down the application for a few
> minutes periodically e.g every 24 hours.
>
> My plan was always, to avoid eventual exhaustion of the SERIAL sequence
> number integer value series, by swapping in during the periodic app
> shutdown, a freshly truncated postgres  table.

I'd make the serial column a bigserial, and then forget about it.
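
A sketch of that change on the existing table (on these releases the sequence
behind a SERIAL column already counts in bigint range, so only the column
type needs to grow):

    ALTER TABLE its_eto.eto_sql_tb_session_www
        ALTER COLUMN session_www_serno TYPE bigint;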

Cheers,

Jeff




Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-11-25 Thread Steve Petrie, P.Eng.
(Kindly forgive my top-post. My stupid email client software (Microsoft Outlook 
Express on Win XP) refuses to respect its own "Reply" option settings for 
inline reply text. I've been looking for a replacement email client but so far 
without success.)

* * *
* * *

> You don't ever want to delete from such a table so you need to set up 
> something which allows you to truncate the tables when you no longer need 
> them.

I am migrating a web PHP application (called ITS-ETO) from mysql to postgres. 
The app INSERTs a row into a postgres table  to manage 
each session with a web browser. Here is the DDL for the session table:
  CREATE TABLE its_eto.eto_sql_tb_session_www
  (
session_www_code   char(32)  NOT NULL UNIQUE PRIMARY KEY,

session_www_type   int   NOT NULL,
session_www_state  int   NOT NULL,
session_verify_codechar(7)   NOT NULL,

session_www_serno  SERIALNOT NULL UNIQUE,

session_target_serno   int   NULL,
session_target_datavarchar(1000) NULL,

session_www_init_utc   timestamp NOT NULL,
session_www_last_utc   timestamp NOT NULL,
session_www_expiry_utc timestamp NOT NULL,
session_www_delete_utc timestamp NOT NULL,
session_www_hit_count  int   NOT NULL,
session_www_act_seqno  int   NULL

  );

  CREATE INDEX ON its_eto.eto_sql_tb_session_www (session_www_serno);
Using a "fuzzy" probability mechanism, some randomly-selected fraction of the 
HTTP requests that initiate a new session, also SELECT and DELETE expired rows 
from the session table. I naively assumed that the database server would 
automatically recycle the storage space dynamically released in this way.

Now, I'm reading in this forum that in fact, postgres does not efficiently 
automatically recycle storage space released by row DELETion.

* * *
* * *

My application is quite simple and will be supporting a modest workload, using 
a small amount of storage space, compared to the massive transaction rates and 
gigantic space usages, I'm reading about in this forum.

I do have the luxury of being able to shut down the application for a few 
minutes periodically e.g every 24 hours.

My plan was always, to avoid eventual exhaustion of the SERIAL sequence number 
integer value series, by swapping in during the periodic app shutdown, a 
freshly truncated postgres  table.

Before going online, the freshly truncated postgres  
table will receive INSERTs of any active session rows copied over from the old 
 table. The copied session rows will get new sequence 
numbers, but that won't matter, because a session row is referenced within each 
incoming HTTP request, not by its row serial number column  
integer value, but by a randomly-generated (MD5) 32-character unique key column 
 value. 

So my question to this postgres forum is -- should I just remove from the 
online app the "fuzzy" probability mechanism, that DELETEs expired rows from 
the session table -- because the postgres server is not going to dynamically 
recycle the released storage space anyway?

Any comments appreciated.

Steve

- Original Message - 
  From: Tim Uckun 
  To: Merlin Moncure 
  Cc: John R Pierce ; PostgreSQL General 
  Sent: Wednesday, November 25, 2015 3:50 PM
  Subject: Re: [GENERAL] using a postgres table as a multi-writer multi-updater 
queue


  I'll add my two cents.


  I set up something similar a while ago. Here are my suggestions for what they 
are worth.


  You don't ever want to delete from such a table so you need to set up 
something which allows you to truncate the tables when you no longer need them. 
 


  One way to accomplish this is with rules (not triggers, rules are blazingly 
fast compared to triggers).  Set up a table inheritance scheme based on 
whatever search criteria you have (date, sequence etc).   Set up a cron job to 
create the child tables well ahead and to set up the rule redirecting the 
insert. For example let's say you have a date partitioned table and you want to 
 keep a table for every day.  Your cron job would run once a day and would 
create the next seven days' worth of tables (just in case the cron job fails to 
run for some reason) and would rewrite the rule to insert into the table with 
an if-then-else type of logic.  This method is preferable to the dynamic creation 
of the table name with string concatenation because again it's significantly 
faster.  


  Another method I tried was to have one "primary" child table and "archival" 
child tables and insert directly into the primary child table.  For example say 
you have a table called "Q".  You set up a table called Q_in which inherits 
from Q.  Your code inserts into the Q_in table, you select from the Q table.  
On a periodic basis you do this 


  BEGIN TRANSACTION;
  LOCK TABLE Q_IN IN EXCLUSIVE MODE;

Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-11-25 Thread Jeff Janes
On Mon, Nov 23, 2015 at 2:41 AM, Chris Withers  wrote:
> Hi All,
>
> I wondered if any of you could recommend best practices for using a postgres
> table as a queue. Roughly speaking, 100-200 workers will vomit rows at
> rates of a few hundred per second into the table leaving the status as new
> and then as many workers as needed to keep up with the load will plough
> through the queue changing the status to something other than new.

Is that several hundred per second per worker, or just several hundred
per second in total?

What happens if the worker finds the database has crashed when it goes
to insert the records?  That would determine what kind of
transactional system you need.  On the consumer side, what would
happen if a record were processed twice?


>
> My naive implementation would be something along the lines of:
>
> CREATE TABLE event (
> ts     timestamp,
> event  char(40),
> status char(10),
> CONSTRAINT pkey PRIMARY KEY(ts, event)
> );

How long are you going to keep these records around once processed?

Unless you delete them immediately, you will probably want a partial
index on (ts) where status='new'
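
e.g., a sketch against the naive schema above:

    CREATE INDEX ON event (ts) WHERE status = 'new';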

> ...with writers doing INSERT or COPY to get data into the table and readers
> doing something like:
>
> SELECT * FROM event WHERE status='new' LIMIT 1000 FOR UPDATE;
>
> ...so, grabbing batches of 1,000, working on them and then setting their
> status.

Why 1000 at a time?  Also, you probably need an ORDER BY

> But, am I correct in thinking that SELECT FOR UPDATE will not prevent
> multiple workers selecting the same rows?

You are correct, but the way it will prevent multiple workers from
selecting them at the same time is that the next worker will block
until the first one commits.  You would either need to use SKIP
LOCKED in the 9.5 release, or use
pg_try_advisory_xact_lock on lower versions, to avoid that.
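
Roughly, a sketch of both (the ORDER BY column and the use of hashtext() to
derive an advisory-lock key are illustrative choices, not the only ones):

    -- 9.5 and later:
    SELECT * FROM event
     WHERE status = 'new'
     ORDER BY ts
     LIMIT 1000
     FOR UPDATE SKIP LOCKED;

    -- Pre-9.5, with transaction-scoped advisory locks (caveat: the lock
    -- function can be evaluated for more rows than the LIMIT returns):
    SELECT * FROM event
     WHERE status = 'new'
       AND pg_try_advisory_xact_lock(hashtext(event))
     ORDER BY ts
     LIMIT 1000;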

And, how do you unlock the rows?  There are two general approaches.
One is to lock the row using PostgreSQL's FOR UPDATE type locks, and
hold the transaction open while processing, then updating the row to
mark it done and committing to release the lock.  This cleans up after
itself in the case a worker crashes, but there is no visibility into
what is going on.

The other is to claim the row for the worker by updating a status
field (for example, to have a hostname and pid), and committing that.
And then doing the processing, then updating it again to set it as
done, and committing that.  An abnormally terminated worker will need to
have someone or something clean up after it, but it gives you much
better visibility into what is happening.
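
A sketch of that second pattern (the worker column and the 'processing'/'done'
states are illustrative additions to the naive schema, and the locking caveats
above still apply while claiming the batch):

    UPDATE event
       SET status = 'processing', worker = 'host1:4242'
     WHERE (ts, event) IN (SELECT ts, event FROM event
                            WHERE status = 'new'
                            ORDER BY ts LIMIT 1000)
     RETURNING ts, event;
    -- COMMIT, do the work, then in a later transaction:
    UPDATE event
       SET status = 'done'
     WHERE worker = 'host1:4242' AND status = 'processing';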

>
> Anyway, is this approach reasonable? If so, what tweaks/optimisations should
> I be looking to make?

The best optimization would be to not do it at all.  Why can't the 200
inserting workers just do the work themselves immediately (either
client side or server side), instead of queuing it?  If there is any
possibility of making the processing fast enough to do it that way,
I'd surely spend my time optimizing the actual work, rather than
optimizing a queuing system.

And depending on what failure modes you can tolerate, consider a
best-effort dedicated queuing system rather than a perfectly ACID one
built on PostgreSQL.

Cheers,

Jeff




Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-11-25 Thread Tim Uckun
I'll add my two cents.

I set up something similar a while ago. Here are my suggestions for what
they are worth.

You don't ever want to delete from such a table so you need to set up
something which allows you to truncate the tables when you no longer need
them.

One way to accomplish this is with rules (not triggers, rules are blazingly
fast compared to triggers).  Set up a table inheritance scheme based on
whatever search criteria you have (date, sequence etc).   Set up a cron job
to create the child tables well ahead and to set up the rule redirecting
the insert. For example let's say you have a date partitioned table and you
want to  keep a table for every day.  Your cron job would run once a day
and would create the next seven days' worth of tables (just in case the cron
job fails to run for some reason) and would rewrite the rule to insert into
the table with an if-then-else type of logic.  This method is preferable to
the dynamic creation of the table name with string concatenation because
again it's significantly faster.
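
A minimal sketch of one day's redirect, assuming the parent queue table is
called q and has a ts timestamp column (all names and dates are illustrative,
and a conditional rule per day is one way to express the dispatch the cron
job would rewrite):

    CREATE TABLE q_2015_12_01 () INHERITS (q);

    CREATE RULE q_insert_2015_12_01 AS
        ON INSERT TO q
        WHERE NEW.ts >= DATE '2015-12-01' AND NEW.ts < DATE '2015-12-02'
        DO INSTEAD
        INSERT INTO q_2015_12_01 VALUES (NEW.*);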

Another method I tried was to have one "primary" child table and "archival"
child tables and insert directly into the primary child table.  For example
say you have a table called "Q".  You set up a table called Q_in which
inherits from Q.  Your code inserts into the Q_in table, you select from
the Q table.  On a periodic basis you do this

BEGIN TRANSACTION;
LOCK TABLE Q_IN IN EXCLUSIVE MODE;
ALTER TABLE Q_IN RENAME TO Q_SOME_DATETIME;
CREATE TABLE Q_IN
  (LIKE Q_SOME_DATETIME INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING
INDEXES);
ALTER SEQUENCE q_in_id_seq OWNED BY q_in.id;
 -- Set some constraints so the query optimizer knows what to do
END TRANSACTION;


There is one other method which is the Kafka approach( You can use this in
addition to the above methods)

Create N tables for incoming queue, each one has a sequence for their ID
number. N should be determined by how many clients you expect to run.
Create a rule which uses some hash function or round robin or randomly to
insert data into one of the tables.   Create a different table which keeps
track of client connections.   The clients use this table to keep track of
the last id fetched.  For example let's say I have three types of
processes that run on the incoming data: p1, p2, p3.  The table logs the
highest ID fetched from each table for each process. When a client connects
it connects to the table with the lowest used ID for that process,  and it
locks it for that client (not process because you can multiple clients
running each process), it processes the records, it updates the id, it
unlocks the table and it backs off for a few seconds.  The next client
which woke up goes through the same process and so on.  Both Apache Kafka
and Amazon kinesis use this approach.  One nice thing about this approach
is that you can put each table in its own tablespace on its own disk for
higher performance.
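
A sketch of the bookkeeping table described above (all names are illustrative):

    CREATE TABLE q_client_offsets (
        queue_table  text   NOT NULL,   -- which of the N incoming tables
        process_name text   NOT NULL,   -- p1, p2, p3 ...
        last_id      bigint NOT NULL DEFAULT 0,
        locked_by    text   NULL,       -- client currently holding this slot
        PRIMARY KEY (queue_table, process_name)
    );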

One other thing. This is dangerous but you can turn off logging of the
tables; this will make the inserts much faster.
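
e.g., a sketch (the "dangerous" part being that an unlogged table's contents
are lost after a crash and are not replicated):

    CREATE UNLOGGED TABLE q_in
      (LIKE q INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES);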

Finally:

There is no need to do any of this.  Kinesis is cheap, Kafka is pretty
awesome, Rabbit is crazy useful.

Hope this helps, ping me offline if you want more details.

Cheers.


On Tue, Nov 24, 2015 at 11:51 AM, Merlin Moncure  wrote:

> On Mon, Nov 23, 2015 at 12:54 PM, John R Pierce 
> wrote:
> > On 11/23/2015 2:41 AM, Chris Withers wrote:
> >>
> >>
> >> If it's totally wrong, how should I be looking to approach the problem?
> >
> > depending on where these queue entries are coming from, I'd considering
> > using a message queueing system like AMS, MQseries, etc, rather than
> trying
> > to use a relational database table as a queue. your external data
> source(s)
> > would write messages to this queue, and you'd have 'subscriber' processes
> > that listen to the queue and process the messages, inserting persistent
> data
> > into the database as needed.
>
> I just don't agree with this generalization.  Keeping the state of the
> queue in the database has a lot of advantages and is a lot easier to
> deal with from a programming perspective especially if SQL is your
> core competency.  Being able to produce and consume in SQL based on
> other relational datasources is...elegant.
>
> Specialized queue systems are a very heavy dependency and adding a new
> server to your platform to manage queues is not something to take
> lightly.  This advice also applies to scheduling systems like quartz,
> specialized search like solr and elasticsearch, and distributed data
> platforms like hadoop.  I've used all of these things and have tended
> to wish I had just used the database instead in just about every case.
>
> Also, check out pgq (https://wiki.postgresql.org/wiki/PGQ_Tutorial).
> Personally, I tend to roll my own queues.   It's not difficult.
>
> merlin
>
>

Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-11-23 Thread John R Pierce

On 11/23/2015 2:51 PM, Merlin Moncure wrote:

On Mon, Nov 23, 2015 at 12:54 PM, John R Pierce  wrote:
>
>depending on where these queue entries are coming from, I'd considering
>using a message queueing system like AMS, MQseries, etc, rather than trying
>to use a relational database table as a queue. your external data source(s)
>would write messages to this queue, and you'd have 'subscriber' processes
>that listen to the queue and process the messages, inserting persistent data
>into the database as needed.
I just don't agree with this generalization.  Keeping the state of the
queue in the database has a lot of advantages and is a lot easier to
deal with from a programming perspective especially if SQL is your
core competency.  Being able to produce and consume in SQL based on
other relational datasources is...elegant.


Our whole system at $job is message based, as it's a distributed
system.  Clients send messages to middleware servers that talk to the
database servers.  No one talks directly to the database; instead they
use messaging.  Also, the several databases in our core cluster talk to
each other with messaging, where the front end database publishes events
that the other reporting database servers subscribe to.  It's a very
powerful model for building complex distributed systems and maintaining
quite a lot of implementation flexibility, as the exact nature of the
schema only needs to be known by a few publisher and subscriber modules.




--
john r pierce, recycling bits in santa cruz





Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-11-23 Thread Merlin Moncure
On Mon, Nov 23, 2015 at 12:54 PM, John R Pierce  wrote:
> On 11/23/2015 2:41 AM, Chris Withers wrote:
>>
>>
>> If it's totally wrong, how should I be looking to approach the problem?
>
> depending on where these queue entries are coming from, I'd considering
> using a message queueing system like AMS, MQseries, etc, rather than trying
> to use a relational database table as a queue. your external data source(s)
> would write messages to this queue, and you'd have 'subscriber' processes
> that listen to the queue and process the messages, inserting persistent data
> into the database as needed.

I just don't agree with this generalization.  Keeping the state of the
queue in the database has a lot of advantages and is a lot easier to
deal with from a programming perspective especially if SQL is your
core competency.  Being able to produce and consume in SQL based on
other relational datasources is...elegant.

Specialized queue systems are a very heavy dependency and adding a new
server to your platform to manage queues is not something to take
lightly.  This advice also applies to scheduling systems like quartz,
specialized search like solr and elasticsearch, and distributed data
platforms like hadoop.  I've used all of these things and have tended
to wish I had just used the database instead in just about every case.

Also, check out pgq (https://wiki.postgresql.org/wiki/PGQ_Tutorial).
Personally, I tend to roll my own queues.   It's not difficult.

merlin




Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-11-23 Thread John R Pierce

On 11/23/2015 2:41 AM, Chris Withers wrote:


If it's totally wrong, how should I be looking to approach the problem?



depending on where these queue entries are coming from, I'd considering 
using a message queueing system like AMS, MQseries, etc, rather than 
trying to use a relational database table as a queue. your external data 
source(s) would write messages to this queue, and you'd have 
'subscriber' processes that listen to the queue and process the 
messages, inserting persistent data into the database as needed.




--
john r pierce, recycling bits in santa cruz





Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-11-23 Thread Andy Colson

On 11/23/2015 4:41 AM, Chris Withers wrote:

Hi All,

I wondered if any of you could recommend best practices for using a
postgres table as a queue. Roughly speaking, 100-200 workers will vomit
rows at rates of a few hundred per second into the table leaving the
status as new and then as many workers as needed to keep up with the
load will plough through the queue changing the status to something
other than new.

My naive implementation would be something along the lines of:

CREATE TABLE event (
 ts     timestamp,
 event  char(40),
 status char(10),
 CONSTRAINT pkey PRIMARY KEY(ts, event)
);


...with writers doing INSERT or COPY to get data into the table and
readers doing something like:

SELECT * FROM event WHERE status='new' LIMIT 1000 FOR UPDATE;

...so, grabbing batches of 1,000, working on them and then setting their
status.

But, am I correct in thinking that SELECT FOR UPDATE will not prevent
multiple workers selecting the same rows?

Anyway, is this approach reasonable? If so, what tweaks/optimisations
should I be looking to make?

If it's totally wrong, how should I be looking to approach the problem?

cheers,

Chris


Have you tried Redis?  Its really good at that sort of thing.

-Andy




Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-11-23 Thread Jim Nasby

On 11/23/15 6:12 AM, Ladislav Lenart wrote:

I suggest an excellent read on this topic:

http://www.depesz.com/2013/08/30/pick-a-task-to-work-on/

Highly recommended if you haven't read it yet.


One thing it doesn't mention that you need to be aware of is the vacuum 
workload on a queue table. In a busy queue, it will be difficult or even 
impossible for vacuum to keep the amount of dead rows down to something 
manageable. That's why PgQ and Slony don't even attempt it; instead, 
they rotate through a fixed set of tables. Once all the entries in a 
table have been processed, the table is truncated.


If you go the delete route, make sure you don't index any fields in the 
queue that get updated (otherwise you won't get HOT updates), and run a 
very aggressive manual vacuum so the table stays small.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-11-23 Thread Ladislav Lenart
Hello.


On 23.11.2015 11:41, Chris Withers wrote:
> Hi All,
> 
> I wondered if any of you could recommend best practices for using a postgres
> table as a queue. Roughly speaking, 100-200 workers will vomit rows at rates
> of a few hundred per second into the table leaving the status as new and then as
> many workers as needed to keep up with the load will plough through the queue
> changing the status to something other than new.
> 
> My naive implementation would be something along the lines of:
> 
> CREATE TABLE event (
> ts     timestamp,
> event  char(40),
> status char(10),
> CONSTRAINT pkey PRIMARY KEY(ts, event)
> );
> 
> 
> ...with writers doing INSERT or COPY to get data into the table and readers
> doing something like:
> 
> SELECT * FROM event WHERE status='new' LIMIT 1000 FOR UPDATE;
> 
> ...so, grabbing batches of 1,000, working on them and then setting their 
> status.
> 
> But, am I correct in thinking that SELECT FOR UPDATE will not prevent multiple
> workers selecting the same rows?
> 
> Anyway, is this approach reasonable? If so, what tweaks/optimisations should I
> be looking to make?
> 
> If it's totally wrong, how should I be looking to approach the problem?

I suggest an excellent read on this topic:

http://www.depesz.com/2013/08/30/pick-a-task-to-work-on/

Highly recommended if you haven't read it yet.

Also, if you aim on 9.5 (not released yet), it will introduce:

SELECT...
FOR UPDATE
SKIP LOCKED -- this is new

which supports exactly this use-case (i.e. to implement a job queue).


HTH,

Ladislav Lenart





[GENERAL] using a postgres table as a multi-writer multi-updater queue

2015-11-23 Thread Chris Withers

Hi All,

I wondered if any of you could recommend best practices for using a 
postgres table as a queue. Roughly speaking, 100-200 workers will vomit 
rows at rates of a few hundred per second into the table leaving the
status as new and then as many workers as needed to keep up with the 
load will plough through the queue changing the status to something 
other than new.


My naive implementation would be something along the lines of:

CREATE TABLE event (
ts     timestamp,
event  char(40),
status char(10),
CONSTRAINT pkey PRIMARY KEY(ts, event)
);


...with writers doing INSERT or COPY to get data into the table and 
readers doing something like:


SELECT * FROM event WHERE status='new' LIMIT 1000 FOR UPDATE;

...so, grabbing batches of 1,000, working on them and then setting their 
status.


But, am I correct in thinking that SELECT FOR UPDATE will not prevent 
multiple workers selecting the same rows?


Anyway, is this approach reasonable? If so, what tweaks/optimisations 
should I be looking to make?


If it's totally wrong, how should I be looking to approach the problem?

cheers,

Chris