Re: [GENERAL] Caching and Blobs in PG? Was: Can PG replace redis, amqp, s3 in the future?

2017-05-07 Thread Steve Atkins

> On May 7, 2017, at 9:16 AM, Adam Brusselback  
> wrote:
> 
> there's also pg_agent which is a cron-like extension, usually bundled with 
> pg_admin but also available standalone
> 
> https://www.pgadmin.org/docs4/dev/pgagent.html
> 
> 
> -- 
> john r pierce, recycling bits in santa cruz
> 
> In addition to that, there is also  jpgAgent: 
> https://github.com/GoSimpleLLC/jpgAgent
> 
> It uses the same schema as pgagent in the database, and just replaces the 
> actual agent portion of it with a compatible re-write.  Has been way more 
> stable for us since we switched to it, as well as providing features we 
> needed like email notifications and parallel running of steps.
> 
> Disclosure: I wrote it for my company... started on it well before all the 
> alternatives like pg_cron, pg_bucket, etc came out.

There's also pglater, which is a minimal external process that'll let you 
implement any sort of cron-ish functionality entirely inside the database 
without needing to be woken up every minute by an external cron.

https://github.com/wttw/pgsidekick

More proof-of-concept than anything remotely production-ready.

Cheers,
  Steve



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Caching and Blobs in PG? Was: Can PG replace redis, amqp, s3 in the future?

2017-05-07 Thread Adam Brusselback
>
> there's also pg_agent which is a cron-like extension, usually bundled with
> pg_admin but also available standalone
>
> https://www.pgadmin.org/docs4/dev/pgagent.html
>
>
> --
> john r pierce, recycling bits in santa cruz
>

In addition to that, there is also  jpgAgent:
https://github.com/GoSimpleLLC/jpgAgent

It uses the same schema as pgagent in the database, and just replaces the
actual agent portion of it with a compatible re-write.  Has been way more
stable for us since we switched to it, as well as providing features we
needed like email notifications and parallel running of steps.

Disclosure: I wrote it for my company... started on it well before all the
alternatives like pg_cron, pg_bucket, etc came out.


Re: [GENERAL] Caching and Blobs in PG? Was: Can PG replace redis, amqp, s3 in the future?

2017-05-05 Thread Peter J. Holzer
On 2017-05-05 11:46:55 -0700, John R Pierce wrote:
> On 5/5/2017 11:28 AM, Peter J. Holzer wrote:
> 
> On 2017-05-04 23:08:25 +0200, Sven R. Kunze wrote:
> 
> On 03.05.2017 12:57, Thomas Güttler wrote:
> 
> Am 02.05.2017 um 05:43 schrieb Jeff Janes:
> 
> No.  You can certainly use PostgreSQL to store blobs.  But 
> then, you
> need to store the PostgreSQL data **someplace**.
> If you don't store it in S3, you have to store it somewhere 
> else.
> 
> I don't understand what you mean here. AFAIK storing blobs in PG 
> is not
> recommended since it is not very efficient.
> 
> Seems like several people here disagree with this conventional wisdom.
> 
> I think it depends very much on what level of "efficiency" you need. On
> my home server (i5 processor, 32GB RAM, Samsung 850 SSD - not a piece of
> junk, but not super powerful either) I can retrieve a small blob from a
> 100GB table in about 0.1 ms, and for large blobs the speed approaches
> 200MB/s. For just about everything I'd do on that server (or even at
> work) this is easily fast enough.
> 
> 
> S3 is often used for terabyte to petabyte file collections.   I would not want
> to burden my relational database with this.

I repeat the the first sentence I wrote: "I think it depends very much
on what level of 'efficiency' you need." Just because some people need
to store petabytes of blob data doesn't mean everybody does. If you need
to store petabytes of blobs, PostgreSQL may not be the right tool. But
it may be the right tool if you just need to store a few thousand PDFs.
To tell people to never store blobs in PostgreSQL because PostgreSQL is
"not efficient" is just bullshit. There are many factors which determine
how you should store your data, and "efficiency" (however that is
defined, if it's defined at all and not just used as a buzzword) is only
one of them - and rarely, in my experience, the most important one.

hp

-- 
   _  | Peter J. Holzer| A coding theorist is someone who doesn't
|_|_) || think Alice is crazy.
| |   | h...@hjp.at | -- John Gordon
__/   | http://www.hjp.at/ |http://downlode.org/Etext/alicebob.html


signature.asc
Description: Digital signature


Re: [GENERAL] Caching and Blobs in PG? Was: Can PG replace redis, amqp, s3 in the future?

2017-05-05 Thread John R Pierce

On 5/5/2017 11:28 AM, Peter J. Holzer wrote:

On 2017-05-04 23:08:25 +0200, Sven R. Kunze wrote:

On 03.05.2017 12:57, Thomas Güttler wrote:

Am 02.05.2017 um 05:43 schrieb Jeff Janes:

No.  You can certainly use PostgreSQL to store blobs.  But then, you
need to store the PostgreSQL data **someplace**.
If you don't store it in S3, you have to store it somewhere else.

I don't understand what you mean here. AFAIK storing blobs in PG is not
recommended since it is not very efficient.

Seems like several people here disagree with this conventional wisdom.

I think it depends very much on what level of "efficiency" you need. On
my home server (i5 processor, 32GB RAM, Samsung 850 SSD - not a piece of
junk, but not super powerful either) I can retrieve a small blob from a
100GB table in about 0.1 ms, and for large blobs the speed approaches
200MB/s. For just about everything I'd do on that server (or even at
work) this is easily fast enough.



S3 is often used for terabyte to petabyte file collections.   I would 
not want to burden my relational database with this.



--
john r pierce, recycling bits in santa cruz



Re: [GENERAL] Caching and Blobs in PG? Was: Can PG replace redis, amqp, s3 in the future?

2017-05-05 Thread Peter J. Holzer
On 2017-05-04 23:08:25 +0200, Sven R. Kunze wrote:
> On 03.05.2017 12:57, Thomas Güttler wrote:
> >Am 02.05.2017 um 05:43 schrieb Jeff Janes:
> >>No.  You can certainly use PostgreSQL to store blobs.  But then, you
> >>need to store the PostgreSQL data **someplace**.
> >>If you don't store it in S3, you have to store it somewhere else.
> >
> >I don't understand what you mean here. AFAIK storing blobs in PG is not
> >recommended since it is not very efficient.
> 
> Seems like several people here disagree with this conventional wisdom.

I think it depends very much on what level of "efficiency" you need. On
my home server (i5 processor, 32GB RAM, Samsung 850 SSD - not a piece of
junk, but not super powerful either) I can retrieve a small blob from a
100GB table in about 0.1 ms, and for large blobs the speed approaches
200MB/s. For just about everything I'd do on that server (or even at
work) this is easily fast enough.

Sure, just telling the kernel "send data from file descriptor A (which
happens to be a file) to file descriptor B (a socket)" is a lot more
efficient than copying data from disk into a postgresql process, then
from that process to an application server, from that to the webserver
and that finally sends it to the socket. But if that just lets my server
be 99.9% idle instead of 99.0% idle, I haven't gained much. Similarly,
if my server spends 90% of it's resources doing other stuff, I won't
gain much by optimizing this (I should better optimize that other stuff
it's spending so much time on).

I am in this regard a firm believer in not optimizing prematurely. Do
whatever makes sense from an application point of view. If the blobs are
logically part of some other data (e.g. PDFs in a literature database),
store them together (either all of them in PostgreSQL, or all in some
NoSQL database, or maybe on stone tablets, if that makes sense for some
reason). Only if you have good reason[1] to believe that physically
separating data which logically belongs together will resolve a
bottleneck, then by all means separate them.

hp

[1] "I read somewhere on the internet" is usually not a good reason.

-- 
   _  | Peter J. Holzer| A coding theorist is someone who doesn't
|_|_) || think Alice is crazy.
| |   | h...@hjp.at | -- John Gordon
__/   | http://www.hjp.at/ |http://downlode.org/Etext/alicebob.html


signature.asc
Description: Digital signature


Re: [GENERAL] Caching and Blobs in PG? Was: Can PG replace redis, amqp, s3 in the future?

2017-05-05 Thread Jeff Janes
On Wed, May 3, 2017 at 3:57 AM, Thomas Güttler  wrote:

> Am 02.05.2017 um 05:43 schrieb Jeff Janes:
>
>> On Sun, Apr 30, 2017 at 4:37 AM, Thomas Güttler <
>> guettl...@thomas-guettler.de >
>> wrote:
>>
>> Is is possible that PostgreSQL will replace these building blocks in
>> the future?
>>
>>  - redis (Caching)
>>
>>
>> PostgreSQL has its own caching.  It might not be quite as effective as
>> redis', but you can us it if you are willing to
>> take those trade offs.
>>
>
> What kind of caching does PG offer?
>

It has shared_buffers to cache the data it needs frequently (not query
results, but the data needed to produce the results), and also uses the
file systems cache.  This is what I am referring to.  I wouldn't recommend
using PostgreSQL simply as a cache for something else, if you don't want
any other features of the database.  But if you want to throw Redis up as a
layer of cache in front of PostgreSQL, maybe you should first see if that
RAM, and a bit of tuning, can be used to make PostgreSQL fast enough to not
need the Redis cache.


>
>>
>>
>>  - s3 (Blob storage)
>>
>>
>>
>
> No.  You can certainly use PostgreSQL to store blobs.  But then, you need
>> to store the PostgreSQL data **someplace**.
>> If you don't store it in S3, you have to store it somewhere else.
>>
>
> I don't understand what you mean here. AFAIK storing blobs in PG is not
> recommended since it is not very efficient.
>

If the metadata is stored in PG and the blobs themselves are stored
individually S3, you have a transaction atomicity problem. Solving that is
not likely to be very efficient, either.  You have to pick your poison.

Cheers,

Jeff


Re: [GENERAL] Caching and Blobs in PG? Was: Can PG replace redis, amqp, s3 in the future?

2017-05-04 Thread Alan Hodgson
On Thursday 04 May 2017 14:47:54 John R Pierce wrote:
> On 5/4/2017 2:28 PM, Alan Hodgson wrote:
> > On Thursday 04 May 2017 14:21:00 John R Pierce wrote:
> >> or EBS, and I've heard from more than a few people that EBS can be
> >> something of a sand trap.
> > 
> > Sorry for following up off-topic, but EBS has actually improved
> > considerably in the last few years. You can get guaranteed (and very
> > high) IOPS on SSD storage, and many instance types come with high-speed
> > throughput to EBS. It's much much better for databases than it was 5
> > years ago.
> 
> has it become more stable when Amazon has their occasional major hiccups?

No ... when they have a major problem it generally cascades across the region, 
and several outages have hit EBS in particular. In us-east-1, at least. Other 
regions seem more reliable.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Caching and Blobs in PG? Was: Can PG replace redis, amqp, s3 in the future?

2017-05-04 Thread John R Pierce

On 5/4/2017 2:50 PM, John R Pierce wrote:

But there's an extension - pg_cron:
https://www.citusdata.com/blog/2016/09/09/pgcron-run-periodic-jobs-in-postgres/ 




there's also pg_agent which is a cron-like extension, usually bundled 
with pg_admin but also available standalone


https://www.pgadmin.org/docs4/dev/pgagent.html



oh, it should be made clear... both of these extensions require an OS 
level cron/scheduler job to be run every minute or whatever, this job 
invokes some SQL stuff which checks the pg_cron or pg_agent tables and 
runs whatever sql tasks are due.



--
john r pierce, recycling bits in santa cruz



Re: [GENERAL] Caching and Blobs in PG? Was: Can PG replace redis, amqp, s3 in the future?

2017-05-04 Thread John R Pierce

On 5/4/2017 2:08 PM, Sven R. Kunze wrote:
After searching the web, it seems to me that PostgreSQL doesn't offer 
a cron-like background job for cleanup tasks.


http://stackoverflow.com/questions/18187490/postgresql-delete-old-rows-on-a-rolling-basis 




But there's an extension - pg_cron:
https://www.citusdata.com/blog/2016/09/09/pgcron-run-periodic-jobs-in-postgres/



there's also pg_agent which is a cron-like extension, usually bundled 
with pg_admin but also available standalone


https://www.pgadmin.org/docs4/dev/pgagent.html


--
john r pierce, recycling bits in santa cruz



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Caching and Blobs in PG? Was: Can PG replace redis, amqp, s3 in the future?

2017-05-04 Thread John R Pierce

On 5/4/2017 2:28 PM, Alan Hodgson wrote:

On Thursday 04 May 2017 14:21:00 John R Pierce wrote:

or EBS, and I've heard from more than a few people that EBS can be
something of a sand trap.


Sorry for following up off-topic, but EBS has actually improved considerably
in the last few years. You can get guaranteed (and very high) IOPS on SSD
storage, and many instance types come with high-speed throughput to EBS. It's
much much better for databases than it was 5 years ago.



has it become more stable when Amazon has their occasional major hiccups?


--
john r pierce, recycling bits in santa cruz



Re: [GENERAL] Caching and Blobs in PG? Was: Can PG replace redis, amqp, s3 in the future?

2017-05-04 Thread Alan Hodgson
On Thursday 04 May 2017 14:21:00 John R Pierce wrote:
> or EBS, and I've heard from more than a few people that EBS can be
> something of a sand trap.
> 

Sorry for following up off-topic, but EBS has actually improved considerably 
in the last few years. You can get guaranteed (and very high) IOPS on SSD 
storage, and many instance types come with high-speed throughput to EBS. It's 
much much better for databases than it was 5 years ago.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Caching and Blobs in PG? Was: Can PG replace redis, amqp, s3 in the future?

2017-05-04 Thread David G. Johnston
On Thu, May 4, 2017 at 2:08 PM, Sven R. Kunze  wrote:

> On 03.05.2017 12:57, Thomas Güttler wrote:
>
>> Am 02.05.2017 um 05:43 schrieb Jeff Janes:
>>
>>> ​​
>>>
>>>
>> No.  You can certainly use PostgreSQL to store blobs.  But then, you need
>>> to store the PostgreSQL data **someplace**.
>>> If you don't store it in S3, you have to store it somewhere else.
>>>
>>
>> I don't understand what you mean here. AFAIK storing blobs in PG is not
>> recommended since it is not very efficient.
>>
>
> Seems like several people here disagree with this conventional wisdom.
>
> I think what he was talking about the data itself. You have to store the
> bits and bytes somewhere (e.g. on S3).


​I'm not sure I'd be comfortable placing my PostgreSQL data directory on an
S3-based mount...

​I would look at any general recommendation in this area with suspicion.
Learn how different alternatives works, evaluate your needs, and make an
informed decision for your specific scenario.  Reading other's experiences
helps with the education aspect but unless their scenario is exactly like
yours you should probably refrain from directly accepting their conclusions.

David J.


Re: [GENERAL] Caching and Blobs in PG? Was: Can PG replace redis, amqp, s3 in the future?

2017-05-04 Thread John R Pierce

On 5/4/2017 2:08 PM, Sven R. Kunze wrote:
No.  You can certainly use PostgreSQL to store blobs.  But then, you 
need to store the PostgreSQL data **someplace**.

If you don't store it in S3, you have to store it somewhere else.


I don't understand what you mean here. AFAIK storing blobs in PG is 
not recommended since it is not very efficient.


Seems like several people here disagree with this conventional wisdom.

I think what he was talking about the data itself. You have to store 
the bits and bytes somewhere (e.g. on S3). 



afaik, S3 is not suitable for the $PGDATA directory, its more of an 
archival block file store for sequential access.for the actual 
database storage in the AWS world, you'd either use EC2 local storage, 
or EBS, and I've heard from more than a few people that EBS can be 
something of a sand trap.


re: storing blobs in postgres, I would be very hesitant to storage LARGE 
amounts of bulk data directly in postgres


--
john r pierce, recycling bits in santa cruz



Re: [GENERAL] Caching and Blobs in PG? Was: Can PG replace redis, amqp, s3 in the future?

2017-05-04 Thread Sven R. Kunze

On 03.05.2017 12:57, Thomas Güttler wrote:

Am 02.05.2017 um 05:43 schrieb Jeff Janes:
On Sun, Apr 30, 2017 at 4:37 AM, Thomas Güttler 
> 
wrote:


Is is possible that PostgreSQL will replace these building blocks 
in the future?


 - redis (Caching)


PostgreSQL has its own caching.  It might not be quite as effective 
as redis', but you can us it if you are willing to

take those trade offs.


What kind of caching does PG offer?

I would use a table with a mtime-column and delete the content after N 
days.


After searching the web, it seems to me that PostgreSQL doesn't offer a 
cron-like background job for cleanup tasks.


http://stackoverflow.com/questions/18187490/postgresql-delete-old-rows-on-a-rolling-basis


But there's an extension - pg_cron:
https://www.citusdata.com/blog/2016/09/09/pgcron-run-periodic-jobs-in-postgres/



No.  You can certainly use PostgreSQL to store blobs.  But then, you 
need to store the PostgreSQL data **someplace**.

If you don't store it in S3, you have to store it somewhere else.


I don't understand what you mean here. AFAIK storing blobs in PG is 
not recommended since it is not very efficient.


Seems like several people here disagree with this conventional wisdom.

I think what he was talking about the data itself. You have to store the 
bits and bytes somewhere (e.g. on S3).


Sven


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] Caching and Blobs in PG? Was: Can PG replace redis, amqp, s3 in the future?

2017-05-03 Thread Thomas Güttler

Am 02.05.2017 um 05:43 schrieb Jeff Janes:

On Sun, Apr 30, 2017 at 4:37 AM, Thomas Güttler > wrote:

Is is possible that PostgreSQL will replace these building blocks in the 
future?

 - redis (Caching)


PostgreSQL has its own caching.  It might not be quite as effective as redis', 
but you can us it if you are willing to
take those trade offs.


What kind of caching does PG offer?

I would use a table with a mtime-column and delete the content after N days.




 - rabbitmq (amqp)


PostgreSQL has its own system for this, and other ones can be layered on top of 
fully transactional tables.
Again, you can use one or the other, depending on your needs, if you are 
willing to deal with the trade offs.



 - s3 (Blob storage)






No.  You can certainly use PostgreSQL to store blobs.  But then, you need to 
store the PostgreSQL data **someplace**.
If you don't store it in S3, you have to store it somewhere else.


I don't understand what you mean here. AFAIK storing blobs in PG is not 
recommended since it is not very efficient.


Regards,
  Thomas Güttler




--
Thomas Guettler http://www.thomas-guettler.de/


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general