Fwd: persistent read cache

2018-02-11 Thread Sand Stone
Posted to the wrong alias earlier. Just wondering if anyone else has such a
use case.  Thanks.

-- Forwarded message --
From: Sand Stone 
Date: Sun, Feb 11, 2018 at 11:34 AM
Subject: persistent read cache
To: pgsql-hack...@lists.postgresql.org


Hi. I wonder if there is such a thing or extension in the PG world.

Here is my use case. I am using PG (PG10 to be more specific) in a
cloud VM environment. The tables are stored on RAID0-managed, SSD-backed
attached storage. Depending on the VM I am using, I usually have 256GB
of local SSD sitting unused.

I wonder if PG could leverage this local SSD as a read (page/block)
cache, to complement/extend the DRAM used by shared_buffers today.

Thanks.



Re: New Copy Formats - avro/orc/parquet

2018-02-11 Thread Tom Lane
Andres Freund  writes:
> On February 11, 2018 2:48:13 PM PST, Tom Lane  wrote:
>> (Any such patch should manage
>> to turn COPY-CSV into an extension, at least so far as copy.c is
>> concerned, even if we don't package it as one.)

> Yea, I was thinking we should move all three (default, csv, binary)
> supported formats to using such a facility.

Hm, yeah, probably.  That opens up a different dimension of variation:
is the raw data text or binary?  I'm thinking we shouldn't make the format
handlers reinvent e.g. encoding conversion if they don't have to, so it's
likely that some of the infrastructure for text data would remain in core
copy.c.

regards, tom lane



RE: Migrating Postgresql from Linux x86

2018-02-11 Thread Clive Anglin


Thanks for the information

Clive A.

-Original Message-
From: Carlos Martinez [mailto:cama...@gmail.com]
Sent: Friday, February 9, 2018 10:53 PM
To: pgsql-gene...@postgresql.org
Cc: Clive Anglin 
Subject: Re: Migrating Postgresql from Linux x86

Hi.

We did that migration, from Postgres 9.0 (x86_64, CentOS 6) to Postgres
9.6 (POWER, SUSE Enterprise 12).

The best approach for us was to take a backup (pg_dump -Fc ...) and restore
using several cores (pg_restore -j N ..., where N is the number of parallel
jobs). How many jobs to launch depends on the number of cores available and
the I/O throughput; run some tests to find the right number of jobs.
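
For example, the basic sequence was (a sketch; the database name "mydb" and
the job count are placeholders; tune -j to your cores and I/O):

# on the old host: take a custom-format dump
pg_dump -Fc -f mydb.dump mydb

# on the new host: create the target database, then restore in parallel
createdb mydb
pg_restore -j 8 -d mydb mydb.dump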

Our restore task took around four hours. We had a downtime of seven hours.

Other options are:

*  You can try logical replication (maybe pglogical).

*  Backup/restore without stopping; then take a short downtime and
synchronize the data that changed in between.

We preferred the failsafe option: we took a full downtime and did the job
from Saturday to Sunday, with a holiday on Monday (it is a 24x7x365
business, and getting a downtime window is somewhat difficult).

Best regards.

Carlos M.

On Fri, Feb 9, 2018 at 7:44 PM, Clive Anglin  wrote:
>
> Hi
>
> Has anyone migrated PostgreSQL running on Linux x86_64 to Linux on the
> Power platform? I am interested in a checklist of things to be done; a
> proper document with the steps to complete the migration would also be
> of interest.
>
>
>
> Thanks
>
>
>
> Clive A.



Re: New Copy Formats - avro/orc/parquet

2018-02-11 Thread Andres Freund


On February 11, 2018 2:48:13 PM PST, Tom Lane  wrote:
> (Any such patch should manage
>to turn COPY-CSV into an extension, at least so far as copy.c is
>concerned, even if we don't package it as one.)

Yea, I was thinking we should move all three (default, csv, binary) supported 
formats to using such a facility. I guess we'd just install the respective 
format handlers by default in the respective catalogs, rather than doing so in 
an extension. The handler function should easily be able to return pointers to 
functions in the main binary, not just additional shlibs.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: New Copy Formats - avro/orc/parquet

2018-02-11 Thread Tom Lane
Andres Freund  writes:
> So, I think making COPY extensible would be quite beneficial. I'm
> however quite doubtful that we want to add core code to handle all of
> the above. I think we should make the COPY input/output formatting
> extensible by extensions.

+1.  I can't see carrying code for these formats in-core, but I've
no objection to making it possible for someone else to maintain them.

> I imagine we'd have callbacks for
> - start copy in / out
> - output row, with an array of values/nulls
> - parse row, with an input buffer as argument, returning values / nulls arrays
> - finish copy in / out

Also something to allow absorbing format-specific options, if the
precedent of CSV is anything to go by.  (Any such patch should manage
to turn COPY-CSV into an extension, at least so far as copy.c is
concerned, even if we don't package it as one.)

regards, tom lane



Re: New Copy Formats - avro/orc/parquet

2018-02-11 Thread Nicolas Paris
On Feb 11, 2018 at 22:19, Adrian Klaver wrote:
> On 02/11/2018 12:57 PM, Nicolas Paris wrote:
> > On Feb 11, 2018 at 21:53, Andres Freund wrote:
> > > On 2018-02-11 21:41:26 +0100, Nicolas Paris wrote:
> > > > I also have the storage and network transfer overhead in mind:
> > > > all those new formats are compressed; this is not true for the current
> > > > postgres BINARY format, and obviously not for the text-based formats. In
> > > > my experience, the binary format is 10 to 30% larger than the text one.
> > > > By contrast, an ORC file can be up to 10 times smaller than a text-based
> > > > format.
> > > 
> > > That seems largely irrelevant when arguing about using PROGRAM though,
> > > right?
> > > 
> > 
> > Indeed, those storage and network transfer savings are only relative to
> > the CSV/BINARY formats. They have no link with the PROGRAM aspect.
> > 
> 
> Just wondering what your time frame is on this? Asking because this would be
> considered a new feature and so would need to be added to a major release of
> Postgres. Currently work is going on for Postgres version 11, to be
> released (just a guess) late Fall 2018/early Winter 2019. The
> CommitFest (https://commitfest.postgresql.org/) for this release is currently
> approximately 3/4 of the way through. Not sure that new code could make it
> in at this point. This means it would be bumped to version 12 for 2019/2020.
> 

Right now, exporting (billions of rows * hundreds of columns) from postgres
to distributed tools such as spark is feasible, but only at the cost of
parsing, transfer, tooling and workaround overhead.

Waiting until 2020 for the opportunity to write COPY extensions would
mean using this feature around 2022. I mean, writing the ORC COPY
extension, extending the postgres JDBC driver, extending the spark JDBC
connector, all in different communities: this will be a long process.

But again, postgres would be the most advanced RDBMS here, because AFAIK
no other DB deals with those distributed formats at the moment. Knowing
that such a feature will be released one day lets one plan the place of
postgres in a data warehouse architecture accordingly.



Re: New Copy Formats - avro/orc/parquet

2018-02-11 Thread Adrian Klaver

On 02/11/2018 12:57 PM, Nicolas Paris wrote:

On Feb 11, 2018 at 21:53, Andres Freund wrote:

On 2018-02-11 21:41:26 +0100, Nicolas Paris wrote:

I also have the storage and network transfer overhead in mind:
all those new formats are compressed; this is not true for the current
postgres BINARY format, and obviously not for the text-based formats. In
my experience, the binary format is 10 to 30% larger than the text one.
By contrast, an ORC file can be up to 10 times smaller than a text-based
format.


That seems largely irrelevant when arguing about using PROGRAM though,
right?



Indeed, those storage and network transfer savings are only relative to
the CSV/BINARY formats. They have no link with the PROGRAM aspect.



Just wondering what your time frame is on this? Asking because this
would be considered a new feature and so would need to be added to a
major release of Postgres. Currently work is going on for Postgres
version 11, to be released (just a guess) late Fall 2018/early Winter
2019. The CommitFest (https://commitfest.postgresql.org/) for this
release is currently approximately 3/4 of the way through. Not sure that
new code could make it in at this point. This means it would be bumped
to version 12 for 2019/2020.



--
Adrian Klaver
adrian.kla...@aklaver.com



Re: New Copy Formats - avro/orc/parquet

2018-02-11 Thread Andres Freund
On 2018-02-10 16:13:04 +0100, Nicolas Paris wrote:
> Hello
> 
> I'd find it useful to be able to import/export from postgres to/from
> those modern data formats:
> - avro (c writer=https://avro.apache.org/docs/1.8.2/api/c/index.html)
> - parquet (c++ writer=https://github.com/apache/parquet-cpp)
> - orc (all writers=https://github.com/apache/orc)
> 
> Something like:
> COPY table TO STDOUT ORC;
>
> Would be lovely.
>
> This would greatly enhance how postgres integrates into the big-data
> ecosystem.
>
> Any thoughts?

So, I think making COPY extensible would be quite beneficial. I'm
however quite doubtful that we want to add core code to handle all of
the above. I think we should make the COPY input/output formatting
extensible by extensions.  I think we'd have something like
CREATE COPY HANDLER name HANDLER somefunction;
somefunction would have to be a function that returns type
pg_copy_handler, which in turn basically is a struct of function
pointers.
I imagine we'd have callbacks for
- start copy in / out
- output row, with an array of values/nulls
- parse row, with an input buffer as argument, returning values / nulls arrays
- finish copy in / out

we'd also need to expose a few more utility functions from copy.c
externally.

I think this'd require a good bit of cleanup in copy.c...
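
To make that concrete, the user-facing side might look something like the
following (hypothetical syntax following the sketch above; none of it
exists today):

-- register a handler for a new format (hypothetical)
CREATE COPY HANDLER orc HANDLER orc_copy_handler;
-- COPY would then route through the handler's callbacks (hypothetical)
COPY mytable TO STDOUT (FORMAT orc);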

Greetings,

Andres Freund



Re: New Copy Formats - avro/orc/parquet

2018-02-11 Thread Nicolas Paris
On Feb 11, 2018 at 21:53, Andres Freund wrote:
> On 2018-02-11 21:41:26 +0100, Nicolas Paris wrote:
> > I also have the storage and network transfer overhead in mind:
> > all those new formats are compressed; this is not true for the current
> > postgres BINARY format, and obviously not for the text-based formats. In
> > my experience, the binary format is 10 to 30% larger than the text one.
> > By contrast, an ORC file can be up to 10 times smaller than a text-based
> > format.
> 
> That seems largely irrelevant when arguing about using PROGRAM though,
> right?
> 

Indeed, those storage and network transfer savings are only relative to
the CSV/BINARY formats. They have no link with the PROGRAM aspect.



Re: New Copy Formats - avro/orc/parquet

2018-02-11 Thread Andres Freund
On 2018-02-11 21:41:26 +0100, Nicolas Paris wrote:
> I also have the storage and network transfer overhead in mind:
> all those new formats are compressed; this is not true for the current
> postgres BINARY format, and obviously not for the text-based formats. In
> my experience, the binary format is 10 to 30% larger than the text one.
> By contrast, an ORC file can be up to 10 times smaller than a text-based
> format.

That seems largely irrelevant when arguing about using PROGRAM though,
right?

Greetings,

Andres Freund



Re: New Copy Formats - avro/orc/parquet

2018-02-11 Thread Nicolas Paris
On Feb 11, 2018 at 21:03, Andres Freund wrote:
> 
> 
> On February 11, 2018 12:00:12 PM PST, Nicolas Paris  
> wrote:
> >> > That is true, but the question is how significant the overhead is. If
> >> > it's 50% then reducing it would make perfect sense. If it's 1% then no
> >> > one is going to be bothered by it.
> >>
> >> I think it's pretty clear that it's going to be way, way more than
> >> 1%.
> >
> >Good news, but I'm not sure I understand why.
> 
> I think you might have misunderstood my reply? I'm saying that going through 
> PROGRAM will have significant overhead. I can't quite make sense of the rest 
> of your reply otherwise?

True, I misunderstood. Then I agree: the computational overhead should be
non-negligible.

I also have the storage and network transfer overhead in mind:
all those new formats are compressed; this is not true for the current
postgres BINARY format, and obviously not for the text-based formats. In
my experience, the binary format is 10 to 30% larger than the text one.
By contrast, an ORC file can be up to 10 times smaller than a text-based
format.



Re: New Copy Formats - avro/orc/parquet

2018-02-11 Thread Andres Freund


On February 11, 2018 12:00:12 PM PST, Nicolas Paris  wrote:
>> > That is true, but the question is how significant the overhead is. If
>> > it's 50% then reducing it would make perfect sense. If it's 1% then no
>> > one is going to be bothered by it.
>>
>> I think it's pretty clear that it's going to be way, way more than
>> 1%.
>
>Good news, but I'm not sure I understand why.

I think you might have misunderstood my reply? I'm saying that going through 
PROGRAM will have significant overhead. I can't quite make sense of the rest of 
your reply otherwise?

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: New Copy Formats - avro/orc/parquet

2018-02-11 Thread Nicolas Paris
> > That is true, but the question is how significant the overhead is. If
> > it's 50% then reducing it would make perfect sense. If it's 1% then no
> > one is going to be bothered by it.
>
> I think it's pretty clear that it's going to be way, way more than
> 1%.

Good news, but I'm not sure I understand why.

> It's trivial to construct cases where input parsing / output
> formatting takes the majority of the time. 

Binary -> ORC
       ^
       |
   PROGRAM parsing/output formatting on the fly

> And a lot of that you're going to be able to avoid with binary formats.

Still, the above diagram includes a parsing/formatting step, doesn't it?
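
Spelled out, that PROGRAM route would be something like this (a sketch;
orc_writer stands in for a hypothetical converter reading postgres binary
COPY data on stdin and writing ORC):

-- the server pipes binary COPY output into the converter's stdin
COPY mytable TO PROGRAM 'orc_writer > /tmp/mytable.orc' WITH (FORMAT binary);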






Re: New Copy Formats - avro/orc/parquet

2018-02-11 Thread Andres Freund
Hi,

On 2018-02-10 18:21:37 +0100, Tomas Vondra wrote:
> That is true, but the question is how significant the overhead is. If
> it's 50% then reducing it would make perfect sense. If it's 1% then no
> one is going to be bothered by it.

I think it's pretty clear that it's going to be way, way more than
1%. It's trivial to construct cases where input parsing / output
formatting takes the majority of the time. And a lot of that you're
going to be able to avoid with binary formats.

Greetings,

Andres Freund



Re: execute block like Firebird does

2018-02-11 Thread Thiemo Kellner



On 02/11/18 06:57, PegoraroF10 wrote:

I know we could create a function but we have several hundred of these
blocks running, so ... it would be a huge work to do.
So, there is a way to run a dynamic sql which returns a set of records ?


But don't you need to touch the code of those blocks anyway? Could you
write automation code to prepend the function header?


--
+49 (0)1578-772 37 37
+41 (0)78 947 36 21
SIP/iptel.org: thiemo.kellner
Public PGP key:
http://pgp.mit.edu/pks/lookup?op=get&search=0xCA167FB0E717AFFC

Re: execute block like Firebird does

2018-02-11 Thread Fabrízio de Royes Mello
On Sun, Feb 11, 2018 at 13:54, Steven Lembark  wrote:

>
> > > you can use a DO block:
> > >
> > >
> > > https://www.postgresql.org/docs/current/static/sql-do.html
> > >
> >
> > But DO blocks return "void"; I mean, you can't return values/records
> > from this statement.
>
> Insert the necessary records into a temporary table, process them,
> post-process them into variables or destination table?
>
>
It can be a solution, but it can lead to catalog bloat.

Regards,


-- 
   Fabrízio de Royes Mello
   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consulting, Development, 24x7 Support and Training


Re: execute block like Firebird does

2018-02-11 Thread Steven Lembark

> > you can use a DO block:
> >
> >
> > https://www.postgresql.org/docs/current/static/sql-do.html
> >  
> 
> But DO blocks return "void"; I mean, you can't return values/records
> from this statement.

Insert the necessary records into a temporary table, process them,
post-process them into variables or destination table?
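
For example (a minimal sketch of that workaround; the customers table and
the column list are made up):

BEGIN;
CREATE TEMP TABLE block_result(id integer, name text) ON COMMIT DROP;

DO $$
BEGIN
  -- arbitrary dynamic logic goes here; it "returns" by inserting
  INSERT INTO block_result
  SELECT c.id, c.name FROM customers c WHERE c.id < 100;
END
$$;

SELECT * FROM block_result;   -- read the records back out
COMMIT;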

-- 
Steven Lembark   1505 National Ave
Workhorse Computing Rockford, IL 61103
lemb...@wrkhors.com                      +1 888 359 3508



Re: execute block like Firebird does

2018-02-11 Thread Pavel Stehule
2018-02-11 14:50 GMT+01:00 PegoraroF10 :

> But DO doesn't return values, or does it?
>
> execute block returns(ID Integer, Name varchar(50), LastInvoice Date, ...)
> as
> begin
>   for select ID, Name from Customers where ... into ID, Name do begin
>     select bla, bla, bla from functionX(ID) into ...;
>     if ... then
>       bla = X
>     else
>       bla = Y;
>     if bla = XXX then
>       -- here we return a record; since we are inside a loop we
>       -- will return several records
>       suspend;
>   end
> end
>
> As you can see, this entire block can be called from the client
> dynamically, its result fields are defined when it runs, and we can
> return a set of records. So DO doesn't work this way, does it?
>

You can use a temporary function in PostgreSQL. The DO command has no
result. Theoretically, you can use a cursor with transaction scope: it can
be opened inside the DO command, and the outer level can then read it with
the FETCH command.
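
For example (sketches only; the customers table is made up). The
temporary-function variant:

CREATE FUNCTION pg_temp.exec_block()
RETURNS TABLE(id integer, name text)
LANGUAGE sql AS $$
  SELECT c.id, c.name FROM customers c WHERE c.id < 100
$$;
SELECT * FROM pg_temp.exec_block();   -- dropped at session end

And the cursor variant:

BEGIN;
DO $$
DECLARE
  c refcursor := 'block_cursor';   -- portal name visible outside the DO
BEGIN
  OPEN c FOR SELECT id, name FROM customers WHERE id < 100;
END
$$;
FETCH ALL FROM block_cursor;   -- read the result set at SQL level
COMMIT;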

Regards

Pavel


>
>
>
> --
> Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html
>
>


Re: execute block like Firebird does

2018-02-11 Thread PegoraroF10
But DO doesn't return values, or does it?

execute block returns(ID Integer, Name varchar(50), LastInvoice Date, ...)
as
begin
  for select ID, Name from Customers where ... into ID, Name do begin
    select bla, bla, bla from functionX(ID) into ...;
    if ... then
      bla = X
    else
      bla = Y;
    if bla = XXX then
      -- here we return a record; since we are inside a loop we
      -- will return several records
      suspend;
  end
end

As you can see, this entire block can be called from the client
dynamically, its result fields are defined when it runs, and we can return
a set of records. So DO doesn't work this way, does it?



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html



Re: execute block like Firebird does

2018-02-11 Thread Fabrízio de Royes Mello
On Sun, Feb 11, 2018 at 06:47, Andreas Kretschmer
<andr...@a-kretschmer.de> wrote:

>
>
> On 11.02.2018 at 06:57 PegoraroF10 wrote:
> > We are migrating our databases from Firebird to Postgres. A useful
> > feature Firebird has is Execute Block.
> > What it does is just return a record set from that dynamic SQL, just
> > like a Postgres function, but without creating it.
> > It sounds like ...
> > execute block returns(ID Integer, Name varchar(50), LastInvoice Date, ...)
> > as
> > begin
> >    select bla, bla, bla into ...;
> >    select bla, bla into ...;
> >    suspend;
> > end
> > I know we could create a function, but we have several hundred of these
> > blocks running, so ... it would be a huge amount of work.
> > So, is there a way to run dynamic SQL which returns a set of records?
>
> you can use a DO block:
>
>
> https://www.postgresql.org/docs/current/static/sql-do.html
>

But DO blocks return "void"; I mean, you can't return values/records from
this statement.

Regards,
-- 
   Fabrízio de Royes Mello
   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consulting, Development, 24x7 Support and Training


Re: execute block like Firebird does

2018-02-11 Thread Andreas Kretschmer



On 11.02.2018 at 06:57 PegoraroF10 wrote:

We are migrating our databases from Firebird to Postgres. A useful feature
Firebird has is Execute Block.
What it does is just return a record set from that dynamic SQL, just like a
Postgres function, but without creating it.
It sounds like ...
execute block returns(ID Integer, Name varchar(50), LastInvoice Date, ...)
as
begin
   select bla, bla, bla into ...;
   select bla, bla into ...;
   suspend;
end
I know we could create a function, but we have several hundred of these
blocks running, so ... it would be a huge amount of work.
So, is there a way to run dynamic SQL which returns a set of records?


you can use a DO block:


https://www.postgresql.org/docs/current/static/sql-do.html
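
For example (a minimal sketch; the invoices table is made up, and note
that the block itself returns no rows):

DO $$
DECLARE
  last_invoice date;
BEGIN
  SELECT max(invoice_date) INTO last_invoice FROM invoices;
  RAISE NOTICE 'last invoice: %', last_invoice;
END
$$;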


Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com