Re: [HACKERS] QSoC proposal: Rewrite pg_dump and pg_restore

2014-03-21 Thread Marcin Mańk
On Fri, Mar 21, 2014 at 4:09 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Craig Ringer cr...@2ndquadrant.com writes:
  Here's how I think it needs to look:
  [ move all the functionality to the backend ]

 Of course, after you've done all that work, you've got something that is
 of exactly zero use to its supposed principal use-case, pg_dump.  pg_dump
 will still have to support server versions that predate all these fancy
 new dump functions, and that pretty much ensures that most of pg_dump's
 core functionality will still be on the client side.  Or, if you try to
 finesse that problem by making sure the new server APIs correspond to
 easily-identified pieces of pg_dump code, you'll probably end up with APIs
 that nobody else wants to use :-(.


Or you should mandate that new server versions should be able to consume
_old_ pg_dump version output. This would change the recommended when
upgrading, dump using the new pg_dump to when upgrading, dump using the
old pg_dump.

This would be necessary policy going forward anyway, if most of the pg_dump
functionality was server-side, because it would be generating dumps in the
server-version dump format, not the client-version format.

'Regards
Marcin Mańk
(goes back to lurker cave...)


Re: [HACKERS] QSoC proposal: Rewrite pg_dump and pg_restore

2014-03-21 Thread Robert Haas
On Thu, Mar 20, 2014 at 11:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Craig Ringer cr...@2ndquadrant.com writes:
 Here's how I think it needs to look:
 [ move all the functionality to the backend ]

 Of course, after you've done all that work, you've got something that is
 of exactly zero use to its supposed principal use-case, pg_dump.  pg_dump
 will still have to support server versions that predate all these fancy
 new dump functions, and that pretty much ensures that most of pg_dump's
 core functionality will still be on the client side.  Or, if you try to
 finesse that problem by making sure the new server APIs correspond to
 easily-identified pieces of pg_dump code, you'll probably end up with APIs
 that nobody else wants to use :-(.

It's worse than that.  If you put all the logic in the server, then a
dump taken on an older version won't be able to quote keywords added
in the newer version.  Go directly to fail.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] QSoC proposal: Rewrite pg_dump and pg_restore

2014-03-21 Thread Andrew Dunstan


On 03/21/2014 09:38 AM, Robert Haas wrote:

On Thu, Mar 20, 2014 at 11:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Craig Ringer cr...@2ndquadrant.com writes:

Here's how I think it needs to look:
[ move all the functionality to the backend ]

Of course, after you've done all that work, you've got something that is
of exactly zero use to its supposed principal use-case, pg_dump.  pg_dump
will still have to support server versions that predate all these fancy
new dump functions, and that pretty much ensures that most of pg_dump's
core functionality will still be on the client side.  Or, if you try to
finesse that problem by making sure the new server APIs correspond to
easily-identified pieces of pg_dump code, you'll probably end up with APIs
that nobody else wants to use :-(.

It's worse than that.  If you put all the logic in the server, then a
dump taken on an older version won't be able to quote keywords added
in the newer version.  Go directly to fail.




Yeah. This tantalizing project has been looked at several times and 
found to be a viper's nest.


What would be useful for many purposes, and is a long-standing project 
of mine that I still haven't found time to make progress on, is that the 
server should contain functions to produce the creation SQL for all its 
own objects, free of the locks that pg_dump requires for consistency.


That would be a great SoC project, incidentally. I'd even volunteer to 
mentor that one.



cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] QSoC proposal: Rewrite pg_dump and pg_restore

2014-03-21 Thread Alvaro Herrera
Andrew Dunstan escribió:

 What would be useful for many purposes, and is a long-standing
 project of mine that I still haven't found time to make progress on,
 is that the server should contain functions to produce the creation
 SQL for all its own objects, free of the locks that pg_dump requires
 for consistency.

Maybe you'd like my DDL deparse project, then.  Right now it's only of
use for event triggers during DDL execution, but I don't see any strong
reason it cannot be used to reconstruct object creation commands from
only their identifying OID.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] QSoC proposal: Rewrite pg_dump and pg_restore

2014-03-20 Thread Robert Haas
On Tue, Mar 18, 2014 at 8:41 PM, Alexandr askel...@gmail.com wrote:
 Rewrite (add) pg_dump and pg_restore utilities as libraries (.so, .dll 
 .dylib)

This strikes me as (1) pretty vague and (2) probably too hard for a
summer project.

I mean, getting the existing binaries to build libraries that you can
call with some trivial interface that mimics the existing command-line
functionality of pg_dump might be doable, but that's not all that
interesting.  What people are really going to want is a library with a
sophisticated API that lets you do interesting things
programmatically.  But that's going to be hard.  AFAIK, nobody's even
tried to figure out what that API should look like.  Even if we had
that worked out, a non-trivial task, the pg_dump source code is a
mess, so refactoring it to provide such an API is likely to be a job
and a half.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] QSoC proposal: Rewrite pg_dump and pg_restore

2014-03-20 Thread Craig Ringer
On 03/21/2014 09:28 AM, Robert Haas wrote:
 On Tue, Mar 18, 2014 at 8:41 PM, Alexandr askel...@gmail.com wrote:
 Rewrite (add) pg_dump and pg_restore utilities as libraries (.so, .dll 
 .dylib)
 
 This strikes me as (1) pretty vague and (2) probably too hard for a
 summer project.
 
 I mean, getting the existing binaries to build libraries that you can
 call with some trivial interface that mimics the existing command-line
 functionality of pg_dump might be doable, but that's not all that
 interesting.  What people are really going to want is a library with a
 sophisticated API that lets you do interesting things
 programmatically.  But that's going to be hard.  AFAIK, nobody's even
 tried to figure out what that API should look like.  Even if we had
 that worked out, a non-trivial task, the pg_dump source code is a
 mess, so refactoring it to provide such an API is likely to be a job
 and a half.

... and still wouldn't solve one of the most frequently requested things
for pg_dump / pg_restore, which is the ability to use them *server-side*
over a regular PostgreSQL connection. It'd be useful progress toward
that, though.

Right now, we can't even get the PostgreSQL server to emit DDL for a
table, let alone do anything more sophisticated.

Here's how I think it needs to look:

- Design a useful API for pg_dump and pg_restore that is practical to
  use for pg_dump and pg_restore's current tasks (fast database
  dump/restore) and also useful for extracting specific objects
  from the database. When designing, consider that we'll want to
  expose this API or functions that use it over SQL later.

- Create a new libpqdump library.

- Implement the designed API in the new library, moving and
  adjusting code from pg_dump / pg_restore where possible, writing
  new code where not.

- Refactor (closer to rewrite) pg_dump and pg_restore to use libpqdump,
  removing as much knowledge of the system catalogs etc as possible from
  them.

- Make sure the result still performs OK

THEN, once that's settled in:

- Modify libpqdump to support compilation as a backend extension, with
  use of the SPI for queries and use of syscaches or direct scans
  where possible.

- Write a pg_dump extension that uses libpqdump in SPI mode
  to expose its API over SQL, or at least uses it to provide SQL
  functions to describe database objects. So you can dump a DB,
  or a subset of it, over SQL.

After all, a libpgdump won't do much good for the large proportion of
PostgreSQL users who use Java/JDBC, who can't use a native library
(without hideous hacks with JNI). For the very large group who use libpq
via language-specific client interfaces like the Pg gem for Ruby,
psycopg2 for Python, DBD::Pg for Perl, etc, it'll require a lot of work
to wrap the API and maintain it. Wheras a server-side SQL-callable
interface would be useful and immediately usable for all of them.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] QSoC proposal: Rewrite pg_dump and pg_restore

2014-03-20 Thread Tom Lane
Craig Ringer cr...@2ndquadrant.com writes:
 Here's how I think it needs to look:
 [ move all the functionality to the backend ]

Of course, after you've done all that work, you've got something that is
of exactly zero use to its supposed principal use-case, pg_dump.  pg_dump
will still have to support server versions that predate all these fancy
new dump functions, and that pretty much ensures that most of pg_dump's
core functionality will still be on the client side.  Or, if you try to
finesse that problem by making sure the new server APIs correspond to
easily-identified pieces of pg_dump code, you'll probably end up with APIs
that nobody else wants to use :-(.

In any case, I quite agree with the sentiment that this is not a suitable
problem for a GSOC project.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] QSoC proposal: Rewrite pg_dump and pg_restore

2014-03-20 Thread Craig Ringer
On 03/21/2014 11:09 AM, Tom Lane wrote:
 Craig Ringer cr...@2ndquadrant.com writes:
 Here's how I think it needs to look:
 [ move all the functionality to the backend ]
 
 Of course, after you've done all that work, you've got something that is
 of exactly zero use to its supposed principal use-case, pg_dump.  pg_dump
 will still have to support server versions that predate all these fancy
 new dump functions, and that pretty much ensures that most of pg_dump's
 core functionality will still be on the client side.  Or, if you try to
 finesse that problem by making sure the new server APIs correspond to
 easily-identified pieces of pg_dump code, you'll probably end up with APIs
 that nobody else wants to use :-(.

Yeah, that's why it's necessary to create a libpqdump that's usable
client-side even if you want server-side dump support.

So it's allow the functionality to be used from the backend as well,
not just move all the functionality to the backend.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] QSoC proposal: Rewrite pg_dump and pg_restore

2014-03-18 Thread Alexandr

Hello!
Here is the text of my proposal which I've applied to GSoC.
(and link 
https://docs.google.com/document/d/1s-Q4rzEysPxo-dINsk_eKFJOBoVjNYDrQ-Oh75gtYEM/edit?usp=sharing)

Any suggestions and comments are welcome.
*

PostgreSQL GSoC 2014 proposal



Project name

Rewrite (add) pg_dump and pg_restore utilities as libraries (.so, .dll  
.dylib)




Short description

pg_dump is a utility for backing up a PostgreSQL database. It makes 
consistent backups even if the database is being used concurrently. 
pg_restore is a utility for restoring a PostgreSQL database from an 
archive created by pg_dump in one of the non-plain-text formats. I think 
it will be more comforatable to use these tools as libraries.



Name: Alexander Shvidchenko

E-mail: askel...@gmail.com mailto:askel...@gmail.com

Location: Rostov-on-Don, Russia (UTC +04.00)


Benefits to the PostgreSQL Community

This feature will expand opportunities to work with backups of 
databases. Especially, it’s important for third-party developers. It 
will be easier to use the functionality of the tools in applications 
when they will be libraries.



Quantifiable results

Static and dynamic libraries with the functionality of pg_dump and 
pg_restore tools (.so, .dll and .dylib files)



Project Schedule

until May 31

Make code review and solve architecture questions with help of community

1 June – 30 June

Detailed implementation of libraries.

1 July – 31 July

Finish Implementation of libraires and begin testing.

1 August -15 August

Final refactoring, testing and commit.


Academic experience

I entered the university in 2013. Before entering the university I 
finished the college in 2012. My graduate work in the college was the 
client-server application. It was a realization of XMPP. The client was 
realized in Qt. The client worked with SQLite database and the server 
worked with MySQL database.



Why is PostgreSQL?

- I’m intereted in this idea and believe this project would be useful 
for the community;


- PostgreSQL is a very respected community. I would be proud to be a 
part of it;


- PostgreSQL is one of the best DBMS and I would like to make it better.


Links

1) PostgreSQL 9.3.3 Documentation, pg_dump

http://www.postgresql.org/docs/9.3/static/app-pgdump.html

2) PostgreSQL 9.3.3 Documentation, pg_restore

http://www.postgresql.org/docs/9.3/static/app-pgrestore.html

3) Static library

http://en.wikipedia.org/wiki/Static_library

4) Dynamic library

http://en.wikipedia.org/wiki/Dynamic-link_library

*

With best wishes,
Alexander S.