Re: [HACKERS] Tips/advice for implementing integrated RESTful HTTP API

2014-09-14 Thread Björn Harrtell
FYI, got an initial implementation of
http://wiki.postgresql.org/wiki/HTTP_API done in Java (intended to run as a
servlet) at https://github.com/bjornharrtell/jdbc-http-server. Feedback is
welcome :)

Regards,

Björn

2014-09-03 1:19 GMT+02:00 Álvaro Hernández Tortosa a...@nosys.es:


 On 02/09/14 04:47, Dobes Vandermeer wrote:




 Same idea as PgBouncer or PgPool. The advantage over hacking
 PgBouncer/PgPool for the job is that Tomcat can already do a lot of what
 you want using built-in, pre-existing functionality. Connection pool
 management, low level REST-style HTTP processing, JSON handling etc are
 all done for you.


  Yeah, those are nice conveniences but I still think installing Java and
 getting something to run on startup is a bit more of a hurdle.  Better to
 make life easier up front by having a simple standalone proxy you can
 compile and run with just whatever is already available on a typical AWS
 Ubuntu environment.


 If instead of Tomcat you use Jetty, you can embed the whole
 app+Jetty+dependencies in a single executable JAR, which eases deployment
 considerably. Installing a JVM in an Ubuntu environment is just one apt-get,
 and it's even easier if you use CloudFormation for automation. I don't think
 it's a bad choice at all... you get most of the functionality you want
 already there, as Craig said, and it's lightweight.

 Hope it helps,

 Álvaro




Re: [HACKERS] Tips/advice for implementing integrated RESTful HTTP API

2014-09-02 Thread Álvaro Hernández Tortosa


On 02/09/14 04:47, Dobes Vandermeer wrote:


Same idea as PgBouncer or PgPool. The advantage over hacking
PgBouncer/PgPool for the job is that Tomcat can already do a lot
of what
you want using built-in, pre-existing functionality. Connection pool
management, low level REST-style HTTP processing, JSON handling
etc are
all done for you.


Yeah, those are nice conveniences but I still think installing Java 
and getting something to run on startup is a bit more of a hurdle. 
 Better to make life easier up front by having a simple standalone proxy 
you can compile and run with just whatever is already available on a 
typical AWS Ubuntu environment.


If instead of Tomcat you use Jetty, you can embed the whole 
app+Jetty+dependencies in a single executable JAR, which eases 
deployment considerably. Installing a JVM in an Ubuntu environment is just 
one apt-get, and it's even easier if you use CloudFormation for automation. 
I don't think it's a bad choice at all... you get most of the functionality 
you want already there, as Craig said, and it's lightweight.


Hope it helps,

Álvaro



Re: [HACKERS] Tips/advice for implementing integrated RESTful HTTP API

2014-09-01 Thread Dobes Vandermeer
Hmm yes I am learning that the BG worker system isn't as helpful as I had
hoped due to the single database restriction.

As for writing a frontend, this might be the best solution.

A Java frontend would be easy but pointless because the whole point here is
to provide a lightweight access method to the database for environments
that don't have the ability to use the JDBC or libpq libraries.  Deploying
a Java setup would be too much trouble.

I do see now that PG uses one worker per connection rather than a worker
pool as I had thought before. So there's nothing already in there to help
me dispatch requests, and making my own worker pool that distributes
requests using sockets wouldn't be any better than connecting back using
libpq.

A C frontend using libevent would be easy enough to make and deploy for
this I guess.

But... Maybe nobody really wants this thing anyway, there seem to be some
other options out there already.

Thanks for the feedback.
On Aug 31, 2014 8:46 PM, Craig Ringer cr...@2ndquadrant.com wrote:

 On 08/31/2014 12:40 PM, Dobes Vandermeer wrote:
  1. Connecting to multiple databases
 
  The background workers can apparently only connect to a single database
  at a time, but I want to expose all the databases via the API.

 bgworkers are assigned a database at launch time (if SPI is enabled),
 and this database may not change during the worker's lifetime, same as a
 normal backend.

 Sometimes frustrating, but that's how it is.

  I think I could use libpq to connect to PostgreSQL on localhost but this
  might have weird side-effects in terms of authentication, pid use, stuff
  like that.

 If you're going to do that, why use a bgworker at all?

 In general, what do you gain from trying to do this within the database
 server itself, rather than as an app in front of the DB?

  I could probably manage a pool of dynamic workers (as of 9.4), one per
  user/database combination or something along those lines.  Even one per
  request?  Is there some kind of IPC system in place to help shuttle the
  requests and responses between dynamic workers?  Or do I need to come up
  with my own?

 The dynamic shmem code apparently has some queuing functionality. I
 haven't used it yet.

  It seems like PostgreSQL itself has a way to shuttle requests out to
  workers, is it possible to tap into that system instead?  Basically some
  way to send the requests to a PostgreSQL backend from the background
 worker?

 It does?

 It's not the SPI; that executes work directly within the bgworker,
 making it behave like a normal backend for the purpose of query execution.

  Or perhaps I shouldn't do this as a worker but rather modify PostgreSQL
  itself and do it in a more integrated/destructive manner?

 Or just write a front-end.

 The problem you'd have attempting to modify PostgreSQL itself for this
 is that connection dispatch occurs via the postmaster, which is a
 single-threaded process that already needs to do a bit of work to keep
 an eye on how things are running. You don't want it constantly busy
 processing and dispatching millions of tiny HTTP requests. It can't just
 hand a connection off to a back-end immediately after accepting it,
 either; it'd have to read the HTTP headers to determine what database to
 connect to. Then launch a new backend for the connection, which is
 horribly inefficient when doing tiny short-lived connections. The
 postmaster has no concept of a pool of backends (unfortunately, IMO) to
 re-use.

 I imagine (it's not something I've investigated, really) that you'd want
 a connection accepter process that watched the listening http request
 socket. It'd hand connections off to dispatcher processes that read the
 message content to get the target DB and dispatch the request to a
 worker backend for the appropriate user/db combo, then collect the
 results and return them on the connection. Hopefully at this point
 you're thinking that sounds a lot like a connection pool... because it
 is. An awfully complicated one, probably, as you'd have to manage
 everything using shared memory segments and latches.

 In my view it's unwise to try to do this in the DB with PostgreSQL's
 architecture. Hack PgBouncer or PgPool to do what you want. Or write a
 server with Tomcat/Jetty using JAX-RS and PgJDBC and the built in
 connection pool facilities - you won't *believe* how easy it is.

  3. Parallelism
 
  The regular PostgreSQL server can run many queries in parallel

 Well, one PostgreSQL instance (postmaster) may have many backends, each
 of which may run queries in series but not in parallel. Any given
 process may only run one query at once.

  but it
  seems like if I am using SPI I could only run one query at a time - it's
  not an asynchronous API.

 Correct.

  Any help, sage advice, tips, and suggestions how to move forward in
  these areas would be muchly appreciated!

 Don't do it with bgworkers.

 --
  Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Tips/advice for implementing integrated RESTful HTTP API

2014-09-01 Thread Craig Ringer
On 09/02/2014 12:50 AM, Dobes Vandermeer wrote:
 Hmm yes I am learning that the BG worker system isn't as helpful as I
 had hoped due to the single database restriction.
 
As for writing a frontend, this might be the best solution.
 
A Java frontend would be easy but pointless because the whole point here 
is to provide a lightweight access method to the database for 
environments that don't have the ability to use the JDBC or libpq 
libraries.  Deploying a Java setup would be too much trouble.

If you can't run libpq, you can't run *anything* really, it's very
lightweight. I think you misunderstood what I was saying; I'm talking
about it acting as a proxy for HTTP-based requests, running on or in
front of the PostgreSQL server like a server-side connection pool would.

You can have:

[client machine][PostgreSQL server machine   ]
client --- HTTP --- Tomcat/JAX-RS  -- pgjdbc -- PostgreSQL

and despite what people often say, a .war deployed on jetty or tomcat
isn't exactly heavyweight.

Same idea as PgBouncer or PgPool. The advantage over hacking
PgBouncer/PgPool for the job is that Tomcat can already do a lot of what
you want using built-in, pre-existing functionality. Connection pool
management, low level REST-style HTTP processing, JSON handling etc are
all done for you.
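Whatever the proxy is built on, the layer it has to provide first is routing: mapping request URLs to a target database and resource. A minimal sketch, using an invented /databases/<db>/... URL scheme that is not an API proposed in this thread:

```python
# Sketch of the request-routing layer an HTTP->PostgreSQL proxy needs,
# whether it runs on Tomcat/JAX-RS or anything else. The /databases/<db>/
# URL scheme here is a hypothetical example, not an actual proposal.

from urllib.parse import urlparse

def route(path):
    """Map a request path to (database, resource), or None if unmatched."""
    parts = [p for p in urlparse(path).path.split("/") if p]
    if not parts or parts[0] != "databases":
        return None
    if len(parts) == 1:
        return ("*", "list-databases")   # e.g. GET /databases
    db = parts[1]
    resource = "/".join(parts[2:]) or "root"
    return (db, resource)
```

For example, route("/databases/sales/tables") yields ("sales", "tables"), which the proxy would then dispatch to a pooled connection for that database.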

 I do see now that PG uses one worker per connection rather than a worker
 pool as I had thought before. So there's nothing already in there to
 help me dispatch requests and making my own worker pool that distributes
 requests using sockets wouldn't be any better than connecting back using
 libpq.

Yep.

 A C frontend using libevent would be easy enough to make and deploy for
 this I guess.
 
 But... Maybe nobody really wants this thing anyway, there seem to be
 some other options out there already.

It's something I think would be interesting to have, but IMO to be
really useful it'd need to support composing object graphs as JSON, a
JSON query format, etc. So you can say "get me this customer with all
their addresses and contact records" without having to issue a bunch of
queries (round trips) or use ORM-style left-join-and-deduplicate hacks
that waste bandwidth and are messy and annoying.
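The left-join-and-deduplicate hack can be sketched as follows; the row shape and column names are invented for illustration:

```python
# Sketch of the ORM-style "left join and deduplicate" hack: the customer
# columns repeat on every joined address row (the wasted bandwidth), and
# the client must fold the flat rows back into an object graph. The rows
# and column names are invented for illustration.

def fold_rows(rows):
    """Fold flat (customer_id, name, address) join rows into object graphs."""
    customers = {}
    for cid, name, address in rows:
        cust = customers.setdefault(
            cid, {"id": cid, "name": name, "addresses": []})
        if address is not None:       # LEFT JOIN can produce NULL addresses
            cust["addresses"].append(address)
    return list(customers.values())

rows = [
    (1, "Acme", "12 Main St"),
    (1, "Acme", "99 Dock Rd"),   # "Acme" repeated on every address row
    (2, "Bob",  None),
]
```

fold_rows(rows) collapses the three flat rows into two customer objects, one with two addresses and one with none.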

Just sending chunks of SQL and getting back arrays of tuples isn't very
interesting or, IMO, hugely useful for the kind of apps that want to
avoid JDBC/libpq. Especially webapps.

Close attention would also need to be paid to security and auth. You don't
want to be sending a username/password with each request; you need a
reasonable authentication token system, request signing to prevent
replay attacks, idempotent requests, etc.
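The token-plus-signing scheme suggested above can be sketched with stdlib HMAC; the header layout, session secret, and 5-minute freshness window are invented for illustration:

```python
# Sketch of token-based request signing: the server issues a per-session
# secret at login, and each request carries an HMAC over the method, path,
# and a timestamp so a captured request cannot simply be replayed later.
# The secret value and the 300-second window are illustrative assumptions.

import hmac, hashlib, time

SECRET = b"per-session-token-issued-at-login"   # hypothetical session secret

def sign(method, path, timestamp, secret=SECRET):
    msg = f"{method}\n{path}\n{timestamp}".encode()
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()

def verify(method, path, timestamp, signature, secret=SECRET, max_age=300):
    if abs(time.time() - timestamp) > max_age:   # reject stale requests
        return False
    expected = sign(method, path, timestamp, secret)
    return hmac.compare_digest(expected, signature)   # constant-time compare
```

A signature bound to method and path also means a captured GET signature cannot be reused to authorize a POST.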

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tips/advice for implementing integrated RESTful HTTP API

2014-09-01 Thread Dobes Vandermeer
On Mon, Sep 1, 2014 at 7:00 PM, Craig Ringer cr...@2ndquadrant.com wrote:

 On 09/02/2014 12:50 AM, Dobes Vandermeer wrote:
  Hmm yes I am learning that the BG worker system isn't as helpful as I
  had hoped due to the single database restriction.
 
  As for writing a frontend, this might be the best solution.
 
  A Java frontend would be easy but pointless because the whole point here
  is to provide a lightweight access method to the database for
  environments that don't have the ability to use the JDBC or libpq
  libraries.  Deploying a Java setup would be too much trouble.

 If you can't run libpq, you can't run *anything* really, it's very
 lightweight. I think you misunderstood what I was saying; I'm talking
 about it acting as a proxy for HTTP-based requests, running on or in
 front of the PostgreSQL server like a server-side connection pool would.


I was just referring to an environment that doesn't have a binding to libpq
or JDBC; for example, Node.js for a long time had no PostgreSQL client, so I
didn't use PostgreSQL when I used Node.js.


 Same idea as PgBouncer or PgPool. The advantage over hacking
 PgBouncer/PgPool for the job is that Tomcat can already do a lot of what
 you want using built-in, pre-existing functionality. Connection pool
 management, low level REST-style HTTP processing, JSON handling etc are
 all done for you.


Yeah, those are nice conveniences but I still think installing Java and
getting something to run on startup is a bit more of a hurdle.  Better to
make life easier up front by having a simple standalone proxy you can
compile and run with just whatever is already available on a typical AWS
Ubuntu environment.


  A C frontend using libevent would be easy enough to make and deploy for
  this I guess.
 
  But... Maybe nobody really wants this thing anyway, there seem to be
  some other options out there already.

 It's something I think would be interesting to have, but IMO to be
 really useful it'd need to support composing object graphs as JSON, a
 JSON query format, etc. So you can say "get me this customer with all
 their addresses and contact records" without having to issue a bunch of
 queries (round trips) or use ORM-style left-join-and-deduplicate hacks
 that waste bandwidth and are messy and annoying.


If the SQL outputs rows with ARRAY and JSON type columns in them then that
may be sufficient to construct whatever kind of JSON structure you want for
the query result.  I'm not sure why ORMs don't take better advantage of
this; maybe they're just too cross-DB or maybe this feature isn't as
powerful as I think it is?

PostgreSQL also allows you to query and index fields inside of a json
value, so at least initially you can get all this power without inventing
any new query language.
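Dobes' point can be sketched concretely: if the server aggregates child rows into a JSON column, each result row already carries its object graph, so the proxy has nothing to deduplicate. The SQL, table names, and sample rows below are invented for illustration:

```python
# Sketch: when the server aggregates addresses into a JSON column (e.g.
# with json_agg), each result row already contains the nested structure,
# and the HTTP frontend only has to wrap rows into a response body. The
# query, schema, and sample_rows are hypothetical; a real driver would
# return rows shaped like sample_rows after decoding the json column.

import json

QUERY = """
SELECT c.id, c.name,
       COALESCE(json_agg(a.address)
                  FILTER (WHERE a.address IS NOT NULL), '[]') AS addresses
FROM customer c
LEFT JOIN address a ON a.customer_id = c.id
GROUP BY c.id, c.name
"""

# What a driver might hand back for the query above (json column decoded).
sample_rows = [
    (1, "Acme", ["12 Main St", "99 Dock Rd"]),
    (2, "Bob", []),
]

def rows_to_response(rows):
    """Wrap already-nested rows into the JSON body an HTTP API would return."""
    return json.dumps(
        [{"id": cid, "name": name, "addresses": addrs}
         for cid, name, addrs in rows]
    )
```

One round trip, no repeated parent columns, and no client-side folding step.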

But later a translator could be made, like an ORM-ish thingy, that might
have less clutter than the SQL one because some shorthand could be used for
peeking inside the JSON structures.


 Close attention would also need to be paid to security and auth. You don't
 want to be sending a username/password with each request; you need a
 reasonable authentication token system, request signing to prevent
 replay attacks, idempotent requests, etc.


Well, these would be needed for use cases where the DB is exposed to
untrusted parties, which has never been the case on projects I've worked
on.  I wouldn't be against these sorts of improvements if people want to
make them, but they wouldn't matter much to me.  I was hoping to reuse
PostgreSQL's built-in password/ident security system.

Cheers,

Dobes


Re: [HACKERS] Tips/advice for implementing integrated RESTful HTTP API

2014-08-31 Thread Peter Eisentraut
On 8/31/14 12:40 AM, Dobes Vandermeer wrote:
 The background workers can apparently only connect to a single database
 at a time, but I want to expose all the databases via the API. 

I think the term "background worker" should be taken as a hint that
"foreground protocol endpoint" was not one of the envisioned use cases.
I think this sort of thing should be done as a separate process in
front of the postmaster.





Re: [HACKERS] Tips/advice for implementing integrated RESTful HTTP API

2014-08-31 Thread Fabrízio de Royes Mello
On Sunday, August 31, 2014, Peter Eisentraut pete...@gmx.net
wrote:

 On 8/31/14 12:40 AM, Dobes Vandermeer wrote:
  The background workers can apparently only connect to a single database
  at a time, but I want to expose all the databases via the API.

 I think the term "background worker" should be taken as a hint that
 "foreground protocol endpoint" was not one of the envisioned use cases.
 I think this sort of thing should be done as a separate process in
 front of the postmaster.


+1

Do you know the PGRest project?

Look at http://pgre.st

Regards,

Fabrízio


-- 
Fabrízio de Royes Mello
PostgreSQL Consulting/Coaching
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello


Re: [HACKERS] Tips/advice for implementing integrated RESTful HTTP API

2014-08-31 Thread Craig Ringer
On 08/31/2014 12:40 PM, Dobes Vandermeer wrote:
 1. Connecting to multiple databases
 
 The background workers can apparently only connect to a single database
 at a time, but I want to expose all the databases via the API. 

bgworkers are assigned a database at launch time (if SPI is enabled),
and this database may not change during the worker's lifetime, same as a
normal backend.

Sometimes frustrating, but that's how it is.

 I think I could use libpq to connect to PostgreSQL on localhost but this
 might have weird side-effects in terms of authentication, pid use, stuff
 like that.

If you're going to do that, why use a bgworker at all?

In general, what do you gain from trying to do this within the database
server itself, rather than as an app in front of the DB?

 I could probably manage a pool of dynamic workers (as of 9.4), one per
 user/database combination or something along those lines.  Even one per
 request?  Is there some kind of IPC system in place to help shuttle the
 requests and responses between dynamic workers?  Or do I need to come up
 with my own?

The dynamic shmem code apparently has some queuing functionality. I
haven't used it yet.

 It seems like PostgreSQL itself has a way to shuttle requests out to
 workers, is it possible to tap into that system instead?  Basically some
 way to send the requests to a PostgreSQL backend from the background worker?

It does?

It's not the SPI; that executes work directly within the bgworker,
making it behave like a normal backend for the purpose of query execution.

 Or perhaps I shouldn't do this as a worker but rather modify PostgreSQL
 itself and do it in a more integrated/destructive manner?

Or just write a front-end.

The problem you'd have attempting to modify PostgreSQL itself for this
is that connection dispatch occurs via the postmaster, which is a
single-threaded process that already needs to do a bit of work to keep
an eye on how things are running. You don't want it constantly busy
processing and dispatching millions of tiny HTTP requests. It can't just
hand a connection off to a back-end immediately after accepting it,
either; it'd have to read the HTTP headers to determine what database to
connect to. Then launch a new backend for the connection, which is
horribly inefficient when doing tiny short-lived connections. The
postmaster has no concept of a pool of backends (unfortunately, IMO) to
re-use.

I imagine (it's not something I've investigated, really) that you'd want
a connection accepter process that watched the listening http request
socket. It'd hand connections off to dispatcher processes that read the
message content to get the target DB and dispatch the request to a
worker backend for the appropriate user/db combo, then collect the
results and return them on the connection. Hopefully at this point
you're thinking that sounds a lot like a connection pool... because it
is. An awfully complicated one, probably, as you'd have to manage
everything using shared memory segments and latches.

In my view it's unwise to try to do this in the DB with PostgreSQL's
architecture. Hack PgBouncer or PgPool to do what you want. Or write a
server with Tomcat/Jetty using JAX-RS and PgJDBC and the built in
connection pool facilities - you won't *believe* how easy it is.

 3. Parallelism
 
 The regular PostgreSQL server can run many queries in parallel

Well, one PostgreSQL instance (postmaster) may have many backends, each
of which may run queries in series but not in parallel. Any given
process may only run one query at once.

 but it
 seems like if I am using SPI I could only run one query at a time - it's
 not an asynchronous API.

Correct.

 Any help, sage advice, tips, and suggestions how to move forward in
 these areas would be muchly appreciated!

Don't do it with bgworkers.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Tips/advice for implementing integrated RESTful HTTP API

2014-08-30 Thread Dobes Vandermeer
A while back I was working on a little proposal to create a RESTful HTTP
front-end for PostgreSQL and recently I had the inspiration to work on
this.  So I successfully created a background worker for PostgreSQL 9.3
that can use the SPI to list off the databases in a JSON response.

WIP on github: https://github.com/dobesv/restgres/

Now I'm getting into murkier waters, and I'm wondering if I can get some
helpful tips to guide my R&D here.

1. Connecting to multiple databases

The background workers can apparently only connect to a single database at
a time, but I want to expose all the databases via the API.

I think I could use libpq to connect to PostgreSQL on localhost but this
might have weird side-effects in terms of authentication, pid use, stuff
like that.

I could probably manage a pool of dynamic workers (as of 9.4), one per
user/database combination or something along those lines.  Even one per
request?  Is there some kind of IPC system in place to help shuttle the
requests and responses between dynamic workers?  Or do I need to come up
with my own?

It seems like PostgreSQL itself has a way to shuttle requests out to
workers, is it possible to tap into that system instead?  Basically some
way to send the requests to a PostgreSQL backend from the background worker?

Or perhaps I shouldn't do this as a worker but rather modify PostgreSQL
itself and do it in a more integrated/destructive manner?

2. Authentication

I was trying to use the function md5_crypt_verify to authenticate the user
using their password, and I believe I am providing the right password, but
it's not being accepted.

Any tips on authenticating users in a background worker?   Where should I
be looking for examples?

3. Parallelism

The regular PostgreSQL server can run many queries in parallel, but it
seems like if I am using SPI I could only run one query at a time - it's
not an asynchronous API.

This seems related to the multiple databases issue - either I could use
libpq to translate/forward requests onto PostgreSQL's own worker system or
setup my own little worker pool to run the requests in parallel and have a
way to send the request/response data to/from those workers.
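The request/response shuttling described above amounts to a work queue between a dispatcher and a pool of workers. A minimal sketch with threads and queues (a real bgworker design would use shared-memory message queues and latches instead; `handle_request` is a stand-in for executing the actual query):

```python
# Sketch of worker-pool dispatch: requests go onto a shared queue, a fixed
# pool of workers executes them, and replies come back tagged with a
# request id so they can be matched to the originating HTTP connection.
# Plain threads stand in for bgworkers; handle_request is a hypothetical
# stand-in for running a query.

import queue, threading

def handle_request(payload):
    return payload.upper()            # stand-in for executing a query

def worker(requests, responses):
    while True:
        item = requests.get()
        if item is None:              # sentinel: shut this worker down
            return
        req_id, payload = item
        responses.put((req_id, handle_request(payload)))

def run_pool(payloads, n_workers=4):
    requests, responses = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(requests, responses))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for req_id, payload in enumerate(payloads):
        requests.put((req_id, payload))
    for _ in threads:
        requests.put(None)            # one shutdown sentinel per worker
    for t in threads:
        t.join()
    results = {}
    while not responses.empty():
        req_id, result = responses.get()
        results[req_id] = result
    return results
```

Responses may arrive out of order, which is why each is tagged with its request id before being returned.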



Any help, sage advice, tips, and suggestions how to move forward in these
areas would be muchly appreciated!

Regards,

Dobes