Thanks, Merlin - lots of good information here, and I had not yet stumbled 
across pg-large-object - I will look into it.

Eric

-----Original Message-----
From: Merlin Moncure [mailto:mmonc...@gmail.com] 
Sent: Thursday, May 18, 2017 9:49 AM
To: Eric Hill <eric.h...@jmp.com>
Cc: Thomas Kellerer <spam_ea...@gmx.net>; PostgreSQL General 
<pgsql-general@postgresql.org>
Subject: Re: storing large files in database - performance

On Thu, May 18, 2017 at 7:34 AM, Eric Hill <eric.h...@jmp.com> wrote:
> I would be thrilled to get 76 MB per second, and it is comforting to know 
> that we have that as a rough upper bound on performance.  I've got work to do 
> to figure out how to approach that upper bound from Node.js.
>
> In the meantime, I've been looking at performance on the read side.  For 
> that, I can bypass all my Node.js layers and just run a query from pgAdmin 4. 
>  I ran this query, where indexFile.contents for the row in question is 25MB 
> in size.  The query itself took 4 seconds in pgAdmin 4.  Better than the 12 
> seconds I'm getting in Node.js, but still on the order of 6MB per second, not 
> 76.  Do you suppose pgAdmin 4 and I are doing similarly inefficient things in 
> querying bytea values?

Probably.  I haven't spent a lot of time with pgadmin 4 so I'm not entirely 
sure.  If you want a quick and dirty comparison, try running your query in 
psql in unaligned mode as a comparison point.  You can also do \copy with 
BINARY for bytea transfers.
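For example (a sketch; table and column names are taken from your query 
upthread, and the database name and id are placeholders):

    $ psql mydb
    mydb=# \timing on
    mydb=# \a
    mydb=# \o /dev/null
    mydb=# SELECT contents FROM indexFile WHERE id = 1;
    mydb=# \o
    mydb=# \copy (SELECT contents FROM indexFile WHERE id = 1) TO 'contents.bin' WITH BINARY

The SELECT times the text-format transfer without pgadmin's rendering 
overhead; the \copy form uses the binary COPY format for comparison.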

The basic problem is not really the database; it's that database interaction 
APIs tend not to be designed for this kind of problem.
The big picture issues are:

*) Driver overhead marshaling from wire format to managed types

*) Driver overhead for memory management

*) Wire format issues.  Certain types are *much* faster with the binary wire 
format and are also much more memory efficient.
Your bytea transfers are probably being serialized to text and back in both 
directions, which is very wasteful for large values: in the default hex output 
format, a 25MB bytea becomes more than 50MB of text that has to be allocated, 
transferred, and parsed on every transfer.
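For what it's worth, node-postgres can request binary-format results per 
query, which avoids that text round trip for bytea.  A minimal sketch 
(untested; assumes the 'pg' module and the table from upthread, and the 
binary flag is worth verifying against the pg docs):

    var pg = require('pg');

    var client = new pg.Client('postgres://localhost/mydb');
    client.connect(function (err) {
      if (err) throw err;
      client.query({
        text: 'SELECT contents FROM indexFile WHERE id = $1',
        values: [1],
        binary: true  // request the binary wire format for the result
      }, function (err, res) {
        if (err) throw err;
        // contents arrives as a Buffer instead of hex text to be decoded
        console.log('read %d bytes', res.rows[0].contents.length);
        client.end();
      });
    });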

If I were to seriously look at node.js performance, my rough thinking is that 
I'd want to be setting up the javascript variables directly in C somehow using 
plv8 internal routines.  Short of that, I would probably be querying all data 
out of postgres in json rather than serializing individual fields (which is 
what I generally do in practice).
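The json approach looks something like this (a sketch; the column names are 
made up):

    -- one json document per row; the client parses it with JSON.parse()
    SELECT row_to_json(t) AS doc
      FROM (SELECT id, filename, modified FROM indexFile) t;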

Another point: some googling turned up
https://www.npmjs.com/package/pg-large-object, which is definitely something 
to consider trying.
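Large objects are read and written through a streaming API inside a 
transaction, so they sidestep serializing the whole value at once.  A rough 
sketch of reading one with pg-large-object (method names are from my reading 
of the package README, so double-check them; the oid is a placeholder):

    var pg = require('pg');
    var fs = require('fs');
    var LargeObjectManager = require('pg-large-object').LargeObjectManager;

    var client = new pg.Client('postgres://localhost/mydb');
    client.connect(function (err) {
      if (err) throw err;
      var man = new LargeObjectManager({ pg: client });
      // large object access must happen inside a transaction
      client.query('BEGIN', function (err) {
        if (err) throw err;
        var oid = 12345;        // placeholder: kept in your metadata table
        var bufferSize = 16384; // read in 16KB chunks
        man.openAndReadableStream(oid, bufferSize, function (err, size, stream) {
          if (err) throw err;
          console.log('streaming %d bytes', size);
          stream.pipe(fs.createWriteStream('contents.bin'));
          stream.on('end', function () {
            client.query('COMMIT', function () { client.end(); });
          });
        });
      });
    });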

merlin
