Christopher Browne-3 wrote
> Well, you're paying for a lot of overhead in that, as you're
> establishing a psql command, connecting to a database, spawning a
> backend process, starting a transaction, committing a transaction,
> closing the backend process, disconnecting from the database, and
> cleaning up after the launching of the psql command.  And you're
> doing that 500 million times.
> 
> The one thing I left off of that was the loading of a single tuple
> into json_parts.
> 
> What you could do to improve things quite a lot would be to group
> some number of those files together, so that each time you pay for
> the overhead, you at least get the benefit of loading several entries
> into json_parts.
> 
> So, loosely, I'd commend using /bin/cat (or similar) to assemble
> several files together into one, and then \copy that one file in.
> 
> Having 2 tuples loaded at once drops overhead by 50%
> Having 10 tuples loaded at once drops overhead by 90%
> Having 100 tuples loaded at once drops overhead by 99%
> Having 1000 tuples loaded at once drops overhead by 99.9%
> 
> There probably isn't too much real value to going past 1000 tuples
> per batch; the overhead, by that point, is getting pretty immaterial.
> 
> Reducing that overhead is the single most important thing you can do.

Yes, I was thinking about that, but I have no idea how to do it right
now. Something like an outer loop that concatenates the files into
batches, with a delimiter between them?
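
Would something along these lines work? It's only a rough sketch: it
assumes each file holds a single JSON document on one line (so a plain
newline can serve as the row delimiter), and I'm guessing at the column
name (json_data), the source directory, and the connection string, since
none of those are settled yet. It also ignores COPY's text-format
escaping, so tabs or backslashes inside the JSON would need extra
handling.

    #!/bin/bash
    # Sketch: batch the JSON files and load 1000 rows per \copy instead
    # of one psql invocation per file.
    # Assumptions (mine): one JSON document per line per file, target is
    # json_parts(json_data), and the data contains no tabs/backslashes.

    DB="mydb"            # placeholder connection string
    SRC="/data/json"     # placeholder source directory
    BATCH=1000

    # Glue all files into one stream, then cut it into 1000-line chunks.
    find "$SRC" -name '*.json' -print0 \
        | xargs -0 cat \
        | split -l "$BATCH" - /tmp/json_batch_

    # One connection + one transaction per 1000-row chunk.
    for f in /tmp/json_batch_*; do
        psql "$DB" -c "\copy json_parts(json_data) from '$f'"
        rm "$f"
    done

That way each psql call pays its connect/commit overhead once per 1000
files rather than once per file.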


Christopher Browne-3 wrote
> It is also quite likely that you could run such streams in parallel,
> although it would require quite a bit more information about the I/O
> capabilities of your hardware to know if that would do any good.

I can spin up an instance of whatever size is needed.
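
If the I/O holds up, I suppose I could then fan those chunk files out
over a few psql processes at once, e.g. with xargs -P. Again just a
sketch reusing my guessed names (json_parts(json_data), the
/tmp/json_batch_* chunks from above, a placeholder connection string);
the degree of parallelism would need testing against the instance's
actual disks:

    DB="mydb"    # placeholder connection string

    # Load the pre-split chunk files with up to 4 psql processes at a time.
    printf '%s\n' /tmp/json_batch_* \
        | xargs -P 4 -I{} psql "$DB" -c "\copy json_parts(json_data) from '{}'"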




