On Mon, Jun 25, 2018 at 3:44 AM, Tim Cross <theophil...@gmail.com> wrote:
> Anto Aravinth <anto.aravinth....@gmail.com> writes:
>
> > Thanks for the response. I'm not sure how long this tool takes for
> > the 70GB data.
> >
> > I used node to stream the xml files into inserts, which was very slow.
> > The xml actually contains 40 million records, of which 10 million took
> > around 2 hrs using nodejs. Hence I thought I would use the COPY
> > command, as suggested on the internet.
> >
> > I will definitely try the code and let you know. But it looks like it
> > uses the same INSERT, not COPY; interesting to see if it runs quickly
> > on my machine.
> >
> > On Sun, Jun 24, 2018 at 9:23 PM, Adrien Nayrat
> > <adrien.nay...@anayrat.info> wrote:
> >
> >> On 06/24/2018 05:25 PM, Anto Aravinth wrote:
> >> > Hello Everyone,
> >> >
> >> > I have downloaded the Stackoverflow posts xml (contains all SO
> >> > questions till date). The file is around 70GB. I want to import the
> >> > data in those xml files to my table. Is there a way to do so in
> >> > postgres?
> >> >
> >> > Thanks,
> >> > Anto.
> >>
> >> Hello Anto,
> >>
> >> I used this tool:
> >> https://github.com/Networks-Learning/stackexchange-dump-to-postgres
> >>
>
> If you are using nodejs, you can easily use the pg-copy-streams module
> to insert the records into your database. I've been using this for
> inserting large numbers of records from NetCDF files. It takes between
> 40 and 50 minutes to insert 60 million+ records, and we are doing
> additional calculations on the values, not just inserting them; we are
> also inserting over the network, into a database that is performing
> other processing at the same time.
>
> We found a significant speed improvement with COPY over blocks of
> insert transactions, which in turn were faster than individual inserts.
> The only downside of COPY is that it either completely works or
> completely fails, and when it fails, it can be tricky to work out which
> record is causing the failure.
> A benefit of using blocks of transactions is that you have more
> fine-grained control, allowing you to recover from some errors or to
> get more specific detail about the cause of an error.

Sure, let me try that. I have a question here: COPY usually works when
you move data from files into your postgres instance, right? Now, in
node.js, processing the whole file, can I use COPY programmatically,
something like COPY Stackoverflow <calculated value at run time>? From
the doc: https://www.postgresql.org/docs/9.2/static/sql-copy.html I
don't see that it's possible. Maybe I need to convert the files into a
COPY-understandable format first?

Anto.

> Be wary of what indexes you're defining on your table. Depending on
> their type and number, these can have a significant impact on insert
> times as well.
>
> --
> Tim Cross