On Mon, Jun 25, 2018 at 3:44 AM, Tim Cross <theophil...@gmail.com> wrote:

>
> Anto Aravinth <anto.aravinth....@gmail.com> writes:
>
> > Thanks for the response. I'm not sure how long this tool takes for the
> > 70GB of data.
> >
> > I used node to stream the XML files into INSERTs, which was very slow.
> > The XML actually contains 40 million records, of which 10 million took
> > around 2 hrs using nodejs. Hence, I thought I would use the COPY
> > command, as suggested on the internet.
> >
> > I will definitely try the code and let you know. But it looks like it
> > uses the same INSERT, not COPY.. interesting to see if it runs quickly
> > on my machine.
> >
> > On Sun, Jun 24, 2018 at 9:23 PM, Adrien Nayrat <
> adrien.nay...@anayrat.info>
> > wrote:
> >
> >> On 06/24/2018 05:25 PM, Anto Aravinth wrote:
> >> > Hello Everyone,
> >> >
> >> > I have downloaded the Stackoverflow posts xml (contains all SO
> questions
> >> till
> >> > date).. the file is around 70GB.. I wanna import the data in those xml
> >> to my
> >> > table.. is there a way to do so in postgres?
> >> >
> >> >
> >> > Thanks,
> >> > Anto.
> >>
> >> Hello Anto,
> >>
> >> I used this tool :
> >> https://github.com/Networks-Learning/stackexchange-dump-to-postgres
> >>
>
> If you are using nodejs, then you can easily use the pg-copy-streams
> module to insert the records into your database. I've been using this
> for inserting large numbers of records from NetCDF files. It takes
> between 40 and 50 minutes to insert 60 million+ records, and we are
> doing additional calculations on the values, not just inserting them;
> plus we are inserting over the network into a database which is also
> performing other processing.
>
> We found a significant speed improvement with COPY over blocks of insert
> transactions, which in turn were faster than individual inserts. The
> only downside of using COPY is that it either completely works or
> completely fails, and when it fails it can be tricky to work out which
> record is causing the failure. A benefit of using blocks of transactions
> is that you have more fine-grained control, allowing you to recover from
> some errors or to provide more specific detail about the cause of the
> error.
>

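For reference, the pg-copy-streams approach described above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: the table name, column names, and connection details are invented, and it assumes `npm install pg pg-copy-streams`.

```javascript
// Sketch of COPY FROM STDIN via pg-copy-streams (hypothetical table/columns).

// Format one record as a line of COPY text format: tab-separated fields,
// with backslash, tab and newline escaped, and \N for NULL.
function toCopyLine(fields) {
  return fields
    .map(v => v === null
      ? '\\N'
      : String(v).replace(/\\/g, '\\\\').replace(/\t/g, '\\t').replace(/\n/g, '\\n'))
    .join('\t') + '\n';
}

async function load(records) {
  const { Client } = require('pg');                      // npm install pg
  const { from: copyFrom } = require('pg-copy-streams'); // npm install pg-copy-streams
  const client = new Client(); // connection details come from the PG* env vars
  await client.connect();
  const stream = client.query(copyFrom('COPY posts (id, title, body) FROM STDIN'));
  for (const r of records) {
    stream.write(toCopyLine([r.id, r.title, r.body]));
  }
  stream.end();
  await new Promise((resolve, reject) =>
    stream.on('finish', resolve).on('error', reject));
  await client.end();
}
```

Because the rows are written to a stream rather than read from a file on the server, the data can be computed at run time.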
Sure, let me try that. I have a question here: COPY usually works when you
move data from files into your Postgres instance, right? Now in node.js,
while processing the whole file, can I use COPY
programmatically, like COPY Stackoverflow <calculated value at run time>?
Because from the docs:

https://www.postgresql.org/docs/9.2/static/sql-copy.html

I don't see that it's possible. Maybe I need to convert the files into a
format COPY understands first?

Anto.

>
> Be wary of what indexes you're defining on your table. Depending on
> their type and number, these can have a significant impact on insert
> times as well.
>
>
> --
> Tim Cross
>
