@ Niphlod: The situation for me is that non-programmer operators
upload 10,000-100,000 product records, mainly inventory updates, via
an admin interface. That's not huge, but it's large enough to exceed
the server's response time limit. The upload is not a bulk copy
operation but an insert/update for each record keyed by a pkey, so I
think the upload stays consistent even if it is aborted halfway. My
concern is that a single commit would cause high memory usage, which
should be avoided in a live environment.
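To illustrate what I mean, here is a minimal sketch of per-record upserts with a commit every N rows. It uses plain sqlite3 (requires SQLite >= 3.24 for ON CONFLICT) rather than the DAL, and the `product` table and column names are just placeholders, not my real schema:

```python
import sqlite3

def upsert_batch(conn, rows, batch_size=1000):
    """Insert-or-update each row by primary key, committing every
    batch_size rows so a single huge transaction never builds up."""
    cur = conn.cursor()
    for i, (pid, name, qty) in enumerate(rows, start=1):
        # Upsert keyed on the primary key (SQLite >= 3.24 syntax).
        cur.execute(
            "INSERT INTO product (id, name, qty) VALUES (?, ?, ?) "
            "ON CONFLICT(id) DO UPDATE SET name = excluded.name, "
            "qty = excluded.qty",
            (pid, name, qty),
        )
        if i % batch_size == 0:
            conn.commit()
    conn.commit()  # commit the final partial batch

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, qty INTEGER)"
)
upsert_batch(conn, [(1, "widget", 5), (2, "gadget", 3)], batch_size=1)
upsert_batch(conn, [(1, "widget", 9)])  # re-upload updates, no duplicate
```

Because every record carries its pkey, re-running the upload after an abort just overwrites the rows that were already committed.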

Regards,
Kenji

On Feb 15, 8:23 AM, Niphlod <[email protected]> wrote:
> you need to distinguish between methods .....
>
> web2py stands in the middle; not all databases have facilities like the
> COPY command of PostgreSQL.
>
> Also, there are methods to export/import csv at the database level and at
> the table level. If you need to restore a table from the admin app, you
> have to take into account the time to physically transfer the csv file.
> web2py's methods can also use uuid fields to do "upserts", and they deal
> with "reference", "list:*" types, etc.
>
> If you have huge tables, or machines that aren't performant enough, you
> shouldn't use web2py's methods; code something tailored to your needs....
> I don't have issues with normal tables of 500,000 records (file uploaded
> to the remote machine, imported from the local file directly in code,
> using the DAL).
> Of course, if your table contains many columns or large blobs, YMMV.
>
> PS @ Kenji: normally if you restore a table or import something, you want
> to know which records were committed and which weren't. If you're not
> using pkeys or uuids for every single record, committing 100 records at
> a time is normally considered bad behaviour: if your file contains a bad
> record, you can't assure the consistency of the table....
> The time taken to import 8000 records with a commit every 100 is more than
> importing 8000 records with a single final commit, and the single commit
> retains consistency: if something falls apart, you have a functioning
> table instead of a broken one.
>
> Again, @ all, YMMV: if your csv's are millions of records and you don't
> care about database consistency (e.g., all records in the tables are then
> "polished" or filtered in a smart way), you can commit every n records and
> store the last committed line somewhere else, so if your import crashes,
> you know where to restart (be aware, writing performant checks and logic
> to import something "incrementally" into dbs and make them work in every
> possible condition is quite a daunting task).
> My point is: don't be shy about using transactions, commits and rollbacks;
> they are simple to use, very performant, and maybe the most wonderful
> features of a relational database!
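[Editor's note: the checkpointing idea quoted above, commit every n records and record the last committed line, can be sketched as below. This is a minimal illustration with stdlib sqlite3, not web2py's actual import code; the `stock` and `import_state` table names are hypothetical. The key design choice is that the checkpoint row is updated inside the same transaction as the data batch, so the checkpoint and the data can never disagree after a crash.]

```python
import sqlite3

def import_with_checkpoint(conn, lines, batch_size=100):
    """Upsert "id,qty" lines, committing every batch_size lines and
    recording the last committed line number so a crashed import can
    be re-run and resume where it left off."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS import_state "
        "(id INTEGER PRIMARY KEY CHECK (id = 1), last_line INTEGER)"
    )
    conn.execute("INSERT OR IGNORE INTO import_state VALUES (1, 0)")
    start = conn.execute("SELECT last_line FROM import_state").fetchone()[0]
    n = 0
    for n, line in enumerate(lines, start=1):
        if n <= start:
            continue  # already committed by a previous (crashed) run
        pid, qty = line.split(",")
        conn.execute(
            "INSERT INTO stock (id, qty) VALUES (?, ?) "
            "ON CONFLICT(id) DO UPDATE SET qty = excluded.qty",
            (int(pid), int(qty)),
        )
        if n % batch_size == 0:
            # Checkpoint travels in the same transaction as the data.
            conn.execute("UPDATE import_state SET last_line = ?", (n,))
            conn.commit()
    conn.execute("UPDATE import_state SET last_line = ?", (max(n, start),))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (id INTEGER PRIMARY KEY, qty INTEGER)")
import_with_checkpoint(conn, ["1,5", "2,3", "1,9"], batch_size=2)
# Re-running with the same file skips the already-committed lines.
import_with_checkpoint(conn, ["1,5", "2,3", "1,9"], batch_size=2)
```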
