One other option might be to have each request dump its data to a
randomly named file with a .tmp extension in a queue directory.  When
the request is done writing, it renames the file to a .csv extension
(rename is atomic on the same filesystem), and the controller exits.
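A minimal sketch of that dump-then-rename step might look like this
(the queue directory path and the row format are my own placeholders,
not anything from this thread):

```python
import os
import tempfile

QUEUE_DIR = "/var/spool/myapp-queue"  # placeholder path

def dump_request_data(rows, queue_dir=QUEUE_DIR):
    """Write rows to a random .tmp file, then atomically rename to .csv."""
    # mkstemp picks a unique name, so concurrent requests never collide
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=queue_dir)
    with os.fdopen(fd, "w") as f:
        for row in rows:
            f.write(",".join(str(col) for col in row) + "\n")
    # rename is atomic on the same filesystem, so the worker can never
    # see a half-written .csv file
    csv_path = tmp_path[:-len(".tmp")] + ".csv"
    os.rename(tmp_path, csv_path)
    return csv_path
```

The worker only ever looks for *.csv, so anything still being written
(*.tmp) is invisible to it by construction.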

You then have a second "worker" process checking the queue directory
for .csv files and either a) processing them into a larger holding
file, which would then be bulk imported, or b) sending them straight
to the DB using the bulk insert command.  You would just have to test
to see which method is faster for you.  Also, I am pretty sure MySQL's
bulk import option lets you specify more than one file to import,
which might be a third option.  Once the import is complete, the .csv
files that have been imported get deleted.  Since there is only one
worker process and it isn't multi-threaded, a lot of the angst about
file integrity should be alleviated.
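Option (a) above could be sketched roughly like this; the function
name and the bulk_import() hook are my own inventions, and in a real
worker bulk_import() would issue MySQL's LOAD DATA INFILE against the
holding file:

```python
import glob
import os

def drain_queue(queue_dir, holding_file, bulk_import=None):
    """Fold finished .csv files into one holding file, then bulk import."""
    # only completed files have the .csv extension; .tmp files are skipped
    batch = sorted(glob.glob(os.path.join(queue_dir, "*.csv")))
    if not batch:
        return 0
    # keep the holding file outside queue_dir so the glob never picks it up
    with open(holding_file, "a") as out:
        for path in batch:
            with open(path) as f:
                out.write(f.read())
            os.remove(path)  # imported files get deleted
    if bulk_import is not None:
        bulk_import(holding_file)  # e.g. run LOAD DATA INFILE here
    return len(batch)
```

Running this in a loop (or from cron) is all the worker needs; because
it is the only process that ever reads or deletes .csv files, there is
no locking to worry about.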

On Jun 11, 9:15 pm, Pete Wright <[EMAIL PROTECTED]> wrote:
> That's an awesome tip - thanks so much guys for pointing me in a far
> less painful direction. My only concern now is speed in working with
> the file and minimizing wait states. For example, I presumably don't
> want anything writing to the text file while I'm loading it into the
> database, and similarly I don't want to take time on every single
> request to check the size of the text file to see if I should run a
> LOAD DATA on it.
>
> So, I think what I'm going to end up doing here is converting time
> into seconds since midnight and then mod 60 it to come up with a file
> name suffix. I'll also have another process running separately
> processing and deleting unnecessary files. Do you guys see any problems
> with that method?
>
> On Jun 11, 6:58 pm, Jonathan Vanasco <[EMAIL PROTECTED]> wrote:
>
> > write to a .txt log in some sort of standardized format , when it hits
> > 10k lines run a batch query
>
> > as mentioned above, mysql has its own insert format; postgres does
> > too.  both insert formats are FAR faster than going through individual
> > inserts.
>
> > when i still used mysql about 4 years ago, i remember benching an
> > insert of geographic data.  it was something like 6 hours via inserts,
> > 4 hours via grouped inserts, and 25 seconds using the native import
> > format.
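
For what it's worth, the mod-60 suffix scheme quoted above would come
out something like this (the "events" basename is my own placeholder).
Note that seconds-since-midnight mod 60 is just the current second, so
each suffix gets reused once per minute; the cleanup process has
roughly 59 seconds to import and delete a file before a writer comes
back around to the same name:

```python
import datetime

def log_filename(now=None, basename="events"):
    """Pick a log file suffix from seconds-since-midnight mod 60."""
    now = now or datetime.datetime.now()
    seconds_since_midnight = now.hour * 3600 + now.minute * 60 + now.second
    return "%s.%02d.txt" % (basename, seconds_since_midnight % 60)
```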
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"pylons-discuss" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/pylons-discuss?hl=en
-~----------~----~----~----~------~----~------~--~---
