Beware of multiple threads and/or processes writing to the file at the
same time. It may make sense to use a mutex.
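For example, here's a minimal sketch of serializing appends across
processes with an advisory lock (this assumes Unix, since fcntl isn't
available on Windows; the helper name and log path are made up):

import fcntl

def append_line(path, line):
    """Append a line while holding an exclusive lock so that
    concurrent writers can't interleave."""
    f = open(path, 'a')
    try:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until we hold the lock
        f.write(line + '\n')
    finally:
        fcntl.flock(f, fcntl.LOCK_UN)
        f.close()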
I agree with the earlier recommendations to use MySQL's LOAD DATA
INFILE. It really is a lot faster.
You can completely remove Python from the picture when importing the
file. Here's an example (note that mysqlimport derives the table name
from the file name minus its extension, so mylocalfile.tsv loads into
the mylocalfile table):
mysqlimport \
--user=user \
--password=password \
--columns=id,name \
--compress \
--fields-optionally-enclosed-by='"' \
--fields-terminated-by='\t' \
--lines-terminated-by='\n' \
--local \
--lock-tables \
--replace \
--verbose \
mydatabase mylocalfile.tsv
To write that tsv file, I used the csv module with:

import csv

DEFAULT_KARGS = dict(dialect='excel-tab', lineterminator='\n')

def create_default_reader(iterable):
    """Return a csv.reader with our default options."""
    return csv.reader(iterable, **DEFAULT_KARGS)

def create_default_writer(iterable):
    """Return a csv.writer with our default options."""
    return csv.writer(iterable, **DEFAULT_KARGS)
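For example, writing a couple of rows with that writer (the rows here
are made up):

f = open('mylocalfile.tsv', 'wb')  # binary mode keeps the csv module happy
writer = create_default_writer(f)
writer.writerow([1, 'alice'])
writer.writerow([2, 'bob'])
f.close()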
There's one more thing to keep in mind. If you do the LOAD DATA LOCAL
INFILE from Python instead, make sure you pass local_infile=True to
MySQLdb's connect function.
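Something like this (the connection parameters are placeholders, and
the table name follows mysqlimport's convention of stripping the
file's extension):

import MySQLdb

# local_infile=True lets the client send a local file to the server.
conn = MySQLdb.connect(user='user', passwd='password',
                       db='mydatabase', local_infile=True)
cursor = conn.cursor()
cursor.execute("""
    LOAD DATA LOCAL INFILE 'mylocalfile.tsv'
    REPLACE INTO TABLE mylocalfile
    FIELDS TERMINATED BY '\\t' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\\n'
    (id, name)
""")
conn.commit()
conn.close()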
Happy Hacking!
-jj
On Wed, Jun 11, 2008 at 6:15 PM, Pete Wright <[EMAIL PROTECTED]> wrote:
>
> That's an awesome tip - thanks so much guys for pointing me in a far
> less painful direction. My only concern now is speed in working with
> the file and minimizing wait states. For example, I presumably don't
> want anything writing to the text file while I'm loading it into the
> database, and similarly I don't want to take time on every single
> request to check the size of the text file to see if I should run a
> LOAD DATA on it.
>
> So, I think what I'm going to end up doing here is converting the time
> into seconds since midnight and then taking it mod 60 to come up with a
> file name suffix. I'll also have another process running separately,
> processing and deleting unnecessary files. Do you guys see any problems
> with that method?
>
>
> On Jun 11, 6:58 pm, Jonathan Vanasco <[EMAIL PROTECTED]> wrote:
>> write to a .txt log in some sort of standardized format; when it hits
>> 10k lines, run a batch query
>>
>> as mentioned above, mysql has its own insert format; postgres does
>> too. both insert formats are FAR faster than going through individual
>> inserts.
>>
>> when i still used mysql about 4 years ago, i remember benching an
>> insert of geographic data. it was something like 6 hours via inserts,
>> 4 hours via grouped inserts, and 25 seconds using the native import
>> format.
> >
>
--
I, for one, welcome our new Facebook overlords!
http://jjinux.blogspot.com/