Re: How to safely maintain a status file

2012-07-14 Thread Christian Heimes
On 13.07.2012 03:52, Steven D'Aprano wrote:
 And some storage devices (e.g. hard drives, USB sticks) don't actually 
 write data permanently even when you sync the device. They just write to 
 a temporary cache, then report that they are done (liar liar pants on 
 fire). Only when the cache is full, or at some random time at the 
 device's choosing, do they actually write data to the physical media. 
 
 The result of this is that even when the device tells you that the data 
 is synched, it may not be.

Yes, that's another issue. Either you have to buy expensive enterprise
hardware with UPS batteries or you need to compensate for failures at
the software level (e.g. with a Hadoop cluster).

We have big storage devices with double redundant controllers, on-board
buffer batteries, triple redundant power supplies, special RAID disks,
multipath I/O Fibre Channel links and an external backup solution to keep
our data reasonably safe.

Christian


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to safely maintain a status file

2012-07-13 Thread Chris Angelico
On Fri, Jul 13, 2012 at 2:26 PM,  rantingrickjohn...@gmail.com wrote:
 On Thursday, July 12, 2012 10:13:47 PM UTC-5, Steven D'Aprano wrote:
 Rick has obviously never tried to open a file for reading when somebody
 else has it opened, also for reading, and discovered that despite Windows
 being allegedly a multi-user operating system, you can't actually have
 multiple users read the same files at the same time.

 You misread my response. My comment was a direct result of Christian stating:

 (paraphrase) On some systems you are not permitted to delete a file whilst 
 the file is open 

 ...which seems to be consistent to me. Why would *anybody* want to delete a 
 file whilst the file is open?

POSIX doesn't let you delete files. It lets you dispose of filenames.
Python does the same with its 'del'. The object (file) exists until
the system decides otherwise.

Here's a simpler example: Hardlinks. Suppose you have two names
pointing to the same file; are you allowed to unlink one of them while
you have the other open?
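
A minimal sketch of the hard-link case, assuming a POSIX file system that
supports os.link(). The function name and file names are made up for
illustration; st_nlink is the number of names the file currently has.

```python
import os
import tempfile

def hardlink_demo():
    """Give one file two names, unlink one name, keep reading via the other."""
    workdir = tempfile.mkdtemp()
    first = os.path.join(workdir, "first_name.txt")    # hypothetical names
    second = os.path.join(workdir, "second_name.txt")

    with open(first, "w") as f:
        f.write("hello")

    os.link(first, second)                  # second name for the same file
    links_before = os.stat(first).st_nlink

    with open(second) as handle:            # hold the file open via one name
        os.unlink(first)                    # dispose of the other name
        links_after = os.fstat(handle.fileno()).st_nlink
        content = handle.read()

    os.unlink(second)                       # clean up the last name
    os.rmdir(workdir)
    return links_before, links_after, content
```

On a POSIX system the unlink succeeds while the file is open, and the
content stays readable through the remaining name.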

ChrisA


Re: How to safely maintain a status file

2012-07-13 Thread Steven D'Aprano
On Thu, 12 Jul 2012 21:26:20 -0700, rantingrickjohnson wrote:

 On Thursday, July 12, 2012 10:13:47 PM UTC-5, Steven D'Aprano wrote:
 Rick has obviously never tried to open a file for reading when somebody
 else has it opened, also for reading, and discovered that despite
 Windows being allegedly a multi-user operating system, you can't
 actually have multiple users read the same files at the same time.
 
 You misread my response. My comment was a direct result of Christian
 stating:
 
 (paraphrase) On some systems you are not permitted to delete a file
 whilst the file is open 
 
 ...which seems to be consistent to me. Why would *anybody* want to
 delete a file whilst the file is open? 

Because it is useful and a sensible thing to do.

Why should one misbehaved application, keeping a file open, be allowed to 
hold every other application, and the file system, hostage?

This is one of the many poor decisions which make Windows so vulnerable 
to viruses and malware. If malware can arrange to keep itself open, you 
can't delete it. Thanks guys!


 Bringing back the car analogy
 again: Would you consider jumping from a moving vehicle a consistent
 interaction with the interface of a vehicle? Of course not. The
 interface for a vehicle is simple and consistent:
 
  1. You enter the vehicle at location A 
  2. The vehicle transports you to location B 
  3. You exit the vehicle

Amusingly, you neglected to specify the vehicle stops -- and rightly 
so, because of course having to stop the vehicle is not a *necessary* 
condition for exiting it, as tens of thousands of stunt men and women can 
attest.

Not to mention people parachuting out of an airplane, pirates or 
commandos boarding a moving ship, pedestrians transferring from a slow 
moving walkway to a faster moving walkway, farmers jumping off a trailer 
while it is still being towed behind a tractor (and jumping back on 
again), and Bruce Willis in Red in very possibly the best slow-motion 
action sequence in the history of Hollywood.

http://www.youtube.com/watch?v=xonMpj2YyDU


 At no time during the trip would anyone expect you to leap from the
 vehicle. 

Expected or not, you can do so.


 But when you delete open files, you are essentially leaping
 from the moving vehicle! This behavior goes against all expectations of
 consistency in an API -- and against all sanity when riding in a
 vehicle!

Fortunately, files on a file system are not cars, and deleting open files 
is a perfectly reasonable thing to do, no more frightening than in Python 
deleting a reference to an object using the del statement. Imagine how 
stupid it would be if this happened:


py> x = 42
py> y = x
py> del y
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
DeleteError: cannot delete reference to object '42' until no other 
references to it exist


Fortunately, Python doesn't do that -- it tracks when the object is no 
longer being accessed, and only then physically reclaims the memory used. 
And so it is on POSIX file systems: the file system keeps track of when 
the file on disk is no longer being accessed, and only then physically 
reclaims the blocks being used. Until then, deleting the file merely 
unlinks the file name from the blocks on disk, in the same way that 
del y merely unlinks the name y from the object 42.
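
The file-system half of that comparison can be sketched in Python too.
This assumes a POSIX system (on Windows the unlink would normally fail
while the file is open), and the function name is only illustrative.

```python
import os
import tempfile

def unlink_while_open():
    """Remove a file's only name while still holding the file open."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w+") as f:
        f.write("still here")
        os.unlink(path)                        # the name is gone...
        name_exists = os.path.exists(path)
        f.write(", still writable")            # ...but the file is not
        f.seek(0)
        content = f.read()
        names_left = os.fstat(f.fileno()).st_nlink
    # Closing the last handle lets the OS reclaim the blocks.
    return name_exists, names_left, content
```

The name disappears from the directory immediately, while reads and
writes through the open handle keep working until it is closed.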


 Opening files for exclusive read *by default* is a pointless and silly
 limitation. It's also unsafe: if a process opens a file for
 exclusive read, and then dies, *no other process* can close that file.
 
 Oh come on. Are you actually going to use errors or unintended
 consequences, or even Acts of God to defend your argument? 

Features have to be judged by their actual consequences, not some 
unrealistic sense of theoretical purity. The actual consequence of 
mandatory exclusive file locking is, *it sucks*.

Windows users are used to having to reboot their server every few days 
because something is broken, so they might not mind rebooting it because 
some file is locked in a mandatory open state and not even the operating 
system can unlock it. But for those with proper operating systems who 
expect months of uninterrupted service, mandatory locking is a problem to 
be avoided, not a feature.


 Okay.
 Okay. I suppose IF the car spontaneously combusted THEN the
 passengers would be wise to jump out, leaving the vehicle to the whims
 of inertia.

In this analogy, is the car the file name, the inode, or the directory? 
Are the passengers the file name(s), or the file contents, or the inode? 
Is the driver meant to be the file system? If I have a hard link to the 
file, does that mean the passengers are in two cars at once, or two lots 
of passengers in the same car?


 One neat trick is to open a file, then delete it from disk while it is
 still open. So long as your process is still running, you can write to
 this ghost file, as normal, but no other process can (easily) see it.
 And when your process ends, the file contents is 

RE: How to safely maintain a status file

2012-07-13 Thread Prasad, Ramit
 Well neat tricks aside, I am of the firm belief that deleting files should
 never be possible whilst they are open.

This is one of the few instances where I think Windows does something better 
than OS X. Windows will check before you attempt to delete (i.e. move
to Recycling Bin) while OS X will move a file to Trash quite happily
and only tell me it cannot remove the file when I try to empty the Trash.

Ramit


Ramit Prasad | JPMorgan Chase Investment Bank | Currencies Technology
712 Main Street | Houston, TX 77002
work phone: 713 - 216 - 5423

--


This email is confidential and subject to important disclaimers and
conditions including on offers for the purchase or sale of
securities, accuracy and completeness of information, viruses,
confidentiality, legal privilege, and legal entity disclaimers,
available at http://www.jpmorgan.com/pages/disclosures/email.  


Re: [Python] RE: How to safely maintain a status file

2012-07-13 Thread Chris Gonnerman

On 07/13/2012 11:00 AM, Prasad, Ramit wrote:

Well neat tricks aside, I am of the firm belief that deleting files should
never be possible whilst they are open.

This is one of the few instances I think Windows does something better
than OS X. Windows will check before you attempt to delete (i.e. move
to Recycling Bin) while OS X will move a file to Trash quite happily
only tell me it cannot remove the file when I try to empty the Trash.
Well, I was trained in the Unix way, and believe it is entirely 
appropriate to delete an open file, even if my program is the opener. 
It's just too handy to have temp files that disappear on their own.


As opposed to periodically going to %TEMP% and deleting them manually.  Gah.

-- Chris.


RE: [Python] RE: How to safely maintain a status file

2012-07-13 Thread Prasad, Ramit
  Well neat tricks aside, I am of the firm belief that deleting files
 should
  never be possible whilst they are open.
  This is one of the few instances I think Windows does something better
  than OS X. Windows will check before you attempt to delete (i.e. move
  to Recycling Bin) while OS X will move a file to Trash quite happily
  only tell me it cannot remove the file when I try to empty the Trash.
 Well, I was trained in the Unix way, and believe it is entirely
 appropriate to delete an open file, even if my program is the opener.
 It's just too handy to have temp files that disappear on their own.
 
 As opposed to periodically going to %TEMP% and deleting them manually.  Gah.

In my experience things that are too handy are usually breaking
what I consider right. That being said, I am not entirely sure
what I think is right in this circumstance. I suppose it depends
on if I am the person deleting or the person who is looking at
a file that is being deleted. Or the user who just wants the stupid
computer to just Work.

I lean slightly towards the POSIX handling with the addition that 
any additional write should throw an error. You are now saving to 
a file that will not exist the moment you close it and that is probably 
not expected.





Ramit


Re: [Python] RE: How to safely maintain a status file

2012-07-13 Thread Chris Angelico
On Sat, Jul 14, 2012 at 3:59 AM, Prasad, Ramit
ramit.pra...@jpmorgan.com wrote:
 I lean slightly towards the POSIX handling with the addition that
 any additional write should throw an error. You are now saving to
 a file that will not exist the moment you close it and that is probably
 not expected.

There are several different possible right behaviors here, but they
depend more on the application than anything else. With a log file,
for instance, the act of deleting it is more a matter of truncating it
(dispose of the old history), so the right thing to do is to start a
fresh file. Solution: Close the file and re-open it periodically. But
I don't know of an efficient way to do that with Windows semantics.
Renaming/moving an open file in order to perform log rotation isn't
all that easy.

ChrisA


Re: [Python] RE: How to safely maintain a status file

2012-07-13 Thread Hans Mulder
On 13/07/12 19:59:59, Prasad, Ramit wrote:

 I lean slightly towards the POSIX handling with the addition that 
 any additional write should throw an error. You are now saving to 
 a file that will not exist the moment you close it and that is
 probably not expected.

I'd say: it depends.

If the amount of data your script needs to process does not fit
in RAM, then you may want to write some of it to a temporary file.
On a Posix system, it's entirely normal to unlink() a temp file
first thing after you've created it.  The expectation is that the
file will continue to exist, and be writeable, until you close it.

In fact, there's a function in the standard library named
tempfile.TemporaryFile that does exactly that: create a file
and unlink it immediately.  This function would be useless
if you couldn't write to your temporary file.
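
A small sketch of that use case: spilling data to an anonymous temporary
file and reading it back. The helper name spill_to_disk is made up for
this example; tempfile.TemporaryFile is the standard library function
mentioned above.

```python
import tempfile

def spill_to_disk(chunks):
    """Write byte chunks to an anonymous temp file, then read them back."""
    with tempfile.TemporaryFile(mode="w+b") as tmp:
        for chunk in chunks:
            tmp.write(chunk)      # writing works although no name refers to it
        tmp.seek(0)               # rewind before reading
        return tmp.read()
    # The file is gone as soon as the with-block closes it.
```

A real script would read the data back in pieces rather than all at
once, since the point is that it does not fit in RAM.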

Hope this helps,

-- HansM




Re: [Python] RE: How to safely maintain a status file

2012-07-13 Thread MRAB

On 13/07/2012 19:28, Hans Mulder wrote:

On 13/07/12 19:59:59, Prasad, Ramit wrote:


I lean slightly towards the POSIX handling with the addition that
any additional write should throw an error. You are now saving to
a file that will not exist the moment you close it and that is
probably not expected.



Strictly speaking, the file does exist, it's just that there are no
names referring to it. When any handles to it are also closed, the file
_can_ truly be deleted.

As has been said before, in the *nix world, unlink _doesn't_ delete
a file, it deletes a name.


I'd say: it depends.

If the amount of data your script needs to process does not fit
in RAM, then you may want to write some of it to a temporary file.
On a Posix system, it's entirely normal to unlink() a temp file
first thing after you've created it.  The expectation is that the
file will continue to exist, and be writeable, until you close it.

In fact, there's a function in the standard library named
tempfile.TemporaryFile that does exactly that: create a file
and unlink it immediately.  This function would be useless
if you couldn't write to your temporary file.


It's possible to create a temporary file even in Windows.


RE: How to safely maintain a status file

2012-07-13 Thread Chris Gonnerman

On 07/13/2012 12:59 PM, Prasad, Ramit wrote:
I lean slightly towards the POSIX handling with the addition that any 
additional write should throw an error. You are now saving to a file 
that will not exist the moment you close it and that is probably not 
expected. Ramit
But if I created, then deleted it while holding an open file descriptor, 
it is entirely likely that I intend to write to it. I'll admit, these 
days there are those in the Unix/Linux community that consider using an 
anonymous file a bad idea; I'm just not one of them.


-- Chris.




Re: [Python] RE: How to safely maintain a status file

2012-07-13 Thread Christian Heimes
On 13.07.2012 21:57, MRAB wrote:
 It's possible to create a temporary file even in Windows.

Windows has an open() flag named O_TEMPORARY for temporary files. With
O_TEMPORARY the file is removed from disk as soon as the file handle is
closed. On POSIX systems it's common practice to unlink temporary files
immediately after the open() call.



Re: How to safely maintain a status file

2012-07-13 Thread Steven D'Aprano
On Fri, 13 Jul 2012 15:15:13 -0500, Chris Gonnerman wrote:

 On 07/13/2012 12:59 PM, Prasad, Ramit wrote:
 I lean slightly towards the POSIX handling with the addition that any
 additional write should throw an error. You are now saving to a file
 that will not exist the moment you close it and that is probably not
 expected. Ramit
 But if I created, then deleted it while holding an open file descriptor,
 it is entirely likely that I intend to write to it. I'll admit, these
 days there are those in the Unix/Linux community that consider using an
 anonymous file a bad idea; I'm just not one of them.

A badly-behaved application can write oodles and oodles of data to an 
unlinked file, which has the result of temporarily using up disk space 
that doesn't show up when you do an ls. For an inexperienced system 
administrator, this may appear mysterious.

The solution is to use lsof to identify the unlinked file, which gives you 
the process id of the application, which you can then kill. As soon as 
you do that, the space is freed up again.

Like all powerful tools, unlinked files can be abused. Underpowered tools 
can't be abused, but nor can they be used.


-- 
Steven


Re: How to safely maintain a status file

2012-07-12 Thread Laszlo Nagy



You are contradicting yourself. Either the OS is providing a fully
atomic rename or it doesn't. All POSIX compatible OS provide an atomic
rename functionality that renames the file atomically or fails without
losing the target side. On POSIX OS it doesn't matter if the target exists.
This is not a contradiction. Although the rename operation is atomic, 
the whole change status process is not. It is because there are two 
operations: #1 delete old status file and #2. rename the new status 
file. And because there are two operations, there is still a race 
condition. I see no contradiction here.


You don't need locks or any other fancy stuff. You just need to make
sure that you flush the data and metadata correctly to the disk and
force a re-write of the directory inode, too. It's a standard pattern on
POSIX platforms and well documented in e.g. the maildir RFC.
It is not entirely true. We are talking about two processes. One is 
reading a file, another one is writing it. They can run at the same 
time, so forcibly flushing the disk cache won't help.




Re: How to safely maintain a status file

2012-07-12 Thread Laszlo Nagy



Renaming files is the wrong way to synchronize a
crawler.  Use a database that has ACID properties, such as
SQLite.  Far fewer I/O operations are required for small updates.
It's not the 1980s any more.
I agree with this approach. However, the OP specifically asked about 
how to update status file.
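
For comparison, a rough sketch of what the database approach could look
like with the standard sqlite3 module. The table layout and the function
names are assumptions made for this example, not something the OP
described.

```python
import sqlite3

def open_status_db(path):
    """Open (or create) a tiny key/value status store."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS status (key TEXT PRIMARY KEY, value TEXT)"
    )
    conn.commit()
    return conn

def set_status(conn, key, value):
    # "with conn" wraps the statement in a transaction: it commits on
    # success and rolls back if an exception is raised.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO status (key, value) VALUES (?, ?)",
            (key, value),
        )

def get_status(conn, key, default=None):
    row = conn.execute(
        "SELECT value FROM status WHERE key = ?", (key,)
    ).fetchone()
    return row[0] if row else default
```

A reader either sees the old value or the new one, never a half-written
update, which is the property the status file is trying to get from
rename.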



Re: How to safely maintain a status file

2012-07-12 Thread Christian Heimes
On 12.07.2012 14:30, Laszlo Nagy wrote:
 This is not a contradiction. Although the rename operation is atomic,
 the whole change status process is not. It is because there are two
 operations: #1 delete old status file and #2. rename the new status
 file. And because there are two operations, there is still a race
 condition. I see no contradiction here.

Sorry, but you are wrong. It's just one operation that boils down to
"point name to a different inode". After the rename op the file name
either points to a different inode or, in case of an error, still to
the old inode. The OS guarantees that all processes either see the first
or the second state (in other words: atomic).

POSIX has no operation that actually deletes a file. It just has an
unlink() syscall that removes an associated name from an inode. As soon
as an inode has no names and is not referenced by a file descriptor, the
file content and inode are removed by the operating system. rename() is
more like a link() followed by an unlink() wrapped in a system wide
global lock.

 It is not entirely true. We are talking about two processes. One is
 reading a file, another one is writing it. They can run at the same
 time, so forcibly flushing the disk cache won't help.

You need to flush the data to disk as well as the metadata of the file
and its directory in order to survive a system crash. The close()
syscall already makes sure that all data is flushed into the IO layer of
the operating system.

With POSIX semantics the reading process will either see the full
content before the rename op or the full content after the rename op.
The writing process can replace the name (rename op) while the reading
process reads the status file because its file descriptor still points
to the old status file.
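
Put together, the pattern might look roughly like the sketch below. It
is an illustration under assumptions, not a definitive implementation:
the function name and the ".status-" prefix are invented here, and
os.replace() is assumed to be available (Python 3.3 or later). The
directory fsync is attempted only where os.O_DIRECTORY exists.

```python
import os
import tempfile

def write_status_atomically(path, data):
    """Replace `path` with `data` (bytes) so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename cannot
    # cross file systems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".status-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()                # Python buffers -> OS
            os.fsync(tmp.fileno())     # OS buffers -> storage device
        os.replace(tmp_path, path)     # atomically point the name at new data
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)        # do not leave a stray temp file behind
        raise
    if hasattr(os, "O_DIRECTORY"):
        # Flush the directory entry as well, so the rename survives a crash.
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

A reader that already has the old file open keeps reading the old
content; a reader that opens the name afterwards gets the new content.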

Christian


Re: How to safely maintain a status file

2012-07-12 Thread Hans Mulder
On 12/07/12 14:30:41, Laszlo Nagy wrote:
 You are contradicting yourself. Either the OS is providing a fully
 atomic rename or it doesn't. All POSIX compatible OS provide an atomic
 rename functionality that renames the file atomically or fails without
 losing the target side. On POSIX OS it doesn't matter if the target
 exists.

 This is not a contradiction. Although the rename operation is atomic,
 the whole change status process is not. It is because there are two
 operations: #1 delete old status file and #2. rename the new status
 file. And because there are two operations, there is still a race
 condition. I see no contradiction here.

On Posix systems, you can avoid the race condition.  The trick is to
skip step #1.  The rename will implicitly delete the old file, and
it will still be atomic.  The whole process now consists of a single
step, so the whole process is now atomic.

 You don't need locks or any other fancy stuff. You just need to make
 sure that you flush the data and metadata correctly to the disk and
 force a re-write of the directory inode, too. It's a standard pattern on
 POSIX platforms and well documented in e.g. the maildir RFC.

 It is not entirely true. We are talking about two processes. One is
 reading a file, another one is writing it. They can run at the same
 time, so forcibly flushing the disk cache won't help.

On Posix systems, it will work, and be atomic, even if one process is
reading the old status file while another process is writing the new
one.  The old file will be atomically removed from the directory by
the rename operation; it will continue to exist on the hard drive, so
that the reading process can continue reading it.  The old file will
be deleted when the reader closes it.  Or, if the system crashes before
the old file is closed, it will be deleted when the system is restarted.

On Windows, things are very different.

Hope this helps,

-- HansM




Re: How to safely maintain a status file

2012-07-12 Thread Ross Ridge
Laszlo Nagy:
 This is not a contradiction. Although the rename operation is atomic,
 the whole change status process is not. It is because there are two
 operations: #1 delete old status file and #2. rename the new status
 file. And because there are two operations, there is still a race
 condition. I see no contradiction here.

Christian Heimes  li...@cheimes.de wrote:
Sorry, but you are wrong. It's just one operation that boils down to
point name to a different inode.

For some reason you're assuming POSIX semantics, an assumption that
Laszlo Nagy did not make.

Ross Ridge

-- 
 l/  //   Ross Ridge -- The Great HTMU
[oo][oo]  rri...@csclub.uwaterloo.ca
-()-/()/  http://www.csclub.uwaterloo.ca/~rridge/ 
 db  //   


Re: How to safely maintain a status file

2012-07-12 Thread Laszlo Nagy



Sorry, but you are wrong. It's just one operation that boils down to
"point name to a different inode". After the rename op the file name
either points to a different inode or, in case of an error, still to
the old inode. The OS guarantees that all processes either see the first
or the second state (in other words: atomic).

POSIX has no operation that actually deletes a file. It just has an
unlink() syscall that removes an associated name from an inode. As soon
as an inode has no names and is not referenced by a file descriptor, the
file content and inode are removed by the operating system. rename() is
more like a link() followed by an unlink() wrapped in a system wide
global lock.

Then please help me understand this.

Good case:

process #1: unlink(old status file)
process #1: rename(new status file)
process #2: open(new status file)
process #2: read(new status file)

Bad case:

process #1: unlink(old status file)
process #2: open(???) -- there is no file on disk here, this system call 
returns with an error!
process #1: rename(new status file)

If it were possible to rename + unlink in one step, then it would be 
okay. Can you please explain what I am missing?




Re: How to safely maintain a status file

2012-07-12 Thread Laszlo Nagy



This is not a contradiction. Although the rename operation is atomic,
the whole change status process is not. It is because there are two
operations: #1 delete old status file and #2. rename the new status
file. And because there are two operations, there is still a race
condition. I see no contradiction here.

On Posix systems, you can avoid the race condition.  The trick is to
skip step #1.  The rename will implicitly delete the old file, and
it will still be atomic.  The whole process now consists of a single
step, so the whole process is now atomic.
Well, I didn't know that this is going to work. At least it does not 
work on Windows 7 (which should be POSIX compatible?)


>>> f = open("test.txt","wb+")
>>> f.close()
>>> f2 = open("test2.txt","wb+")
>>> f2.close()
>>> import os
>>> os.rename("test2.txt","test.txt")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
WindowsError: [Error 183] File already exists


I have also tried this on FreeBSD and it worked.

Now, let's go back to the original question:


This works well on Linux but Windows raises an error when status_file already 
exists.


It SEEMS that the op wanted a solution for Windows




Re: How to safely maintain a status file

2012-07-12 Thread Laszlo Nagy



Windows doesn't support atomic renames if the right side exists.  I
suggest that you implement two code paths:

import os

if os.name == "posix":
    rename = os.rename
else:
    def rename(a, b):
        try:
            os.rename(a, b)
        except OSError as e:
            # 183 is the Windows "already exists" code. It is reported in
            # e.winerror; e.errno holds the POSIX approximation instead.
            if getattr(e, "winerror", None) != 183:
                raise
            os.unlink(b)
            os.rename(a, b)


Problem is if the process is stopped between unlink and rename there
would be no status file.
Yes, and actually it does not need to be an abnormal termination. It is 
enough if the OS scheduler puts this process on hold for some time...


But using a lock file, the problem can be solved. However in that case, 
reading a status file can be a blocking operation.
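
One way to sketch such a lock file is with an exclusive create, which
either creates the file or fails in a single step. The function names
and the idea of storing the PID are assumptions for illustration only.
Note that a crashed process leaves a stale lock behind, which this
sketch does not handle.

```python
import os

def try_acquire_lock(lock_path):
    """Return True if we created the lock file, False if someone holds it."""
    try:
        # O_CREAT | O_EXCL: create the file only if it does not exist yet.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))    # record the owner, handy for debugging
    return True

def release_lock(lock_path):
    """Give up the lock by removing the lock file."""
    os.unlink(lock_path)
```

Both the writer (around unlink + rename) and the reader (around open)
would have to take the lock, and a reader that finds it held has to
retry or wait, which is the blocking behaviour mentioned above.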



Re: How to safely maintain a status file

2012-07-12 Thread Christian Heimes
On 12.07.2012 19:43, Laszlo Nagy wrote:
 Well, I didn't know that this is going to work. At least it does not
 work on Windows 7 (which should be POSIX compatible?)

Nope, Windows's file system layer is not POSIX compatible. For example
you can't remove or replace a file while it is opened by a process.
Lots of small things work slightly differently on Windows or not at all.

Christian



Re: How to safely maintain a status file

2012-07-12 Thread Rick Johnson
On Jul 12, 2:39 pm, Christian Heimes li...@cheimes.de wrote:
 Windows's file system layer is not POSIX compatible. For example
 you can't remove or replace a file while it is opened by a process.

Sounds like a reasonable fail-safe to me. Not much unlike a car
ignition that will not allow starting the engine if the transmission
is in any *other* gear besides park or neutral, OR a governor (be
it mechanical or electrical) that will not allow the engine RPMs to
exceed a maximum safe limit, OR even, ABS systems which pulse the
brakes to prevent overzealous operators from losing road-to-tire
traction when decelerating the vehicle.

You could say: Hey, if someone is dumb enough to shoot themselves in
the foot then let them... however, sometimes fail-safes not only save
the dummy from a life of limps, they also prevent catastrophic
collateral damage to the rest of us.


Re: How to safely maintain a status file

2012-07-12 Thread Steven D'Aprano
On Thu, 12 Jul 2012 15:05:26 +0200, Christian Heimes wrote:

 You need to flush the data to disk as well as the metadata of the file
 and its directory in order to survive a system crash. The close()
 syscall already makes sure that all data is flushed into the IO layer of
 the operating system.

And some storage devices (e.g. hard drives, USB sticks) don't actually 
write data permanently even when you sync the device. They just write to 
a temporary cache, then report that they are done (liar liar pants on 
fire). Only when the cache is full, or at some random time at the 
device's choosing, do they actually write data to the physical media. 

The result of this is that even when the device tells you that the data 
is synched, it may not be.


-- 
Steven


Re: How to safely maintain a status file

2012-07-12 Thread Chris Angelico
On Fri, Jul 13, 2012 at 11:20 AM, Rick Johnson
rantingrickjohn...@gmail.com wrote:
 On Jul 12, 2:39 pm, Christian Heimes li...@cheimes.de wrote:
 Windows's file system layer is not POSIX compatible. For example
 you can't remove or replace a file while it is opened by a process.

 Sounds like a reasonable fail-safe to me.

POSIX says that files and file names are independent. I can open a
file based on its name, delete the file based on its name, and still
have the open file there. When it's closed, it'll be wiped from the
disk.

ChrisA


Re: How to safely maintain a status file

2012-07-12 Thread Steven D'Aprano
On Fri, 13 Jul 2012 12:12:01 +1000, Chris Angelico wrote:

 On Fri, Jul 13, 2012 at 11:20 AM, Rick Johnson
 rantingrickjohn...@gmail.com wrote:
 On Jul 12, 2:39 pm, Christian Heimes li...@cheimes.de wrote:
 Windows's file system layer is not POSIX compatible. For example you
 can't remove or replace a file while it is opened by a process.

 Sounds like a reasonable fail-safe to me.

Rick has obviously never tried to open a file for reading when somebody 
else has it opened, also for reading, and discovered that despite Windows 
being allegedly a multi-user operating system, you can't actually have 
multiple users read the same files at the same time.

(At least not unless the application takes steps to allow it.)

Or tried to back-up files while some application has got them opened. Or 
open a file while an anti-virus scanner is oh-so-slowly scanning it.

Opening files for exclusive read *by default* is a pointless and silly 
limitation. It's also unsafe: if a process opens a file for exclusive 
read, and then dies, *no other process* can close that file.

At least on POSIX systems, not even root can override a mandatory 
exclusive lock (it would be pretty pointless if it could), so a rogue or 
buggy program could wreak havoc with mandatory exclusive file locks. 
That's why Linux, by default, treats exclusive file locks as advisory 
(cooperative), not mandatory.

In general, file locking is harder than it sounds, with many traps for 
the unwary, and of course the semantics are dependent on both the 
operating system and the file system.

https://en.wikipedia.org/wiki/File_locking


 POSIX says that files and file names are independent. I can open a file
 based on its name, delete the file based on its name, and still have the
 open file there. When it's closed, it'll be wiped from the disk.

One neat trick is to open a file, then delete it from disk while it is 
still open. So long as your process is still running, you can write to 
this ghost file, as normal, but no other process can (easily) see it. And 
when your process ends, the file contents are automatically deleted.

This is remarkably similar to what Python does with namespaces and dicts:

# create a fake file system
ns = {'a': [], 'b': [], 'c': []}
# open a file
myfile = ns['a']
# write to it
myfile.append('some data')
# delete it from the file system
del ns['a']
# but I can still read and write to it
myfile.append('more data')
print(myfile[0])
# but anyone else will get an error if they try
another_file = ns['a']
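
The same trick with a real file looks roughly like this. It is only a minimal sketch, assuming a POSIX file system (on Windows the unlink would normally fail while the file is open); the function name ghost_file_demo is made up for illustration.

```python
import os
import tempfile


def ghost_file_demo():
    # Create a real file and keep it open for reading and writing.
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w+") as f:
        f.write("some data")
        # Remove the name from the directory; the open file lives on.
        os.unlink(path)
        name_still_exists = os.path.exists(path)
        # This process can still write to and read from the open file.
        f.write(", more data")
        f.seek(0)
        contents = f.read()
    # Once the last open handle is closed, the system can reclaim the storage.
    return name_still_exists, contents
```

Calling ghost_file_demo() returns whether the name still exists after the unlink, together with everything written through the open handle.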


-- 
Steven


Re: How to safely maintain a status file

2012-07-12 Thread Gene Heskett
On Thursday 12 July 2012 23:21:16 Steven D'Aprano did opine:

 On Fri, 13 Jul 2012 12:12:01 +1000, Chris Angelico wrote:
  On Fri, Jul 13, 2012 at 11:20 AM, Rick Johnson
  
  rantingrickjohn...@gmail.com wrote:
  On Jul 12, 2:39 pm, Christian Heimes li...@cheimes.de wrote:
  Windows's file system layer is not POSIX compatible. For example you
  can't remove or replace a file while it is opened by a process.
  
  Sounds like a reasonable fail-safe to me.
 
 Rick has obviously never tried to open a file for reading when somebody
 else has it opened, also for reading, and discovered that despite
 Windows being allegedly a multi-user operating system, you can't
 actually have multiple users read the same files at the same time.
 
Chuckle.  That was one of the 'features' that os9 on the trs-80 color 
computer had back in the 80's, and it was clean and well done because of 
the locking model the random block file manager had in OS9 for 6809 cpu's, 
no relation to the Mac OS9 other than a similar name.  That color computer 
has a separate, text only video card I could plug in and display on an 80 
column amber screen monitor.

When I wanted to impress the visiting frogs, I often did something I have 
never been able to do on any other operating system since: start assembling 
a long assembly language file on one of the screens on the color monitor, 
hit the clear key to advance to the amber screen, and start a listing on it 
of the assembler's output listing file.

Because the file locking was applied only to the sector (256 bytes on that 
machine) being written at that instant, the listing would fly by till it 
caught up with the assembler's output, run into the lock, and then 
dutifully follow along, one sector behind the assembler's output, until 
the assembly was finished.  That was in 1986, folks, and in the year of our 
Lord 2012, 26 years later, I still cannot do that in Linux.  When I ask why 
not, the replies seem to think I'm from outer space.  It's apparently a 
concept that the Linux code carvers do not even attempt to understand.

Something is drastically wrong with that picture IMO.

 (At least not unless the application takes steps to allow it.)
 
 Or tried to back-up files while some application has got them opened.

That, in fact, ran me out of the Amiga business in 1999, when a 30 GB drive 
failed on my full-blown 040 + 64 megs of DRAM A2000.  When the warranty drive 
arrived, I found that due to file locks on the startup files, all of 
them involved with the booting of that machine, my high-priced Diavolo Pro 
backup tapes didn't contain a single one of those files.  The Linux box 
with Red Hat 5.0 on it that I had built in late 1998 to see what Linux was 
all about found space under that desk that very evening, and I never looked 
back.

 Or
 open a file while an anti-virus scanner is oh-so-slowly scanning it.
 
 Opening files for exclusive read *by default* is a pointless and silly
 limitation. It's also unsafe: if a process opens a file for exclusive
 read, and then dies, *no other process* can close that file.
 
 At least on POSIX systems, not even root can override a mandatory
 exclusive lock (it would be pretty pointless if it could), so a rogue or
 buggy program could wreak havoc with mandatory exclusive file locks.
 That's why Linux, by default, treats exclusive file locks as advisory
 (cooperative), not mandatory.
 
 In general, file locking is harder than it sounds, with many traps for
 the unwary, and of course the semantics are dependent on both the
 operating system and the file system.
 
 https://en.wikipedia.org/wiki/File_locking
 
  POSIX says that files and file names are independent. I can open a
  file based on its name, delete the file based on its name, and still
  have the open file there. When it's closed, it'll be wiped from the
  disk.
 
 One neat trick is to open a file, then delete it from disk while it is
 still open. So long as your process is still running, you can write to
 this ghost file, as normal, but no other process can (easily) see it.
 And when your process ends, the file contents are automatically deleted.
 
 This is remarkably similar to what Python does with namespaces and
 dicts:
 
 # create a fake file system
 ns = {'a': [], 'b': [], 'c': []}
 # open a file
 myfile = ns['a']
 # write to it
 myfile.append('some data')
 # delete it from the file system
 del ns['a']
 # but I can still read and write to it
 myfile.append('more data')
 print(myfile[0])
 # but anyone else will get an error if they try
 another_file = ns['a']

Cheers, Gene
-- 
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
My web page: http://coyoteden.dyndns-free.com:85/gene is up!
You just wait, I'll sin till I blow up!
-- Dylan Thomas


Re: How to safely maintain a status file

2012-07-12 Thread Steven D'Aprano
On Thu, 12 Jul 2012 23:49:02 -0400, Gene Heskett wrote:

 When I wanted to impress the visiting frogs, I often did something I
 have never been able to do on any other operating system since: start
 assembling a long assembly language file on one of the screens on the
 color monitor, hit the clear key to advance to the amber screen, and
 start a listing on it of the assembler's output listing file.
 
 Because the file locking was applied only to the sector (256 bytes on
 that machine) being written at that instant, the listing would fly by
 till it caught up with the assembler's output, run into the lock, and
 then dutifully follow along, one sector behind the assembler's output,
 until the assembly was finished.  That was in 1986, folks, and in the
 year of our Lord 2012, 26 years later, I still cannot do that in Linux. 

Um, what you are describing sounds functionally equivalent to what 
tail -f does.


 When I ask why not, the replies seem to think I'm from outer space.  It's
 apparently a concept that the Linux code carvers do not even attempt to
 understand.

You could certainly create a pair of cooperative programs, one which 
keeps a lock on only the last block of the file, and a tail-like reader 
which honours that lock. But why bother? Just have the assembler append 
to the file, and let people use any reader they like, such as tail.
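
For what it's worth, a tail-like reader is only a few lines of Python. This is a rough sketch rather than a replacement for tail: the name follow and its poll_interval and max_idle_polls parameters are invented here, it polls instead of using any locking, and it may yield a partial line if the writer has not finished writing it yet.

```python
import time


def follow(path, poll_interval=0.1, max_idle_polls=None):
    """Yield lines from `path` as they are appended, similar to tail -f.

    Stops after `max_idle_polls` consecutive polls with no new data;
    with the default of None it keeps following forever.
    """
    with open(path, "r") as f:
        idle_polls = 0
        while True:
            line = f.readline()
            if line:
                idle_polls = 0
                yield line
            else:
                # Nothing new yet: wait for the writer to append more.
                idle_polls += 1
                if max_idle_polls is not None and idle_polls >= max_idle_polls:
                    return
                time.sleep(poll_interval)
```

The writer just appends to the file as usual; the reader needs no cooperation from it beyond that.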

Or have I misunderstood you?



-- 
Steven


Re: How to safely maintain a status file

2012-07-12 Thread rantingrickjohnson
On Thursday, July 12, 2012 10:13:47 PM UTC-5, Steven D'Aprano wrote:
 Rick has obviously never tried to open a file for reading when somebody 
 else has it opened, also for reading, and discovered that despite Windows 
 being allegedly a multi-user operating system, you can't actually have 
 multiple users read the same files at the same time.

You misread my response. My comment was a direct result of Christian stating:

(paraphrase) On some systems you are not permitted to delete a file whilst the 
file is open 

...which seems to be consistent to me. Why would *anybody* want to delete a 
file whilst the file is open? Bringing back the car analogy again: Would you 
consider jumping from a moving vehicle a consistent interaction with the 
interface of a vehicle? Of course not. The interface for a vehicle is simple 
and consistent:

 1. You enter the vehicle at location A
 2. The vehicle transports you to location B
 3. You exit the vehicle

At no time during the trip would anyone expect you to leap from the vehicle. 
But when you delete open files, you are essentially leaping from the moving 
vehicle! This behavior goes against all expectations of consistency in an API 
-- and against all sanity when riding in a vehicle!

 Opening files for exclusive read *by default* is a pointless and silly 
 limitation. It's also unsafe: if a process opens a file for exclusive 
 read, and then dies, *no other process* can close that file.

Oh come on. Are you actually going to use errors or unintended 
consequences, or even Acts of God to defend your argument? Okay. Okay. I 
suppose IF the car spontaneously combusted THEN the passengers would be 
wise to jump out, leaving the vehicle to the whims of inertia.

 One neat trick is to open a file, then delete it from disk while it is 
 still open. So long as your process is still running, you can write to 
 this ghost file, as normal, but no other process can (easily) see it. And 
 when your process ends, the file contents are automatically deleted.

Well neat tricks aside, I am of the firm belief that deleting files should 
never be possible whilst they are open. 

 * Opening files requires that data exist on disk
 * Reading and writing files requires an open file obj
 * Closing files requires an open file object
 * And deleting files requires that the file NOT be open

Would you also entertain the idea of reading or writing files that do not 
exist? (not including pseudo file objs like StringIO of course!).

Summary: Neat tricks and Easter eggs are a real hoot, but consistency in APIs is 
the key.


Re: How to safely maintain a status file

2012-07-09 Thread Plumo
  and then on startup read from tmp_file if status_file does not exist.
  But this seems awkward.

         It also violates your requirement -- since the crash could take
 place with a partial temp file.

Can you explain why?
My thinking was that if a crash took place while writing the temp file, it
would not matter because the status file would still exist and be read
from. The temp file would only be renamed when fully written.


Re: How to safely maintain a status file

2012-07-09 Thread Plumo
 Windows doesn't support atomic renames if the right side exists.  I
 suggest that you implement two code paths:

 if os.name == "posix":
     rename = os.rename
 else:
     def rename(a, b):
         try:
             os.rename(a, b)
         except OSError as e:
             if e.errno != 183:
                 raise
             os.unlink(b)
             os.rename(a, b)


The problem is that if the process is stopped between unlink and rename, there
would be no status file.


Re: How to safely maintain a status file

2012-07-09 Thread Christian Heimes
Am 09.07.2012 07:50, schrieb Plumo:
 Windows doesn't support atomic renames if the right side exists.  I
 suggest that you implement two code paths:

 The problem is that if the process is stopped between unlink and rename, there
 would be no status file.

Yeah, you have to suffer all of Windows' design flaws. You could add a
backup status file or use a completely different approach.



Re: How to safely maintain a status file

2012-07-09 Thread Nobody
On Sun, 08 Jul 2012 22:57:56 +0200, Laszlo Nagy wrote:

 Yes, this is much better. Almost perfect. Don't forget to consult your
 system documentation, and check if the rename operation is atomic or not.
 (Most probably it will only be atomic if the original and the renamed file
 are on the same physical partition and/or mount point).

On Unix, rename() is always atomic, and requires that source and
destination are on the same partition (if you want to move a file across
partitions, you have to copy it then delete the original).

 But even if the rename operation is atomic, there is still a race
 condition. Your program can be terminated after the original status file
 has been deleted, and before the temp file was renamed. In this case, you
 will be missing the status file (your program has already done some work,
 but it could not write out the new status).

In the event of abnormal termination, losing some data is to be expected.
The idea is to only lose the most recent data while keeping the old copy,
rather than losing everything. Writing to a temp file then rename()ing
achieves that.
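
A minimal sketch of that pattern follows. It assumes Python 3.3 or later, where os.replace() overwrites an existing destination on both POSIX and Windows; the function name write_status_atomically and the ".status-" temp-file prefix are made up for illustration, and it does not fsync the containing directory.

```python
import os
import tempfile


def write_status_atomically(status_file, status):
    # Create the temp file next to the target so that both are on the
    # same file system, which rename/replace requires.
    directory = os.path.dirname(os.path.abspath(status_file))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".status-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(status)
            # Push Python's buffer to the OS, then ask the OS to write it out.
            f.flush()
            os.fsync(f.fileno())
        # Swap the new file into place; readers see the old or the new copy.
        os.replace(tmp_path, status_file)
    except BaseException:
        # Do not leave a stray temp file behind if anything went wrong.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the process dies before the replace, the old status file is untouched and only the half-written temp file is lost.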



Re: How to safely maintain a status file

2012-07-09 Thread Duncan Booth
Richard Baron Penman richar...@gmail.com wrote:

 Is there a better way? Or do I need to use a database?

Using a database would seem to meet a lot of your needs. Don't forget that 
Python comes with a sqlite database engine included, so it shouldn't take 
you more than a few lines of code to open the database once and then write 
out your status every few seconds.

import sqlite3

con = sqlite3.connect('status.db')

...
with con:
    cur = con.cursor()
    cur.execute('UPDATE ...', ...)

and similar code to restore the status or create required tables on 
startup.
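
Spelled out a little further, it might look like the sketch below. The table name status, its payload column, and the three helper functions are assumptions chosen for illustration; only the sqlite3 calls themselves come from the standard library.

```python
import sqlite3


def open_status_db(path):
    con = sqlite3.connect(path)
    # "with con" commits on success and rolls back on an exception.
    with con:
        con.execute(
            "CREATE TABLE IF NOT EXISTS status ("
            "id INTEGER PRIMARY KEY CHECK (id = 1), "
            "payload TEXT NOT NULL)"
        )
    return con


def save_status(con, payload):
    # A single-row table: each save replaces the previous status.
    with con:
        con.execute(
            "INSERT OR REPLACE INTO status (id, payload) VALUES (1, ?)",
            (payload,),
        )


def load_status(con):
    row = con.execute("SELECT payload FROM status WHERE id = 1").fetchone()
    return row[0] if row else None
```

On startup, load_status() returns None if nothing has been saved yet, so the caller can start from scratch in that case.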

-- 
Duncan Booth http://kupuguy.blogspot.com


Re: How to safely maintain a status file

2012-07-09 Thread John Nagle

On 7/8/2012 2:52 PM, Christian Heimes wrote:

You are contradicting yourself. Either the OS is providing a fully
atomic rename or it doesn't. All POSIX compatible OS provide an atomic
rename functionality that renames the file atomically or fails without
losing the target side. On POSIX OS it doesn't matter if the target exists.


Rename on some file system types (particularly NFS) may not be atomic.


You don't need locks or any other fancy stuff. You just need to make
sure that you flush the data and metadata correctly to the disk and
force a re-write of the directory inode, too. It's a standard pattern on
POSIX platforms and well documented in e.g. the maildir RFC.

You can use the same pattern on Windows but it doesn't work as well.


  That's because you're using the wrong approach. See how to use
ReplaceFile under Win32:

http://msdn.microsoft.com/en-us/library/aa365512%28VS.85%29.aspx

Renaming files is the wrong way to synchronize a
crawler.  Use a database that has ACID properties, such as
SQLite.  Far fewer I/O operations are required for small updates.
It's not the 1980s any more.

I use a MySQL database to synchronize multiple processes
which crawl web sites.  The tables of past activity are InnoDB
tables, which support transactions.  The table of what's going
on right now is a MEMORY table.  If the database crashes, the
past activity is recovered cleanly, the MEMORY table comes back
empty, and all the crawler processes lose their database
connections, abort, and are restarted.  This allows multiple
servers to coordinate through one database.

John Nagle






Re: How to safely maintain a status file

2012-07-09 Thread Michael Hrivnak
Please consider batching this data and doing larger writes.  Thrashing
the hard drive is not a good plan for performance or hardware
longevity.  For example, crawl an entire FQDN and then write out the
results in one operation.  If your job fails in the middle and you
have to start that FQDN over, no big deal.  If that's too big of a
chunk for your purposes, perhaps break each FQDN up into top-level
directories and crawl each of those in one operation before writing to
disk.

There are existing solutions for managing job queues, so you can
choose what you like.  If you're unfamiliar, maybe start by looking at
celery.
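
To illustrate the batching idea only (this is not how celery or any particular job queue works), here is a small sketch. The class name BatchedWriter, the flush_func callback, and the batch_size default are all invented for the example.

```python
class BatchedWriter:
    """Collect items in memory and hand them to `flush_func` in batches."""

    def __init__(self, flush_func, batch_size=100):
        self.flush_func = flush_func
        self.batch_size = batch_size
        self.pending = []

    def add(self, item):
        self.pending.append(item)
        # Only touch the disk once a full batch has accumulated.
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            batch, self.pending = self.pending, []
            self.flush_func(batch)
```

The trade-off is the one described above: anything still in pending when the job dies has to be redone, so the batch size sets how much work you are willing to repeat.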

Michael

On Mon, Jul 9, 2012 at 1:52 AM, Plumo richar...@gmail.com wrote:
 What are you keeping in this status file that needs to be saved
 several times per second?  Depending on what type of state you're
 storing and how persistent it needs to be, there may be a better way
 to store it.

 Michael

 This is for a threaded web crawler. I want to cache what URL's are
 currently in the queue so if terminated the crawler can continue next
 time from the same point.


Re: How to safely maintain a status file

2012-07-09 Thread Dan Stromberg
On Mon, Jul 9, 2012 at 8:24 PM, John Nagle na...@animats.com wrote:

 On 7/8/2012 2:52 PM, Christian Heimes wrote:

 You are contradicting yourself. Either the OS is providing a fully
 atomic rename or it doesn't. All POSIX compatible OS provide an atomic
 rename functionality that renames the file atomically or fails without
 losing the target side. On POSIX OS it doesn't matter if the target
 exists.


 Rename on some file system types (particularly NFS) may not be atomic.


Actually, ISTR that rename() is one of the few things on NFS that is
atomic.

http://bugs.python.org/issue8828


Re: How to safely maintain a status file

2012-07-09 Thread Christian Heimes
Am 09.07.2012 22:24, schrieb John Nagle:
 Rename on some file system types (particularly NFS) may not be atomic.

The actual operation is always atomic but the NFS server may not notify
you about success or failure atomically.

See http://linux.die.net/man/2/rename, section BUGS.

   That's because you're using the wrong approach. See how to use
 ReplaceFile under Win32:
 
 http://msdn.microsoft.com/en-us/library/aa365512%28VS.85%29.aspx

The page doesn't say that ReplaceFile is an atomic op.

Christian



Re: How to safely maintain a status file

2012-07-09 Thread alex23
On Jul 10, 6:24 am, John Nagle na...@animats.com wrote:
 That's because you're using the wrong approach. See how to use
 ReplaceFile under Win32:

 http://msdn.microsoft.com/en-us/library/aa365512%28VS.85%29.aspx

I'm not convinced ReplaceFile is atomic:

The ReplaceFile function combines several steps within a single
function. An application can call ReplaceFile instead of calling
separate functions to save the data to a new file, rename the original
file using a temporary name, rename the new file to have the same name
as the original file, and delete the original file.

About the best you can get in Windows, I think, is MoveFileTransacted,
but you need to be running Vista or later:

http://msdn.microsoft.com/en-us/library/windows/desktop/aa365241(v=vs.85).aspx

I agree with your suggestion of using something transactional that
isn't bound to later Windows versions, though.


Re: How to safely maintain a status file

2012-07-08 Thread Christian Heimes
Am 08.07.2012 13:29, schrieb Richard Baron Penman:
 My initial solution was a thread that writes status to a tmp file
 first and then renames:
 
 open(tmp_file, 'w').write(status)
 os.rename(tmp_file, status_file)

Your algorithm may not write and flush all data to disk. You need to do
additional work. You must also store the tmpfile on the same partition
(better: same directory) as the status file.

import os

with open(tmp_file, "w") as f:
    f.write(status)
    # flush buffer and write data/metadata to disk
    f.flush()
    os.fsync(f.fileno())

# now rename the file
os.rename(tmp_file, status_file)

# finally flush metadata of directory to disk
# (abspath so that a bare file name still yields a directory to open)
dirfd = os.open(os.path.dirname(os.path.abspath(status_file)), os.O_RDONLY)
try:
    os.fsync(dirfd)
finally:
    os.close(dirfd)


 This works well on Linux but Windows raises an error when status_file
 already exists.
 http://docs.python.org/library/os.html#os.rename

Windows doesn't support atomic renames if the right side exists.  I
suggest that you implement two code paths:

if os.name == "posix":
    rename = os.rename
else:
    def rename(a, b):
        try:
            os.rename(a, b)
        except OSError as e:
            # 183 is the Windows error code ERROR_ALREADY_EXISTS
            if e.errno != 183:
                raise
            os.unlink(b)
            os.rename(a, b)

Christian



Re: How to safely maintain a status file

2012-07-08 Thread Michael Hrivnak
What are you keeping in this status file that needs to be saved
several times per second?  Depending on what type of state you're
storing and how persistent it needs to be, there may be a better way
to store it.

Michael

On Sun, Jul 8, 2012 at 7:53 AM, Christian Heimes li...@cheimes.de wrote:
 Am 08.07.2012 13:29, schrieb Richard Baron Penman:
 My initial solution was a thread that writes status to a tmp file
 first and then renames:

 open(tmp_file, 'w').write(status)
 os.rename(tmp_file, status_file)

 Your algorithm may not write and flush all data to disk. You need to do
 additional work. You must also store the tmpfile on the same partition
 (better: same directory) as the status file.

 with open(tmp_file, "w") as f:
     f.write(status)
     # flush buffer and write data/metadata to disk
     f.flush()
     os.fsync(f.fileno())

 # now rename the file
 os.rename(tmp_file, status_file)

 # finally flush metadata of directory to disk
 dirfd = os.open(os.path.dirname(status_file), os.O_RDONLY)
 try:
     os.fsync(dirfd)
 finally:
     os.close(dirfd)


 This works well on Linux but Windows raises an error when status_file
 already exists.
 http://docs.python.org/library/os.html#os.rename

 Windows doesn't support atomic renames if the right side exists.  I
 suggest that you implement two code paths:

 if os.name == "posix":
     rename = os.rename
 else:
     def rename(a, b):
         try:
             os.rename(a, b)
         except OSError as e:
             if e.errno != 183:
                 raise
             os.unlink(b)
             os.rename(a, b)

 Christian



Re: How to safely maintain a status file

2012-07-08 Thread Laszlo Nagy
On Sun, 8 Jul 2012 21:29:41 +1000, Richard Baron Penman 
richar...@gmail.com declaimed the following in gmane.comp.python.general:

and then on startup read from tmp_file if status_file does not exist.
But this seems awkward.


It also violates your requirement -- since the crash could take
place with a partial temp file.

I'd suggest that, rather than deleting the old status file, you
rename IT -- and only delete it IF you successfully rename the temp
file.
Yes, this is much better. Almost perfect. Don't forget to consult your 
system documentation, and check if the rename operation is atomic or 
not. (Most probably it will only be atomic if the original and the 
renamed file are on the same physical partition and/or mount point).


But even if the rename operation is atomic, there is still a race 
condition. Your program can be terminated after the original status file 
has been deleted, and before the temp file was renamed. In this case, 
you will be missing the status file (your program has already done some 
work, but it could not write out the new status).


Here is an algorithm that can always write and read a status (but it 
might not be the latest one). You can keep the last two status files.


Writer:

* create temp file, write new status info
* create lock file if needed
* flock it
* try:
*     delete older status file
*     rename temp file to new status file
* finally: unlock the lock file

Reader:

* flock the lock file
* try:
*     select the newer status file
*     read status info
* finally: unlock the lock file

It is guaranteed that you will always have a status to read, and in most 
cases this will be the last one (because the writer only locks for a 
short time). However, it is still questionable, because your writer may 
be waiting for the reader to unlock, so the new status info may not be 
written immediately.


It would really help if you could tell us what you are trying to do that 
needs status.


Best,

   Laszlo



Re: How to safely maintain a status file

2012-07-08 Thread Christian Heimes
Am 08.07.2012 22:57, schrieb Laszlo Nagy:
 But even if the rename operation is atomic, there is still a race
 condition. Your program can be terminated after the original status file
 has been deleted, and before the temp file was renamed. In this case,
 you will be missing the status file (your program has already done some
 work, but it could not write out the new status).

You are contradicting yourself. Either the OS is providing a fully
atomic rename or it doesn't. All POSIX compatible OS provide an atomic
rename functionality that renames the file atomically or fails without
losing the target side. On POSIX OS it doesn't matter if the target exists.

You don't need locks or any other fancy stuff. You just need to make
sure that you flush the data and metadata correctly to the disk and
force a re-write of the directory inode, too. It's a standard pattern on
POSIX platforms and well documented in e.g. the maildir RFC.

You can use the same pattern on Windows but it doesn't work as well and
doesn't guarantee file integrity, for two reasons:

1) Windows's rename isn't atomic if the right side exists.

2) Windows locks a file when a program opens it. Other programs can't
rename or overwrite the file. (You can get around the issue with some
extra work, though.)

Christian



Re: How to safely maintain a status file

2012-07-08 Thread Plumo
 What are you keeping in this status file that needs to be saved
 several times per second?  Depending on what type of state you're
 storing and how persistent it needs to be, there may be a better way
 to store it.

 Michael

This is for a threaded web crawler. I want to cache which URLs are
currently in the queue so that, if terminated, the crawler can continue next
time from the same point.