In article <mailman.506.1365751267.3114.python-l...@python.org>, Rob Schneider <rmsc...@gmail.com> wrote:
> Source (correct one) is 47,970 bytes. Target after copy of 45,056 bytes.
> I've tried changing what gets written to change the file size. It is usually
> this sort of difference.
>
> The file system is Mac OS Extended Journaled (default as out of the box).

Is it always the tail end of the file that gets truncated, or is it missing (or mutating) data in the middle of the file? I'm just grasping at straws here, but maybe it's somehow messing up line endings (turning CRLF pairs into just LF), or using some other kind of encoding for unicode characters?

If you compare the files with cmp, does it say:

    $ cmp original truncated
    cmp: EOF on truncated

That's what I would expect if it's a strict truncation. If it says anything else, you've got a data munging problem.

What I would normally do around this time is run a system call trace on the process to watch all the descriptor-related (e.g. open, create, write) system calls. On OSX, that means dtruss. Unfortunately, I'm not that familiar with the OSX variant, so I can't give you specific advice about which options to use.

When you can see the system calls, you know exactly what your process is doing. You should be able to see the output file being opened and a descriptor returned, then find all the write() calls to that descriptor. You'll also be able to find any other system calls on that pathname after the descriptor is closed. Please report back what you find!

Oh, another trick you might want to try is making the output file path /dev/stdout and redirecting the output into a file with the shell. See if that makes any difference. Or, try something like this (assuming the -o option to your script sets the output filename):

    python my_prog.py -o /dev/stdout | dd bs=1 of=xxx

That will do a couple of things. First, dd will report how many bytes it read and wrote, so you can see if that's the correct number.
Also, since your process will no longer be writing to a real file, if anything is doing something weird like a seek() after you're done writing, that will fail since you can't seek() on a pipe.
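
If cmp isn't handy, or you want to test the truncation and CRLF theories in one go, here's a rough Python sketch of the same comparison. The names compare_bytes and compare_files are made up for illustration; it reads both files fully into memory, which is fine at these sizes (under 50 KB) but not something to do with huge files.

```python
def compare_bytes(original: bytes, copy: bytes) -> str:
    """Classify how `copy` differs from `original`, roughly like cmp(1)."""
    if original == copy:
        return "identical"
    # Strict truncation: the copy is a proper prefix of the original.
    if len(copy) < len(original) and original.startswith(copy):
        return "truncated at byte %d" % len(copy)
    # A CRLF -> LF conversion would also make the copy shorter.
    if original.replace(b"\r\n", b"\n") == copy:
        return "line endings changed (CRLF -> LF)"
    # Otherwise report the first differing offset (0-based).
    for offset, (a, b) in enumerate(zip(original, copy)):
        if a != b:
            return "differs at byte %d" % offset
    # No mismatch in the common prefix, so the copy must be longer.
    return "copy has %d extra bytes" % (len(copy) - len(original))


def compare_files(original_path: str, copy_path: str) -> str:
    """Read both files in binary mode and classify the difference."""
    with open(original_path, "rb") as f:
        original = f.read()
    with open(copy_path, "rb") as g:
        copy = g.read()
    return compare_bytes(original, copy)
```

Calling compare_files("original", "truncated") and printing the result should tell you which case you're in: "truncated at byte N" points to a plain truncation, anything else points to the data being altered along the way.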