Re: writing large files quickly

2006-01-29 Thread Fredrik Lundh
Bengt Richter wrote:

  How the heck does that make a 400 MB file that fast? It literally takes
  a second or two while every other solution takes at least 2 - 5 minutes.
  Awesome... thanks for the tip!!!
 
 Because it isn't really writing the zeros.   You can make these
 files all day long and not run out of disk space, because this
 kind of file doesn't take very many blocks.   The blocks that
 were never written are virtual blocks, inasmuch as read() at
 that location will cause the filesystem to return a block of NULs.
 
 I wonder if it will also write virtual blocks when it gets real
 zero blocks to write from a user, or even with file system copy utils?

I've seen this behaviour on big iron Unix systems, in a benchmark that
repeatedly copied data from a memory mapped section to an output file.

but for the general case, I doubt that adding "is this block all zeros?" or
"does this block match something we recently wrote to disk?" checks will
speed things up, on average...

/F



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-29 Thread Thomas Bellman
Grant Edwards [EMAIL PROTECTED] writes:

 $ dd if=/dev/zero of=zeros bs=64k count=1024
 1024+0 records in
 1024+0 records out
 $ ls -l zeros
 -rw-r--r--  1 grante users 67108864 Jan 28 14:49 zeros
 $ du -h zeros
 65M zeros

 In my book that's 64MB not 65MB, but that's an argument for
 another day.

You should be aware that the sizes that 'du' and 'ls -s' report
include any indirect blocks needed to keep track of the data
blocks of the file.  Thus, you get the amount of space that the
file actually uses in the file system, and would become free if
you removed it.  That's why it is larger than 64 Mbyte.  And 'du'
(at least GNU du) rounds upwards when you use -h.

Try for instance:

$ dd if=/dev/zero of=zeros bs=4k count=16367
16367+0 records in
16367+0 records out
$ ls -ls zeros
65536 -rw-rw-r--  1 bellman bellman 67039232 Jan 29 13:57 zeros
$ du -h zeros
64M zeros

$ dd if=/dev/zero of=zeros bs=4k count=16368
16368+0 records in
16368+0 records out
$ ls -ls zeros
65540 -rw-rw-r--  1 bellman bellman 67043328 Jan 29 13:58 zeros
$ du -h zeros
65M zeros

(You can infer from the above that my file system has a block
size of 4 Kbyte.)


-- 
Thomas Bellman,   Lysator Computer Club,   Linköping University,  Sweden
There are many causes worth dying for, but  !  bellman @ lysator.liu.se
 none worth killing for. -- Gandhi  !  Make Love -- Nicht Wahr!
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: writing large files quickly

2006-01-28 Thread Jens Theisen
Donn wrote:

 How the heck does that make a 400 MB file that fast? It literally takes
 a second or two while every other solution takes at least 2 - 5 minutes.
 Awesome... thanks for the tip!!!

 Because it isn't really writing the zeros.   You can make these
 files all day long and not run out of disk space, because this
 kind of file doesn't take very many blocks.   The blocks that
 were never written are virtual blocks, inasmuch as read() at
 that location will cause the filesystem to return a block of NULs.

Under which operating system/file system?

As far as I know this should be file system dependent at least under  
Linux, as the calls to open and seek are served by the file system driver.

Jens

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-28 Thread Jens Theisen
Ivan wrote:

 Steven D'Aprano wrote:

 Isn't this a file system specific solution though? Won't your file system
 need to have support for sparse files, or else it won't work?

 Yes, but AFAIK the only modern (meaning: in wide use today) file
 system that doesn't have this support is FAT/FAT32.

I don't think ext2fs does this either. At least the du and df commands  
tell something different.

Actually I'm not sure what this optimisation should give you anyway. The  
only circumstance under which files with only zeroes are meaningful is  
testing, and that's exactly when you don't want that optimisation.

On compressing filesystems such as ntfs you will get this behaviour as a  
special case of compression and compression makes more sense.

Jens

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-28 Thread Jens Theisen
Donn wrote:

 Because it isn't really writing the zeros.   You can make these
 files all day long and not run out of disk space, because this
 kind of file doesn't take very many blocks.   The blocks that
 were never written are virtual blocks, inasmuch as read() at
 that location will cause the filesystem to return a block of NULs.

Are you sure that's not just a case of asynchronous writing that can be  
done in a particularly efficient way? df quite clearly tells me that I'm  
running out of disk space on my ext2fs linux when I dump it full of  
zeroes.

Jens


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-28 Thread Ivan Voras
Jens Theisen wrote:
 Ivan wrote:

Yes, but AFAIK the only modern (meaning: in wide use today) file
system that doesn't have this support is FAT/FAT32.
 
 I don't think ext2fs does this either. At least the du and df commands  
 tell something different.

ext2 is a reimplementation of BSD UFS, so it does. Here:

f = file('bigfile', 'w')
f.seek(1024*1024)
f.write('a')

$ ls -l bigfile
-rw-r--r--  1 ivoras  wheel  1048577 Jan 28 14:57 bigfile
$ du bigfile
8   bigfile

 Actually I'm not sure what this optimisation should give you anyway. The  
 only circumstance under which files with only zeroes are meaningful is  
 testing, and that's exactly when you don't want that optimisation.

I read somewhere that it has a use in database software, but the only 
thing I can imagine for this is when using heap queues 
(http://python.active-venture.com/lib/node162.html).
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-28 Thread Jens Theisen
Ivan wrote:

 ext2 is a reimplementation of BSD UFS, so it does. Here:

 f = file('bigfile', 'w')
 f.seek(1024*1024)
 f.write('a')

 $ ls -l bigfile
 -rw-r--r--  1 ivoras  wheel  1048577 Jan 28 14:57 bigfile
 $ du bigfile
 8 bigfile

Interesting:

cp bigfile bigfile2

cat bigfile > bigfile3

du bigfile*
8   bigfile2
1032    bigfile3

So it's not consuming 0's. It just doesn't store unwritten data. And I
can think of an application for that: an application might want to write
the beginning of a file at a later point, so this makes it more efficient.

I wonder how other file systems behave.

 I read somewhere that it has a use in database software, but the only
 thing I can imagine for this is when using heap queues
 (http://python.active-venture.com/lib/node162.html).

That's an article about the heap data structure. Was it your
intention to link this?

Jens


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-28 Thread Scott David Daniels
Jens Theisen wrote:
 Ivan wrote:
 I read somewhere that it has a use in database software, but the only
 thing I can imagine for this is when using heap queues
 (http://python.active-venture.com/lib/node162.html).

I've used this feature eons ago where the file was essentially a single
large address space (memory mapped array) that was expected to never
fill all that full.  I was tracking data from a year of (thousands? of)
students seeing Drill-and-practice questions from a huge database of
questions.  The research criticism we got was that our analysis did not
rule out any kid seeing the same question more than once, and getting
practice that would improve performance w/o learning.  I built a
bit-filter and copied tapes dropping any repeats seen by students.
We then just ran the same analysis we had on the raw data, and found
no significant difference.

The nice thing is that file size grew over time, so (for a while) I
could run on the machine with other users.  By the last block of
tapes I was sitting alone in the machine room at 3:00 AM on Sat mornings
afraid to so much as fire up an editor.

-- 
-Scott David Daniels
[EMAIL PROTECTED]
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-28 Thread Tim Peters
[Jens Theisen]
 ...
 Actually I'm not sure what this optimisation should give you anyway. The
 only circumstance under which files with only zeroes are meaningful is
 testing, and that's exactly when you don't want that optimisation.

In most cases, a guarantee that reading uninitialized file data will
return zeroes is a security promise, not an optimization.  C doesn't
require this behavior, but POSIX does.

On FAT/FAT32, if you create a file, seek to a large offset, write a
byte, then read the uninitialized data from offset 0 up to the byte
just written, you get back whatever happened to be sitting on disk at
the locations now reserved for the file.  That can include passwords,
other peoples' email, etc -- anything whatsoever that may have been
written to disk at some time in the disk's history.  Security weenies
get upset at stuff like that ;-)
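
A minimal sketch of that guarantee (the file name and sizes here are
arbitrary): seek past the end, write one byte, and anything read back from
the hole comes back as NUL bytes, never as stale disk contents:

import os

f = file('hole_demo.bin', 'wb')
f.seek(1024 * 1024)           # leave a 1 MB hole
f.write('\x00')               # one real byte after the hole
f.close()

f = file('hole_demo.bin', 'rb')
data = f.read(4096)           # read from inside the hole
f.close()
assert data == '\x00' * 4096  # POSIX: uninitialized file data reads as zeroes
os.remove('hole_demo.bin')
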
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-28 Thread Ivan Voras
Jens Theisen wrote:

 cp bigfile bigfile2
 
 cat bigfile > bigfile3
 
 du bigfile*
 8   bigfile2
 1032    bigfile3
 
 So it's not consuming 0's. It just doesn't store unwritten data. And I

Very possibly cp understands sparse files and cat (doing what it's 
meant to do) doesn't :)


I read somewhere that it has a use in database software, but the only
thing I can imagine for this is when using heap queues
(http://python.active-venture.com/lib/node162.html).
 
 That's an article about the heap data structure. Was it your
 intention to link this?

Yes. The idea is that in implementing such a structure, in which each 
level is 2^x entries wide (x = the level of the structure, which depends 
on the number of entries the structure must hold), most of the blocks could 
exist and never be written to (i.e. they'd be empty). Using sparse 
files would save space :)

(It has nothing to do with Python; I remembered the article so I linked 
to it. The sparse-file trick is useful only when implementing heaps 
directly on a file or in an mmapped file.)
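
A rough sketch of that idea (the record size, format, and file name are all
made up here): put heap slot i at byte offset i * RECORD_SIZE, so slots that
are never written remain holes and cost no disk blocks:

import struct

RECORD_SIZE = 8                    # one 64-bit signed value per slot (assumed)
FMT = '<q'

def write_slot(f, i, value):
    # only occupied slots are ever written; the rest of the file stays sparse
    f.seek(i * RECORD_SIZE)
    f.write(struct.pack(FMT, value))

def read_slot(f, i):
    f.seek(i * RECORD_SIZE)
    data = f.read(RECORD_SIZE)
    if len(data) < RECORD_SIZE:    # past end of file: treat as an empty slot
        return 0
    return struct.unpack(FMT, data)[0]   # a slot inside a hole unpacks as 0

f = file('heap.bin', 'w+b')
write_slot(f, 0, 42)               # the root
write_slot(f, 1000000, 7)          # a slot far down the heap
print read_slot(f, 500)            # never written -> prints 0
f.close()
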
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-28 Thread Bengt Richter
On Fri, 27 Jan 2006 12:30:49 -0800, Donn Cave [EMAIL PROTECTED] wrote:

In article [EMAIL PROTECTED],
 rbt [EMAIL PROTECTED] wrote:
 Won't work!? It's absolutely fabulous! I just need something big, quick 
 and zeros work great.
 
 How the heck does that make a 400 MB file that fast? It literally takes 
 a second or two while every other solution takes at least 2 - 5 minutes. 
 Awesome... thanks for the tip!!!

Because it isn't really writing the zeros.   You can make these
files all day long and not run out of disk space, because this
kind of file doesn't take very many blocks.   The blocks that
were never written are virtual blocks, inasmuch as read() at
that location will cause the filesystem to return a block of NULs.

I wonder if it will also write virtual blocks when it gets real
zero blocks to write from a user, or even with file system copy utils?

Regards,
Bengt Richter
-- 
http://mail.python.org/mailman/listinfo/python-list


writing large files quickly

2006-01-27 Thread rbt
I've been doing some file system benchmarking. In the process, I need to 
create a large file to copy around to various drives. I'm creating the 
file like this:

fd = file('large_file.bin', 'wb')
for x in xrange(409600000):
 fd.write('0')
fd.close()

This takes a few minutes to do. How can I speed up the process?

Thanks!
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-27 Thread superfun
One way to speed this up is to write larger strings:

fd = file('large_file.bin', 'wb')
for x in xrange(5120):
 fd.write('0' * 80000)   # 80,000 '0's per write: 5120 * 80,000 = 409,600,000 bytes
fd.close()

However, I bet within an hour or so you will have a much better answer
or 10. =)

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-27 Thread casevh

rbt wrote:
 I've been doing some file system benchmarking. In the process, I need to
 create a large file to copy around to various drives. I'm creating the
 file like this:

 fd = file('large_file.bin', 'wb')
 for x in xrange(409600000):
  fd.write('0')
 fd.close()

 This takes a few minutes to do. How can I speed up the process?

 Thanks!

Untested, but this should be faster.

block = '0' * 409600
fd = file('large_file.bin', 'wb')
for x in range(1000):
 fd.write('0')
fd.close()

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-27 Thread Tim Chase
 Untested, but this should be faster.
 
 block = '0' * 409600
 fd = file('large_file.bin', 'wb')
 for x in range(1000):
  fd.write('0')
 fd.close()

Just checking...you mean

fd.write(block)

right? :)  Otherwise, you end up with just 1000 '0' characters in 
your file :)

Is there anything preventing one from just doing the following?

fd.write('0' * 409600000)

It's one huge string for a very short time.  It skips all the 
looping and allows Python to pump the file out to the disk as 
fast as the OS can handle it. (and sorta as fast as Python can 
generate this humongous string)

-tkc





-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-27 Thread Paul Rubin
Tim Chase [EMAIL PROTECTED] writes:
 Is there anything preventing one from just doing the following?
   fd.write('0' * 409600000)
 It's one huge string for a very short time.  It skips all the looping
 and allows Python to pump the file out to the disk as fast as the OS
 can handle it. (and sorta as fast as Python can generate this
 humongous string)

That's large enough that it might exceed your PC's memory and cause
swapping.  Try strings of about 64k (65536).
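
Something along these lines (sizes illustrative) writes real '0' bytes in
64 KB chunks, so memory use stays small while the file still ends up a
genuine 400 MB on disk:

block = '0' * 65536                    # one 64 KB chunk, reused for every write
f = file('large_file.bin', 'wb')
for i in xrange(409600000 // 65536):   # 6250 chunks == 409,600,000 bytes
    f.write(block)
f.close()
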
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-27 Thread Grant Edwards
On 2006-01-27, rbt [EMAIL PROTECTED] wrote:

 I've been doing some file system benchmarking. In the process, I need to 
 create a large file to copy around to various drives. I'm creating the 
 file like this:

 fd = file('large_file.bin', 'wb')
 for x in xrange(409600000):
  fd.write('0')
 fd.close()

 This takes a few minutes to do. How can I speed up the process?

Don't write so much data.

f = file('large_file.bin','wb')
f.seek(409600000-1)
f.write('\x00')
f.close()

That should be almost instantaneous in that the time required
for those 4 lines of code is negligible compared to interpreter
startup and shutdown.

-- 
Grant Edwards   grante Yow!  These PRESERVES
  at   should be FORCE-FED to
   visi.comPENTAGON OFFICIALS!!
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-27 Thread casevh
Oops. I did mean

   fd.write(block)

The only limit is available memory.  I've used 1MB block sizes when I
did read/write tests. I was comparing NFS vs. local disk performance. I
know Python can do at least 100MB/sec.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-27 Thread Tim Chase
 fd.write('0')
[cut]
 
 f = file('large_file.bin','wb')
 f.seek(409600000-1)
 f.write('\x00')

While a mindblowingly simple/elegant/fast solution (kudos!), the 
OP's file ends up full of the character zero (ASCII 0x30), 
while your solution ends up full of the NUL character (ASCII 0x00):

   [EMAIL PROTECTED]:~/temp$ xxd op.bin
   0000000: 3030 3030 3030 3030 3030                 0000000000
   [EMAIL PROTECTED]:~/temp$ xxd new.bin
   0000000: 0000 0000 0000 0000 0000                 ..........

(using only length 10 instead of 400 megs to save time and disk 
space...)

-tkc





-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-27 Thread Grant Edwards
On 2006-01-27, Tim Chase [EMAIL PROTECTED] wrote:
 fd.write('0')
 [cut]
 
 f = file('large_file.bin','wb')
 f.seek(409600000-1)
 f.write('\x00')

 While a mindblowingly simple/elegant/fast solution (kudos!), the 
 OP's file ends up full of the character zero (ASCII 0x30), 
 while your solution ends up full of the NUL character (ASCII 0x00):

Oops.  I missed the fact that he was writing 0x30 and not 0x00.

Yes, the hole in the file will read as 0x00 bytes.  If the OP
actually requires that the file contain something other than
0x00 bytes, then my solution won't work.

-- 
Grant Edwards   grante Yow!  I want the presidency
  at   so bad I can already taste
   visi.comthe hors d'oeuvres.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-27 Thread rbt
Grant Edwards wrote:
 On 2006-01-27, Tim Chase [EMAIL PROTECTED] wrote:
 
fd.write('0')

[cut]

f = file('large_file.bin','wb')
f.seek(409600000-1)
f.write('\x00')

While a mindblowingly simple/elegant/fast solution (kudos!), the 
OP's file ends up full of the character zero (ASCII 0x30), 
while your solution ends up full of the NUL character (ASCII 0x00):
 
 
 Oops.  I missed the fact that he was writing 0x30 and not 0x00.
 
 Yes, the hole in the file will read as 0x00 bytes.  If the OP
 actually requires that the file contain something other than
 0x00 bytes, then my solution won't work.
 

Won't work!? It's absolutely fabulous! I just need something big, quick 
and zeros work great.

How the heck does that make a 400 MB file that fast? It literally takes 
a second or two while every other solution takes at least 2 - 5 minutes. 
Awesome... thanks for the tip!!!

Thanks to all for the advice... one can really learn things here :)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-27 Thread rbt
Grant Edwards wrote:
 On 2006-01-27, rbt [EMAIL PROTECTED] wrote:
 
 
I've been doing some file system benchmarking. In the process, I need to 
create a large file to copy around to various drives. I'm creating the 
file like this:

fd = file('large_file.bin', 'wb')
for x in xrange(409600000):
 fd.write('0')
fd.close()

This takes a few minutes to do. How can I speed up the process?
 
 
 Don't write so much data.
 
 f = file('large_file.bin','wb')
 f.seek(409600000-1)
 f.write('\x00')
 f.close()

OK, I'm still trying to pick my jaw up off of the floor. One question... 
  how big of a file could this method create? 20GB, 30GB, limit depends 
on filesystem, etc?

 That should be almost instantaneous in that the time required
 for those 4 lines of code is neglgible compared to interpreter
 startup and shutdown.


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-27 Thread Donn Cave
In article [EMAIL PROTECTED],
 rbt [EMAIL PROTECTED] wrote:
 Won't work!? It's absolutely fabulous! I just need something big, quick 
 and zeros work great.
 
 How the heck does that make a 400 MB file that fast? It literally takes 
 a second or two while every other solution takes at least 2 - 5 minutes. 
 Awesome... thanks for the tip!!!

Because it isn't really writing the zeros.   You can make these
files all day long and not run out of disk space, because this
kind of file doesn't take very many blocks.   The blocks that
were never written are virtual blocks, inasmuch as read() at
that location will cause the filesystem to return a block of NULs.

   Donn Cave, [EMAIL PROTECTED]
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-27 Thread rbt
Donn Cave wrote:
 In article [EMAIL PROTECTED],
  rbt [EMAIL PROTECTED] wrote:
 
Won't work!? It's absolutely fabulous! I just need something big, quick 
and zeros work great.

How the heck does that make a 400 MB file that fast? It literally takes 
a second or two while every other solution takes at least 2 - 5 minutes. 
Awesome... thanks for the tip!!!
 
 
 Because it isn't really writing the zeros.   You can make these
 files all day long and not run out of disk space, because this
 kind of file doesn't take very many blocks. 

Hmmm... when I copy the file to a different drive, it takes up 
409,600,000 bytes. Also, an md5 checksum on the generated file and on 
copies placed on other drives are the same. It looks like a regular, big 
file... I don't get it.


 The blocks that
 were never written are virtual blocks, inasmuch as read() at
 that location will cause the filesystem to return a block of NULs.
 
Donn Cave, [EMAIL PROTECTED]
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-27 Thread Robert Kern
rbt wrote:

 Hmmm... when I copy the file to a different drive, it takes up 
 409,600,000 bytes. Also, an md5 checksum on the generated file and on 
 copies placed on other drives are the same. It looks like a regular, big 
 file... I don't get it.

google("sparse files")

-- 
Robert Kern
[EMAIL PROTECTED]

In the fields of hell where the grass grows high
 Are the graves of dreams allowed to die.
  -- Richard Harter

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-27 Thread Grant Edwards
On 2006-01-27, rbt [EMAIL PROTECTED] wrote:

fd.write('0')

[cut]

f = file('large_file.bin','wb')
f.seek(409600000-1)
f.write('\x00')

While a mindblowingly simple/elegant/fast solution (kudos!), the 
 OP's file ends up full of the character zero (ASCII 0x30), 
while your solution ends up full of the NUL character (ASCII 0x00):
 
 Oops.  I missed the fact that he was writing 0x30 and not 0x00.
 
 Yes, the hole in the file will read as 0x00 bytes.  If the OP
 actually requires that the file contain something other than
 0x00 bytes, then my solution won't work.

 Won't work!? It's absolutely fabulous! I just need something big, quick 
 and zeros work great.

Then Bob's your uncle, eh?

 How the heck does that make a 400 MB file that fast?

Most of the file isn't really there, it's just a big hole in
a sparse array containing a single allocation block that
contains the single '0x00' byte that was written:

$ ls -l large_file.bin 
-rw-r--r--  1 grante users 409600000 Jan 27 15:02 large_file.bin
$ du -h large_file.bin
12K large_file.bin

The filesystem code in the OS is written so that it returns
'0x00' bytes when you attempt to read data from the hole in
the file.  So, if you open the file and start reading, you'll
get 400MB of 0x00 bytes before you get an EOF return.  But the
file really only takes up a couple chunks of disk space, and
chunks are usually on the order of 4KB.
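
From Python you can see the same thing with os.stat() (st_blocks is in
512-byte units on most Unix systems; the file name is the OP's):

import os

st = os.stat('large_file.bin')
print 'apparent size:', st.st_size            # what ls -l reports
print 'bytes on disk:', st.st_blocks * 512    # roughly what du reports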

-- 
Grant Edwards   grante Yow!  Is this an out-take
  at   from the BRADY BUNCH?
   visi.com
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-27 Thread Grant Edwards
On 2006-01-27, rbt [EMAIL PROTECTED] wrote:

 Hmmm... when I copy the file to a different drive, it takes up 
 409,600,000 bytes. Also, an md5 checksum on the generated file and on 
 copies placed on other drives are the same. It looks like a regular, big 
 file... I don't get it.

Because the filesystem code keeps track of where you are in
that 400MB stream, and returns 0x00 anytime you're reading from
a hole.  The cp program and the md5sum just open the file
and start read()ing.  The filesystem code returns 0x00 bytes
for all of the read positions that are in the hole, just like
Don said:

 The blocks that were never written are virtual blocks,
 inasmuch as read() at that location will cause the filesystem
 to return a block of NULs.

-- 
Grant Edwards   grante Yow!  They
  at   collapsed... like nuns
   visi.comin the street... they had
   no teenappeal!
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-27 Thread Erik Andreas Brandstadmoen
Grant Edwards wrote:
 Because the filesystem code keeps track of where you are in
 that 400MB stream, and returns 0x00 anytime you're reading from
 a hole.  The cp program and the md5sum just open the file
 and start read()ing.  The filesystem code returns 0x00 bytes
 for all of the read positions that are in the hole, just like
 Don said:

And, this file is of course useless for FS benchmarking, since you're 
barely reading data from disk at all. You'll just be testing the FS's 
handling of sparse files. I suggest you go for one of the suggestions 
with larger block sizes. That's probably your best bet.

Regards,

Erik Brandstadmoen
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-27 Thread Grant Edwards
On 2006-01-27, rbt [EMAIL PROTECTED] wrote:

 OK, I'm still trying to pick my jaw up off of the floor. One
 question...  how big of a file could this method create? 20GB,
 30GB, limit depends on filesystem, etc?

Right.  Back in the day, the old libc and ext2 code had a 2GB
file size limit at one point (it used a signed 32 bit value
to keep track of file size/position).  That was back when a 1GB
drive was something to brag about, so it wasn't a big deal for
most people.

I think everything has large file support enabled by default
now, so the limit is 2^63 for most modern filesystems --
that's the limit of the file size you can create using the
seek() trick.  The limit for actual on-disk bytes may not be
that large.

Here's a good link:

http://www.suse.de/~aj/linux_lfs.html
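
On a system with LFS enabled, the same seek() trick works well past the old
2GB limit; a quick sketch (the size and file name are arbitrary):

import os

f = file('huge.bin', 'wb')
f.seek(8 * 1024**3 - 1)              # 8 GB minus one byte
f.write('\x00')
f.close()
print os.path.getsize('huge.bin')    # 8589934592 on an LFS-enabled filesystem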

-- 
Grant Edwards   grante Yow!  I'm GLAD I
  at   remembered to XEROX all
   visi.commy UNDERSHIRTS!!
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-27 Thread rbt
Grant Edwards wrote:
 On 2006-01-27, rbt [EMAIL PROTECTED] wrote:
 
 
Hmmm... when I copy the file to a different drive, it takes up 
409,600,000 bytes. Also, an md5 checksum on the generated file and on 
copies placed on other drives are the same. It looks like a regular, big 
file... I don't get it.
 
 
 Because the filesystem code keeps track of where you are in
 that 400MB stream, and returns 0x00 anytime you're reading from
 a hole.  The cp program and the md5sum just open the file
 and start read()ing.  The filesystem code returns 0x00 bytes
 for all of the read positions that are in the hole, just like
 Don said:

OK I finally get it. It's too good to be true :)

I'm going back to using _real_ files... not files that merely look as if 
they are there but aren't. BTW, the file 'size' and 'size on disk' were 
identical on win 2003. That's a bit deceptive. According to the NTFS 
docs, they should be drastically different... 'size on disk' should be 
like 64K or something.

 
 
The blocks that were never written are virtual blocks,
inasmuch as read() at that location will cause the filesystem
to return a block of NULs.
 
 
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-27 Thread Grant Edwards
On 2006-01-27, Erik Andreas Brandstadmoen [EMAIL PROTECTED] wrote:
 Grant Edwards wrote:
 Because the filesystem code keeps track of where you are in
 that 400MB stream, and returns 0x00 anytime you're reading from
 a hole.  The cp program and the md5sum just open the file
 and start read()ing.  The filesystem code returns 0x00 bytes
 for all of the read positions that are in the hole, just like
 Don said:

 And, this file is of course useless for FS benchmarking, since
 you're barely reading data from disk at all.

Quite right.  Copying such a sparse file is probably only
really testing the write performance of the filesystem
containing the destination file.

 You'll just be testing the FS's handling of sparse files.

Which may be a useful thing to know, but I rather doubt it.

 I suggest you go for one of the suggestions with larger block
 sizes. That's probably your best bet.

-- 
Grant Edwards   grante Yow!  Now I'm concentrating
  at   on a specific tank battle
   visi.comtoward the end of World
   War II!
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-27 Thread Grant Edwards
On 2006-01-27, rbt [EMAIL PROTECTED] wrote:

 OK I finally get it. It's too good to be true :)

Sorry about that.  I should have paid closer attention to what
you were going to do with the file.

 I'm going back to using _real_ files... files that don't look
 as if they are there but aren't. BTW, the file 'size' and
 'size on disk' were identical on win 2003. That's a bit
 deceptive.

What?! Windows lying to the user?  I don't believe it!

 According to the NTFS docs, they should be drastically
 different... 'size on disk' should be like 64K or something.

Probably.

-- 
Grant Edwards   grante Yow!  Where's th' DAFFY
  at   DUCK EXHIBIT??
   visi.com
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-27 Thread Steven D'Aprano
On Fri, 27 Jan 2006 12:30:49 -0800, Donn Cave wrote:

 In article [EMAIL PROTECTED],
  rbt [EMAIL PROTECTED] wrote:
 Won't work!? It's absolutely fabulous! I just need something big, quick 
 and zeros work great.
 
 How the heck does that make a 400 MB file that fast? It literally takes 
 a second or two while every other solution takes at least 2 - 5 minutes. 
 Awesome... thanks for the tip!!!
 
 Because it isn't really writing the zeros.   You can make these
 files all day long and not run out of disk space, because this
 kind of file doesn't take very many blocks.   The blocks that
 were never written are virtual blocks, inasmuch as read() at
 that location will cause the filesystem to return a block of NULs.


Isn't this a file system specific solution though? Won't your file system
need to have support for sparse files, or else it won't work?


Here is another possible solution, if you are running Linux, farm the real
work out to some C code optimised for writing blocks to the disk:

# untested and, it goes without saying, untimed
os.system('dd if=/dev/zero of=largefile.bin bs=64K count=16384')


That should make a 1GB file as fast as possible. If you have lots and lots
of memory, you could try upping the block size (bs=...).
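
A pure-Python rough equivalent of that dd command, for systems where
shelling out isn't convenient (the block size and count mirror the dd
arguments above):

bs = 64 * 1024                     # matches bs=64K
count = 16384                      # matches count=16384
block = '\x00' * bs
f = file('largefile.bin', 'wb')
for i in xrange(count):            # writes real zero bytes, not a sparse file
    f.write(block)
f.close()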



-- 
Steven.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-27 Thread Ivan Voras
Steven D'Aprano wrote:

 Isn't this a file system specific solution though? Won't your file system
 need to have support for sparse files, or else it won't work?

Yes, but AFAIK the only modern (meaning: in wide use today) file 
system that doesn't have this support is FAT/FAT32.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: writing large files quickly

2006-01-27 Thread Grant Edwards
On 2006-01-27, Steven D'Aprano [EMAIL PROTECTED] wrote:

 Because it isn't really writing the zeros.   You can make these
 files all day long and not run out of disk space, because this
 kind of file doesn't take very many blocks.   The blocks that
 were never written are virtual blocks, inasmuch as read() at
 that location will cause the filesystem to return a block of NULs.

 Isn't this a file system specific solution though? Won't your file system
 need to have support for sparse files, or else it won't work?

If your fs doesn't support sparse files, then you'll end up with a
file that really does have 400MB of 0x00 bytes in it.  Which is
what the OP really needed in the first place.

 Here is another possible solution, if you are running Linux, farm the real
 work out to some C code optimised for writing blocks to the disk:

 # untested and, it goes without saying, untimed
 os.system('dd if=/dev/zero of=largefile.bin bs=64K count=16384')

 That should make a 1GB file as fast as possible. If you have lots and lots
 of memory, you could try upping the block size (bs=...).

I agree. that probably is the optimal solution for Unix boxes.
I messed around with something like that once, and block sizes
bigger than 64k didn't make much difference.

-- 
Grant Edwards   grante Yow!  As President I
  at   have to go vacuum my coin
   visi.comcollection!
-- 
http://mail.python.org/mailman/listinfo/python-list