Re: rm -rf is too slow on large files and directory structure (Around 30000)

2012-02-23 Thread Miles Bader
Bilal mk <bilalh...@gmail.com> writes:
 I am using the xfs filesystem and also did the fsck. DMA is enabled.
 I also performed xfs defragmentation (xfs_fsr), but it is still an issue not only with rm
 -rf but also with the cp command.

Traditionally XFS is super slow when deleting lots of little files --
much, _much_, slower than ext3, for instance...

[I guess it's supposed to be better now, but I dunno, I've only
experienced the slow version.]

-Miles

-- 
Freebooter, n. A conqueror in a small way of business, whose annexations lack
of the sanctifying merit of magnitude.





Re: rm -rf is too slow on large files and directory structure (Around 30000)

2012-02-18 Thread Joe Pfeiffer
Is there any chance the OP might have the filesystem mounted with the
'sync' option?
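
That's easy enough to check; for example (the mount point here is just an
example):

  grep ' /srv/data ' /proc/mounts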





Re: Re: rm -rf is too slow on large files and directory structure (Around 30000)

2012-02-17 Thread Bob Proulx
Christofer C. Bell wrote:
 cbell@circe:~/test$ time find rm -type f -exec rm {} \+

There isn't any need to escape the '+' character.

  time find rm -type f -exec rm {} +

 It doesn't seem possible to run a similar test for unlink as it
 appears it only operates on 1 file at a time.  So it does seem that rm
 with the find and/or xargs options you provided is the best way to go
 (at least for this test case).

I definitely recommend using -exec rm {} + over using xargs because
the find method has been incorporated into the POSIX standard.  All
operating systems will have it.  The xargs -0 method is a GNU
extension and won't be available portably.

Once you decide to use GNU extensions (such as xargs -0) then you
might as well use a different GNU extension and use -delete instead.
In for a penny, in for a pound.  Using -delete is almost certainly the
fastest method since it doesn't spawn any external processes.

  time find rm -type f -delete

Bob




Re: rm -rf is too slow on large files and directory structure (Around 30000)

2012-02-16 Thread Robert Brockway

On Thu, 16 Feb 2012, Bilal mk wrote:


I am using the xfs filesystem and also did the fsck. DMA is enabled.
I also performed xfs defragmentation (xfs_fsr), but it is still an issue not only with rm
-rf but also with the cp command.


Until quite recently XFS was notable for being slow to delete.  Others 
have noted that this is greatly improved in recent kernels but even with 
older kernels there is quite a bit of tuning that you can do to improve 
the delete performance.  Your favourite search engine should give 
you good results.


I put down some notes for myself here a while back:

http://www.practicalsysadmin.com/wiki/index.php/XFS_optimisation
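
For instance, the log-related mount options are a common starting point.
Something along these lines (device, mount point and values are examples
only, not recommendations):

  mount -o logbufs=8,logbsize=256k /dev/sdb1 /srv/data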

Cheers,

Rob

--
Email: rob...@timetraveller.org Linux counter ID #16440
IRC: Solver (OFTC & Freenode)
Web: http://www.practicalsysadmin.com
Free & Open Source: The revolution that quietly changed the world
One ought not to believe anything, save that which can be proven by nature and the 
force of reason -- Frederick II (26 December 1194 – 13 December 1250)

Re: Re: rm -rf is too slow on large files and directory structure (Around 30000)

2012-02-16 Thread Christofer C. Bell
On Wed, Feb 15, 2012 at 4:51 PM, Clive Standbridge
<list-u...@tgstandbridges.plus.com> wrote:
 But it may provide some benefit when removing a large number (30000) of
 files (at least empty ones).

 cbell@circe:~/test$ time find rm -type f -exec rm {} \;

 real              0m48.127s
 user              1m32.926s
 sys               0m38.750s

 First thought - how much of that 48 seconds was spent on launching
 30000 instances of rm? It would be instructive to try

  time find rm -type f -exec rm {} \+

 or the more traditional xargs:

  time find rm -type f -print0 | xargs -0 -r rm

 Both of those commands should minimise the number of rm instances.
 Similarly for unlink.

Here are the test results:

cbell@circe:~/test$ time find rm -type f -exec rm {} \+

real    0m0.953s
user    0m0.064s
sys     0m0.884s
cbell@circe:~/test$

cbell@circe:~/test$ time find rm -type f -print0 | xargs -0 -r rm

real    0m0.823s
user    0m0.080s
sys     0m0.824s
cbell@circe:~/test$

It doesn't seem possible to run a similar test for unlink as it
appears it only operates on 1 file at a time.  So it does seem that rm
with the find and/or xargs options you provided is the best way to go
(at least for this test case).

-- 
Chris





Re: rm -rf is too slow on large files and directory structure (Around 30000)

2012-02-15 Thread Chris Davies
Jude DaShiell <jdash...@shellworld.net> wrote:
 Anyone heard of the unlink command?

Yes. And your point is...?

Chris





Re: rm -rf is too slow on large files and directory structure (Around 30000)

2012-02-15 Thread Christofer C. Bell
On Wed, Feb 15, 2012 at 1:38 AM, Jude DaShiell <jdash...@shellworld.net> wrote:
 Anyone heard of the unlink command?

unlink is slower than rm removing a 1.5GB file (at least on ext3):

cbell@circe:~$ time rm test1

real    0m0.278s
user    0m0.000s
sys     0m0.264s

cbell@circe:~$ time unlink test2

real    0m0.375s
user    0m0.000s
sys     0m0.364s
cbell@circe:~$

But it may provide some benefit when removing a large number (30000) of
files (at least empty ones).

cbell@circe:~/test$ time find rm -type f -exec rm {} \;

real    0m48.127s
user    1m32.926s
sys     0m38.750s

cbell@circe:~/test$ time find unlink -type f -exec unlink {} \;

real    0m46.167s
user    1m32.194s
sys     0m39.346s
cbell@circe:~/test$

I suspect that removing a large number of non-zero byte files will be
slower with unlink than rm.
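
(For anyone wanting to repeat this: the test directory held only empty
files, so a similar corpus can be created with something along these
lines; the names and count are examples:

  mkdir -p ~/test/rm
  touch ~/test/rm/file{00001..30000}
)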

-- 
Chris





Re: rm -rf is too slow on large files and directory structure (Around 30000)

2012-02-15 Thread Bob Proulx
Christofer C. Bell wrote:
 unlink is slower than rm removing a 1.5GB file (at least on ext3):
 ...
 I suspect that removing a large number of non-zero byte files will be
 slower with unlink than rm.

If it is, then it is pointing to a kernel performance issue, because
there is very little difference between them.  Until recently rm used
unlink(2) and there would have been no difference.  But recent
versions of coreutils now use unlinkat(2) instead, for improved
security.  Any difference in performance would be in the realm of the
kernel internals.  It doesn't seem to me like there should be any
significant difference.
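
If you are curious which syscall a given rm binary actually issues, strace
will show it (assuming strace is installed; the file name is just an
example):

  strace -e trace=unlink,unlinkat rm -f testfile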

Bob




Re: Re: rm -rf is too slow on large files and directory structure (Around 30000)

2012-02-15 Thread Clive Standbridge
 But it may provide some benefit when removing a large number (30000) of
 files (at least empty ones).
 
 cbell@circe:~/test$ time find rm -type f -exec rm {} \;
 
 real  0m48.127s
 user  1m32.926s
 sys   0m38.750s

First thought - how much of that 48 seconds was spent on launching
30000 instances of rm? It would be instructive to try

  time find rm -type f -exec rm {} \+

or the more traditional xargs:

  time find rm -type f -print0 | xargs -0 -r rm

Both of those commands should minimise the number of rm instances.
Similarly for unlink.

-- 
Cheers,
Clive





Re: rm -rf is too slow on large files and directory structure (Around 30000)

2012-02-15 Thread Bilal mk
On Wed, Feb 15, 2012 at 12:14 PM, Bob Proulx <b...@proulx.com> wrote:

 Bilal mk wrote:
  I tried to remove a 5GB directory. In that directory there are around 30000 files and
  directories. It will take more than 30 min to complete.

 A large number of files consuming a large number of blocks will take a
 significant amount of time to process.  That is all there is to it.

 Some filesystems are faster than others.  What filesystem are you
 using?  On what type of cpu?

 If you happen to be destroying an entire filesystem then you could
 simply unmount it and make a new filesystem on top of it.

  There is no other CPU-intensive process running. After some time it goes to
  D state and I am unable to kill that process.

 If you have processes stuck in the D state (uninterruptible sleep)
 then something bad has happened.  This would indicate a bug.

 It sounds like you are having kernel bugs.  You may need to fsck your
 filesystems.  I would double check that dma is enabled to your drives.


I am using the xfs filesystem and also did the fsck. DMA is enabled.
I also performed xfs defragmentation (xfs_fsr), but it is still an issue not only with rm
-rf but also with the cp command.

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      1134  0.0  0.0      0     0 ?        D    10:18   0:00 [kdmflush]

I have also tested disk with smartmontools. But reported no issues.

My kernel version is 2.6.32-5-amd64. I have also used the same
configuration (same kernel) and the same hardware on another machine, but on
that machine there is no issue.

Is it a kernel bug or a hardware issue? Any suggestions for troubleshooting or
fixing this issue?

Thanks



  I have also tried the find with xargs method to remove. It also takes a long
  time to complete:
  find /directory | xargs rm -rf

 I doubt the problem is in rm since it has already been optimized to be
 quite fast.  The newer versions have even more optimization.  But it
 isn't worth the trouble to do anything other than wait.  Most of the
 time will be spent in the kernel organizing the now free blocks.

 If you want to experiment you could try find.

  find /directory -depth -delete

 That is basically the same as rm -rf but using find only.

 Bob



Re: rm -rf is too slow on large files and directory structure (Around 30000)

2012-02-14 Thread Bob Proulx
Bilal mk wrote:
 I tried to remove a 5GB directory. In that directory there are around 30000 files and
 directories. It will take more than 30 min to complete.

A large number of files consuming a large number of blocks will take a
significant amount of time to process.  That is all there is to it.

Some filesystems are faster than others.  What filesystem are you
using?  On what type of cpu?

If you happen to be destroying an entire filesystem then you could
simply unmount it and make a new filesystem on top of it.
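
For example (the device, filesystem type and mount point are examples
only, and this of course destroys everything on that filesystem):

  umount /srv/data
  mkfs.xfs -f /dev/sdb1
  mount /dev/sdb1 /srv/data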

 There is no other CPU-intensive process running. After some time it goes to
 D state and I am unable to kill that process.

If you have processes stuck in the D state (uninterruptible sleep)
then something bad has happened.  This would indicate a bug.

It sounds like you are having kernel bugs.  You may need to fsck your
filesystems.  I would double check that dma is enabled to your drives.

 I have also tried the find with xargs method to remove. It also takes a long
 time to complete:
 find /directory | xargs rm -rf

I doubt the problem is in rm since it has already been optimized to be
quite fast.  The newer versions have even more optimization.  But it
isn't worth the trouble to do anything other than wait.  Most of the
time will be spent in the kernel organizing the now free blocks.

If you want to experiment you could try find.

  find /directory -depth -delete

That is basically the same as rm -rf but using find only.

Bob




Re: rm -rf is too slow on large files and directory structure (Around 30000)

2012-02-14 Thread Jochen Spieker
Bilal mk:
 
 I tried to remove a 5GB directory. In that directory there are around 30000 files and
 directories. It will take more than 30 min to complete.
 There is no other CPU-intensive process running. After some time it goes to
 D state and I am unable to kill that process.

Removing that many files is I/O bound. The CPU doesn't play any
significant role in it. The only way to speed things up is to either get
faster storage (an SSD with high IOPS value for random writing) or you
can try another filesystem. IIRC XFS is good at what you are doing. I
cannot recommend it, though, as I don't have any recent experience with
it. 
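
(You can confirm the I/O-bound part by watching the disk while the rm is
running; if %util sits near 100% the disk is the bottleneck.  For example,
assuming the sysstat package is installed:

  iostat -x 2
)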

J.
-- 
I wish I was gay.
[Agree]   [Disagree]
 http://www.slowlydownward.com/NODATA/data_enter2.html




Re: rm -rf is too slow on large files and directory structure (Around 30000)

2012-02-14 Thread Stan Hoeppner
On 2/15/2012 12:55 AM, Jochen Spieker wrote:
 Bilal mk:

  I tried to remove a 5GB directory. In that directory there are around 30000 files and
  directories. It will take more than 30 min to complete.
  There is no other CPU-intensive process running. After some time it goes to
  D state and I am unable to kill that process.
 
 Removing that many files is I/O bound. 

This isn't correct.  Removing a kernel source tree is fast, and it
contains on the order of 4500 directories and 50K files.

EXT4 can 'rm -rf' the kernel source in 2-3 seconds.  XFS prior to
delaylog could take a minute or two, with delaylog it's 4 seconds.

So that's 2-4 seconds to remove a directory tree of 50k files.  The OP's
system is taking forever then freezing.  So if it's EXT4 he's using,
this isn't an IO problem but a bug, or something else, maybe bad hardware.
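
(That is easy to reproduce with something along these lines; the kernel
version is just an example:

  tar xjf linux-3.0.tar.bz2
  sync
  time rm -rf linux-3.0
)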

 The CPU doesn't play any
 significant role in it. 

CPU, and memory, play a very significant role here if the filesystem is
XFS.  Delayed logging takes all of the log journal writes and buffers
them, so duplicate changes to the metadata are rolled up into a single
physical IO.  With enough metadata changes it becomes CPU bound.  But
we're talking lots of metadata if we have a modern fast CPU.

I can't speak to EXTx behavior in this regard as I'm not familiar with it.

 The only way to speed things up is to either get
 faster storage (an SSD with high IOPS value for random writing) or you
 can try another filesystem. IIRC XFS is good at what you are doing. I
 cannot recommend it, though, as I don't have any recent experience with
 it. 

XFS is absolutely *horrible* with this workload prior to kernel 2.6.35,
when delayed logging was introduced.  So if this is your workload, and
you want XFS, you need mainline 2.6.35, better still 3.0.0.
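
On those kernels delayed logging is enabled with the delaylog mount
option, e.g. (the mount point is an example):

  mount -o remount,delaylog /srv/data

In later kernels delayed logging became the default.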

-- 
Stan





Re: rm -rf is too slow on large files and directory structure (Around 30000)

2012-02-14 Thread Jude DaShiell
Anyone heard of the unlink command?

On Wed, 15 Feb 2012, Stan Hoeppner wrote:

 On 2/15/2012 12:55 AM, Jochen Spieker wrote:
  Bilal mk:
 
   I tried to remove a 5GB directory. In that directory there are around 30000 files and
   directories. It will take more than 30 min to complete.
   There is no other CPU-intensive process running. After some time it goes to
   D state and I am unable to kill that process.
  
  Removing that many files is I/O bound. 
 
 This isn't correct.  Removing a kernel source tree is fast, and it
 contains on the order of 4500 directories and 50K files.
 
 EXT4 can 'rm -rf' the kernel source in 2-3 seconds.  XFS prior to
 delaylog could take a minute or two, with delaylog it's 4 seconds.
 
 So that's 2-4 seconds to remove a directory tree of 50k files.  The OP's
 system is taking forever then freezing.  So if it's EXT4 he's using,
 this isn't an IO problem but a bug, or something else, maybe bad hardware.
 
  The CPU doesn't play any
  significant role in it. 
 
 CPU, and memory, play a very significant role here if the filesystem is
 XFS.  Delayed logging takes all of the log journal writes and buffers
 them, so duplicate changes to the metadata are rolled up into a single
 physical IO.  With enough metadata changes it becomes CPU bound.  But
 we're talking lots of metadata if we have a modern fast CPU.
 
 I can't speak to EXTx behavior in this regard as I'm not familiar with it.
 
  The only way to speed things up is to either get
  faster storage (an SSD with high IOPS value for random writing) or you
  can try another filesystem. IIRC XFS is good at what you are doing. I
  cannot recommend it, though, as I don't have any recent experience with
  it. 
 
 XFS is absolutely *horrible* with this workload prior to kernel 2.6.35,
 when delayed logging was introduced.  So if this is your workload, and
 you want XFS, you need mainline 2.6.35, better still 3.0.0.
 
 


Jude jdashiel-at-shellworld-dot-net
http://www.shellworld.net/~jdashiel/nj.html

