Re: [Linux-users] xargs and awkery

2016-03-09 Thread steve

On 03/08/2016 05:57 PM, Criggie wrote:

I'm sure this can be optimised, but how's this for some dirty hackery.



server.dc home $ `cat _fdupes-2016-03-08.txt | xargs -n 3 | awk '{ print "rm -f " $2, $3 " ; ln " $1, $2 " ; ln " $1, $3 }'`



Clue 1

The input file contains the output of fdupes, and listed only triples of
identical files - there were no twos or fours.
A better version would have looked for the blank line between groups in the
input and looped through from 2 to N, as sketched below.
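
Something along those lines - an untested sketch, reading the raw fdupes
output directly (one path per line, blank line between groups, and like the
original it assumes no spaces in the paths) - could be:

  cat _fdupes-2016-03-08.txt | awk '
    NF == 0    { keep = ""; next }     # blank line ends a group
    keep == "" { keep = $0; next }     # first path in a group is the keeper
               { print "rm -f " $0 " ; ln " keep " " $0 }
  '

Like the original it only prints the commands, so they can be checked before
being fed to a shell.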






Clue 2: before

server.dc home $ ll ./dir?/abc
-rw-rw-r-- 1 root nagios 76047612 Sep  9 22:39 ./dir1/abc
-rw-rw-r-- 1 root nagios 76047612 Sep  9 22:39 ./dir2/abc
-rw-r--r-- 1 root root   76047612 Oct 28 02:52 ./dir3/abc







Clue 3: after

server.dc home $ ll ./dir?/abc
-rw-rw-r-- 3 root nagios 76047612 Sep  9 22:39 ./dir1/abc
-rw-rw-r-- 3 root nagios 76047612 Sep  9 22:39 ./dir2/abc
-rw-rw-r-- 3 root nagios 76047612 Sep  9 22:39 ./dir3/abc



I think the nifty thing was -n 3 for xargs. I was unaware it could do that.
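
For reference, -n 3 just repacks stdin into three arguments per invocation,
so with xargs' default echo it turns one path per line into one triple per
line:

  $ printf '%s\n' a b c d e f | xargs -n 3
  a b c
  d e f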


Answer
This machine has a lot of largish files triplicated on the disk.  Since I
can't convert it to a filesystem with deduplication, this deleted two thirds
of the files and hard linked them back into place.
And the script merely spits out shell commands, which are then executed.
So testing it is just a matter of running the command without the backticks
of execution.
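
For the three abc files from Clue 2, for example, the dry run would print
something like this (assuming dir1's copy happens to come first in the
input) instead of running anything:

  rm -f ./dir2/abc ./dir3/abc ; ln ./dir1/abc ./dir2/abc ; ln ./dir1/abc ./dir3/abc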

So the mount in question went from 355GB in use to 170GB, or 93% to 45%
usage.



I'd add a --no-run-if-empty to the xargs, just to be paranoid. Can't see
the need for the backticks either (and in modern bashes backticks should be
replaced with $(...) anyway, which is easier to read and to nest).
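
With the paranoia flag added, the generating pipeline becomes:

  cat _fdupes-2016-03-08.txt \
    | xargs --no-run-if-empty -n 3 \
    | awk '{ print "rm -f " $2, $3 " ; ln " $1, $2 " ; ln " $1, $3 }'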


(actually I'd just buy a bigger disk)

Steve

--
Steve Holdoway BSc(Hons) MIITP
http://www.greengecko.co.nz
Linkedin: http://www.linkedin.com/in/steveholdoway
Skype: sholdowa



Re: [Linux-users] xargs and awkery

2016-03-08 Thread Volker Kuhlmann
On Tue 08 Mar 2016 17:57:10 NZDT +1300, Criggie wrote:

> I'm sure this can be optimised, but how's this for some dirty hackery.

Dirty, yes; clever, not so sure. I don't think I'd be game to run that
on one of my filesystems. Checking for empty lines would be essential,
and gawk is much better at splitting the input than xargs -n3!
You could also program this in bash (efficiency is not your main
problem), which would have the advantage of being reusable.
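
A bash version might look something like this - an untested sketch, again
reading the raw fdupes output (one path per line, blank line between
groups); note it acts directly rather than printing commands, and because
it quotes the paths it copes with spaces:

  #!/bin/bash
  # Hard link every duplicate in a group back to the first file of the group.
  keep=""
  while IFS= read -r path; do
      if [ -z "$path" ]; then          # blank line: group finished
          keep=""
      elif [ -z "$keep" ]; then        # first path of a new group: keep it
          keep=$path
      else                             # a duplicate: replace it with a hard link
          rm -f "$path" && ln "$keep" "$path"
      fi
  done < _fdupes-2016-03-08.txt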

Personally I'd run cp -al to-be-sorted some-other-place before running
anything else. Then I'd think about a short script able to verify whether
the linking took place correctly. E.g.
  find -type f | xargs md5sum
must produce identical output before and after.
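
A rough sketch of that check (the before/after file names are just
placeholders, and sorting makes the two listings directly comparable):

  find . -type f | sort | xargs md5sum > /tmp/before.md5
  # ... delete and re-link the duplicates here ...
  find . -type f | sort | xargs md5sum > /tmp/after.md5
  diff /tmp/before.md5 /tmp/after.md5    # no output means nothing was corrupted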

To avoid getting into the same situation again, look at rsync
--link-dest. That's the secret ingredient in rsnapshot.
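
Roughly (paths invented for illustration): unchanged files in the new
snapshot come out as hard links into the previous one,

  rsync -a --link-dest=/backups/daily.1 /data/ /backups/daily.0/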

Your dirty method destroys the directories' lastmod stamps; I always get
very irate with that kind of thing.

> And the script merely spits out shell commands which are then executed.

Good.

> So testing it is just running the command without the backticks of
> execution.

Crikey! I wouldn't risk that. Pipe into sh -s instead.
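
i.e. leave the generating pipeline alone, eyeball its output, and when
you're happy feed it straight to a shell:

  cat _fdupes-2016-03-08.txt | xargs -n 3 \
    | awk '{ print "rm -f " $2, $3 " ; ln " $1, $2 " ; ln " $1, $3 }' \
    | sh -s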

Volker

-- 
Volker Kuhlmann
http://volker.top.geek.nz/  Please do not CC list postings to me.