Hello everyone. I'm thinking through a short program I want to write that will generate par2 error-correcting codes (ECCs) for all of my work files, roughly 15,000 of them, which all sit under a single directory. Specifically:
1) day one:
- create a mirror copy of the directory tree, empty of all files (there are a bunch of ways of doing this in bash).
- recurse down the directory tree that holds the files and run par2 create on each file, which generates approximately 10 *.par2 fileblocks. I will then copy the *.par2 fileblocks into the mirror directory tree, in the same position as the 'principal' file. So, assuming 10 *.par2 fileblocks for every actual file, the mirror tree will hold around 150,000 *.par2 fileblocks (space and CPU time are a non-issue).
2) day two:
- for each file in the primary directory, par2 verify it against its corresponding *.par2 fileblocks in the mirror tree. If it's ok, move on to the next file; if not, repair it, generate a new set of *.par2 fileblocks and copy them over to the mirror.
3) day three:
 - same as day two, ongoing.
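
A minimal sketch of the day-one mirror step in Python rather than bash. The names mirror_tree and mirror_dir_for are made up for illustration, and the sketch only creates directories; it does not call par2.

```python
import os
from pathlib import Path


def mirror_tree(primary: Path, mirror: Path) -> None:
    """Recreate every directory under `primary` inside `mirror`, without files."""
    for dirpath, _dirnames, _filenames in os.walk(primary):
        rel = Path(dirpath).relative_to(primary)
        (mirror / rel).mkdir(parents=True, exist_ok=True)


def mirror_dir_for(file_path: Path, primary: Path, mirror: Path) -> Path:
    """Return the mirror directory matching a file's position in the primary tree."""
    return mirror / file_path.parent.relative_to(primary)
```

mirror_dir_for is the piece that days two and three would reuse: given a primary file, it says where that file's *.par2 fileblocks should live.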

I'm aware that most par2 programs need the file and its *.par2 blocks to be in the same location, but let's assume I find a way around this. Also, I believe it would be possible to par2 the top directory as a whole (which would give me work1.par2 - work10.par2), but done this way the blocks treat all the files as a single unit, so if I detect corruption I have no way of telling which file is affected.
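
One possible way around the same-location requirement is to let par2 write its blocks beside the file and then move them into the mirror (and move or copy them back before a verify). The sketch below shows only the moving part. It assumes the blocks for a file called name are named name.par2 and name.vol*.par2, which should be checked against what the par2 program in use actually writes; par2_blocks and move_blocks are hypothetical helper names.

```python
import shutil
from pathlib import Path


def par2_blocks(directory: Path, name: str) -> list[Path]:
    """Find the index file and the volume files for `name` in `directory`."""
    index = directory / f"{name}.par2"
    found = [index] if index.exists() else []
    # Assumed naming scheme: <name>.vol<something>.par2
    found += sorted(
        p for p in directory.iterdir()
        if p.name.startswith(f"{name}.vol") and p.name.endswith(".par2")
    )
    return found


def move_blocks(name: str, src_dir: Path, dst_dir: Path) -> list[Path]:
    """Move all par2 blocks belonging to `name` from src_dir to dst_dir."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    return [
        Path(shutil.move(str(p), str(dst_dir / p.name)))
        for p in par2_blocks(src_dir, name)
    ]
```

Matching on the exact name plus ".par2" or ".vol" avoids picking up the blocks of a neighbouring file such as report.txt.bak when the file of interest is report.txt.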

I'm considering two ways of doing this:

Option A:
- This seems the most obvious if somewhat inelegant: define a few functions, and incorporate them into a for loop which will be applied to each file as described in 1) - 3) above.
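
A rough sketch of what Option A could look like. It assumes the par2cmdline tool is on the PATH, that it accepts "par2 verify <index>" and "par2 repair <index>", that a zero exit status means success, and that each file's blocks are sitting beside it when par2 runs. The function names are made up, and the runner is passed in as a parameter so the loop can be tried out without par2 installed.

```python
import subprocess
from pathlib import Path
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_par2(args: Sequence[str]) -> int:
    """Run the par2 command-line tool and return its exit status."""
    return subprocess.run(["par2", *args], check=False).returncode


def check_file(data_file: Path, run: Runner = run_par2) -> str:
    """Verify one file; attempt a repair if verification does not succeed."""
    index = str(data_file.with_name(data_file.name + ".par2"))
    if run(["verify", index]) == 0:
        return "ok"
    if run(["repair", index]) == 0:
        return "repaired"
    return "unrecoverable"


def check_tree(primary: Path, run: Runner = run_par2) -> dict[Path, str]:
    """Apply check_file to every non-par2 file under `primary`."""
    results = {}
    for path in sorted(primary.rglob("*")):
        if path.is_file() and path.suffix != ".par2":
            results[path] = check_file(path, run)
    return results
```

The returned dictionary makes it easy to list which files were repaired, so that only those get a fresh set of *.par2 fileblocks.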

Option B:
- I'm afraid my thinking is not entirely clear regarding this option, but somehow I import metadata for every (primary) file into a list (I think all that's needed is the file name and location), perhaps even a nested list, although I'm not sure whether that provides an advantage. Then I apply the operations for 1) - 3) above sequentially to each list item, the assumption being that the list data and my home-made functions will be sufficient.
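
For comparison, a sketch of the list in Option B, assuming one flat record per file is enough. Entry and build_index are hypothetical names. Each record already carries the file's location, so a nested list mirroring the directory structure would probably not add anything.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Entry:
    source: Path    # file in the primary tree
    par2_dir: Path  # matching directory in the mirror tree


def build_index(primary: Path, mirror: Path) -> list[Entry]:
    """Collect one Entry per file found under `primary`."""
    return [
        Entry(path, mirror / path.parent.relative_to(primary))
        for path in sorted(primary.rglob("*"))
        if path.is_file()
    ]
```

With around 15,000 files such a list is small, and the per-file operations can then run over it in a plain for loop, which makes Option B largely the same as Option A with the directory walk done up front.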

I've found various par2 programs on PyPI, and possibly pyFileFixity could be used, but in this instance I'd rather give it a go myself. For various reasons I can't use ZFS, which would, of course, negate the need for doing any of this. It seems this will be my consolation prize :)
_______________________________________________
Tutor maillist  -  Tutor@python.org
