The good: one network connection to pull the data from the master to the client. The bad: it is a lot of data for a larger catalog, though one can always compress it.
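Roughly, the single pull plus compression could look like this (an untested sketch; the client name, start date, path, and exact bplist options are placeholders to adjust for your environment):

# Sketch only: one query to the master for everything in the catalog
# for this client, compressed on its way to disk.
bplist -B -C clientname -R 999 -s 01/01/2007 / 2>/dev/null \
    | gzip -c > /tmp/clientname_catalog.txt.gz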
The reason I want a single pull rather than per-file queries: I am not sure how many of you have used NetBackup 6.0MPx or for how long, but early on (maybe this is better in 6.0MP4), if you ran a lot of commands very quickly or simultaneously, it would crash various services on the master. Otherwise, running bplist for each file may not be a bad idea. If it is stable under 6.0, it is still an option; however, it would be nice to find a fast solution that does not involve 1,000,000 queries, such as the method I mentioned (a rough end-to-end sketch is appended below).

Justin.

On 3/26/07, Justin Piszcz <[EMAIL PROTECTED]> wrote:
> The problem I worry about with running a bplist on each file is the
> network overhead and the load that will hit the master server. If
> you have 50 servers with 1,000,000 files each, that would be 50
> million network requests total. I was thinking: dump the catalog onto
> the local machine, build a hash (or use certain UNIX utilities in
> shell to emulate this concept), then run a find on the filesystem
> and loop through each line of its output, comparing it against the
> hash or data dump of everything that has been backed up.
>
> Here is an idea that might work. Suppose:
>
> 1. file A has dirs a, b, c (sort | uniq it)
> 2. file B has dirs c, d, e (sort | uniq it)
>
> Now consider:
>
> cat fileA fileB | sort | uniq -c
>
> If any output line starts with a count greater than 1, that path
> appeared in both files and hence has been backed up. This has some
> high local overhead; however, it may be the fastest solution that
> avoids hashing the entire file into memory with perl.
>
> What do you think of this solution? I plan on trying it later today
> or tomorrow.
>
> Justin.
>
> On 3/26/07, Darren Dunham <[EMAIL PROTECTED]> wrote:
> > > If one is to create a script to ensure that the files on the
> > > filesystem are backed up before removing them, what is the best
> > > data-store model for doing so?
> > >
> > > Obviously, if you have > 1,000,000 files in the catalog and you need
> > > to check each of those, you do not want to run bplist -B -C -R 999999
> > > /path/to/file/1.txt for each file. However, you do not want to grep
> > > "1" one_gigabyte_catalog.txt either, as there is really too much
> > > overhead in either case.
> >
> > A million is a lot, but with a sufficiently large machine you might be
> > able to fit all the names in memory (and, if you're really lucky, in a
> > perl hash).
> >
> > With a lot of memory, I'd build a name hash from the expected files,
> > then run through bplist and verify that every file was in the hash.
> >
> > When the memory needs of the hash cause this method to break down, you
> > can move to alternative databases. There are several perl modules that
> > let you set up a quick database without installing MySQL or Postgres
> > (but you could use those if you had them). Then the comparison is
> > slower, but much less awful than running a million invocations of
> > bplist just to check one file at a time.
> >
> > --
> > Darren Dunham                          [EMAIL PROTECTED]
> > Senior Technical Consultant    TAOS    http://www.taos.com/
> > Got some Dr Pepper?                    San Francisco, CA bay area
> >    < This line left intentionally blank to confuse you. >
> >
> > _______________________________________________
> > Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
> > http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
>

_______________________________________________
Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
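For what it's worth, here is a rough end-to-end sketch of the comparison discussed above: one bplist query per client to dump what the catalog knows about, one find over the filesystem, and the sort | uniq -c trick to flag anything on disk that has not been backed up. It is untested, and the client name, path, start date, and bplist options are placeholders to adjust for your environment.

#!/bin/sh
# Untested sketch: flag files on disk that do not appear in the backup
# catalog, using one bplist call per client instead of one per file.
# CLIENT, FS, and the start date are placeholders.

CLIENT=myclient
FS=/data

# 1. One query to the master: everything the catalog has for this client
#    under $FS since the given date (adjust -s and -R to taste).
bplist -B -C "$CLIENT" -R 999 -s 01/01/2007 "$FS" 2>/dev/null \
    | sort -u > /tmp/in_catalog.txt

# 2. One pass over the local filesystem.
find "$FS" -type f | sort -u > /tmp/on_disk.txt

# 3. The sort | uniq -c trick from the thread: a count greater than 1
#    means the path appears in both lists, i.e. it has been backed up.
cat /tmp/in_catalog.txt /tmp/on_disk.txt | sort | uniq -c \
    | awk '$1 > 1 { sub(/^[ ]*[0-9]+ /, ""); print }' > /tmp/backed_up.txt

# Anything on disk but missing from the backed-up list should not be
# removed yet.
comm -23 /tmp/on_disk.txt /tmp/backed_up.txt > /tmp/not_backed_up.txt

echo "Files not yet backed up:"
cat /tmp/not_backed_up.txt

The final step uses comm for the set difference only because both lists are already sorted; the same result could be had with another pass of sort | uniq -c if you prefer to stay with exactly the construction above.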