After some head banging and going outside (too bright), I found a much simpler, and quicker solution!
Rather than traversing the filesystem from the head node, down. I took advantage of the type column in the database and grabbed every user's directory. From that point, I simply did a lookup on the index parent_id where each users's directory was the key. A simple MySQL COUNT() and SUM() did the trick. To put this into a human time perspective, 8+ hours, to ~3+ minutes. -Adam -- Adam Brenner Computer Science, Undergraduate Student Donald Bren School of Information and Computer Sciences Research Computing Support Office of Information Technology http://www.oit.uci.edu/rcs/ University of California, Irvine www.ics.uci.edu/~aebrenne/ [email protected] On Wed, Dec 11, 2013 at 2:25 PM, Adam Brenner <[email protected]> wrote: > Howdy All, > > I am building a tool that uses RBH's database scheme to traverse our > scanned filesystem and report the users, along with all the folders > that contain over N amount of entries in them (1000 for example). In > addition, we will be reporting on old file usage based on mtime (we do > not support atime on our filesystem). These different reports will > then automatically email the user, us, etc. about it. > > We have 75 million entires in total spread across a few RBH databases > (one database for each filesystem). > > > > The part I am having trouble with, is _efficiently_ starting from the > lowest parent_id (6BCDE980:2), the root of the filesystem, and > traversing all the way to the lowest node on the graph. Effectlying > saying, given this parent_id of 6BCDE980:2, output all of its > children, and their children until no more children exists. > > I believe part of the issue is perhaps my lack of understanding self > referencing databases, where id of one entry, is a parent_id of > another entry. I do not believe MySQL can handle this sort of data > structure. Is this correct? Have others done something similar, > traversing the RBH scheme from the root node to the bottom? > > Below is the pseudocode that I have written which I is the brute force > / naive solution for RBH 2.4.3 SCHEME. Any sort of comments will be > greatly appreciated. > > function wrapper (string startPath) { > > query := SELECT usr.id, usr.fullpath AS fullpath FROM ENTRIES AS > grp LEFT JOIN ENTRIES AS usr ON usr.parent_id = grp.id WHERE > grp.fullpath = 'startPath' AND usr.fullpath != 'startPath' > // ^^ Produces one level depth of children on the parent_id of startPath > > foreach result in query { > find1000Entries(result) > output what is on the stack and email users > } > > } > > function find1000Entries (string startPath) { > > query := SELECT COUNT(usr.fullpath) AS zotCount FROM ENTRIES AS > grp LEFT JOIN ENTRIES AS usr ON usr.parent_id = grp.id WHERE > grp.fullpath = 'startPath' > // ^^ Produces one level depth of children on the parent_id of startPath > > if(zotCount from query >= 1000) > report a match and push to stack > > queryNext := SELECT usr.id, usr.fullpath AS fullpath FROM ENTRIES > AS grp LEFT JOIN ENTRIES AS usr ON usr.parent_id = grp.id WHERE > grp.fullpath = 'startPath' AND usr.type = 'dir' > // ^^ Produces one level depth of children on the parent_id of startPath > // that are only directories > > foreach result in queryNext { > find1000Entries(result) > ^^^ Call this function again, but now on directory given by queryNext > ^^^ keep doing this till no more entires exist > > } > > } > > -- > Adam Brenner > Computer Science, Undergraduate Student > Donald Bren School of Information and Computer Sciences > > Research Computing Support > Office of Information Technology > http://www.oit.uci.edu/rcs/ > > University of California, Irvine > www.ics.uci.edu/~aebrenne/ > [email protected] ------------------------------------------------------------------------------ Rapidly troubleshoot problems before they affect your business. Most IT organizations don't have a clear picture of how application performance affects their revenue. With AppDynamics, you get 100% visibility into your Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro! http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk _______________________________________________ robinhood-support mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/robinhood-support
