After some head banging and going outside (too bright), I found a much
simpler, and quicker solution!

Rather than traversing the filesystem from the head node, down. I took
advantage of the type column in the database and grabbed every user's
directory. From that point, I simply did a lookup on the index
parent_id where each users's directory was the key. A simple MySQL
COUNT() and SUM() did the trick.

To put this into a human time perspective, 8+ hours, to ~3+ minutes.

-Adam

--
Adam Brenner
Computer Science, Undergraduate Student
Donald Bren School of Information and Computer Sciences

Research Computing Support
Office of Information Technology
http://www.oit.uci.edu/rcs/

University of California, Irvine
www.ics.uci.edu/~aebrenne/
[email protected]


On Wed, Dec 11, 2013 at 2:25 PM, Adam Brenner <[email protected]> wrote:
> Howdy All,
>
> I am building a tool that uses RBH's database scheme to traverse our
> scanned filesystem and report the users, along with all the folders
> that contain over N amount of entries in them (1000 for example). In
> addition, we will be reporting on old file usage based on mtime (we do
> not support atime on our filesystem). These different reports will
> then automatically email the user, us, etc. about it.
>
> We have 75 million entires in total spread across a few RBH databases
> (one database for each filesystem).
>
>
>
> The part I am having trouble with, is _efficiently_ starting from the
> lowest parent_id (6BCDE980:2), the root of the filesystem, and
> traversing all the way to the lowest node on the graph. Effectlying
> saying, given this parent_id of 6BCDE980:2, output all of its
> children, and their children until no more children exists.
>
> I believe part of the issue is perhaps my lack of understanding self
> referencing databases, where id of one entry, is a parent_id of
> another entry. I do not believe MySQL can handle this sort of data
> structure. Is this correct? Have others done something similar,
> traversing the RBH scheme from the root node to the bottom?
>
> Below is the pseudocode that I have written which I is the brute force
> / naive solution for RBH 2.4.3 SCHEME. Any sort of comments will be
> greatly appreciated.
>
> function wrapper (string startPath) {
>
>     query := SELECT usr.id, usr.fullpath AS fullpath FROM ENTRIES AS
> grp LEFT JOIN ENTRIES AS usr ON usr.parent_id = grp.id WHERE
> grp.fullpath = 'startPath' AND usr.fullpath != 'startPath'
>     // ^^ Produces one level depth of children on the parent_id of startPath
>
>     foreach result in query {
>         find1000Entries(result)
>         output what is on the stack and email users
>     }
>
> }
>
> function find1000Entries (string startPath) {
>
>     query := SELECT COUNT(usr.fullpath) AS zotCount FROM ENTRIES AS
> grp LEFT JOIN ENTRIES AS usr ON usr.parent_id = grp.id WHERE
> grp.fullpath = 'startPath'
>     // ^^ Produces one level depth of children on the parent_id of startPath
>
>     if(zotCount from query >= 1000)
>         report a match and push to stack
>
>     queryNext := SELECT usr.id, usr.fullpath AS fullpath FROM ENTRIES
> AS grp LEFT JOIN ENTRIES AS usr ON usr.parent_id = grp.id WHERE
> grp.fullpath = 'startPath' AND usr.type = 'dir'
>     // ^^ Produces one level depth of children on the parent_id of startPath
>     // that are only directories
>
>     foreach result in queryNext {
>         find1000Entries(result)
>         ^^^ Call this function again, but now on directory given by queryNext
>         ^^^ keep doing this till no more entires exist
>
>     }
>
> }
>
> --
> Adam Brenner
> Computer Science, Undergraduate Student
> Donald Bren School of Information and Computer Sciences
>
> Research Computing Support
> Office of Information Technology
> http://www.oit.uci.edu/rcs/
>
> University of California, Irvine
> www.ics.uci.edu/~aebrenne/
> [email protected]

------------------------------------------------------------------------------
Rapidly troubleshoot problems before they affect your business. Most IT 
organizations don't have a clear picture of how application performance 
affects their revenue. With AppDynamics, you get 100% visibility into your 
Java,.NET, & PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro!
http://pubads.g.doubleclick.net/gampad/clk?id=84349831&iu=/4140/ostg.clktrk
_______________________________________________
robinhood-support mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/robinhood-support

Reply via email to