AFAIK this is not a Nutch or Hadoop issue but a basic math problem, one that is inherent in walking trees. You have to decide how many levels of the graph you are willing to walk and live with the mathematical consequences. The work grows exponentially with depth, so the further out you walk the more untenable the math gets, though for a few levels it can be OK. In the end it is nothing but a brute-force problem.
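To make the depth trade-off concrete, here is a rough single-machine sketch in Python. It is not Dennis's program and not anything from Nutch or Hadoop. The names find_cycles, tree_size and max_depth are made up for illustration, and the uniform out-degree in tree_size is a simplifying assumption that real web graphs will not satisfy.

```python
def tree_size(branching, depth):
    """Worst-case node count when a graph with a uniform out-degree
    (an assumption made for illustration) is unrolled into a tree
    and walked `depth` levels deep: 1 + b + b^2 + ... + b^depth."""
    return sum(branching ** level for level in range(depth + 1))


def find_cycles(graph, start, max_depth):
    """Brute-force, depth-limited walk from `start`.

    `graph` maps each node to a list of the nodes it links to.
    Returns (cycles, expanded): every simple path that comes back to
    `start` in at most `max_depth` hops, and the number of tree nodes
    the walk had to expand to find them.
    """
    cycles = []
    expanded = 0
    stack = [(start, [start])]  # (current node, path taken to reach it)
    while stack:
        node, path = stack.pop()
        expanded += 1
        if len(path) - 1 == max_depth:
            continue  # hop budget used up, do not walk further
        for target in graph.get(node, ()):
            if target == start:
                cycles.append(path + [start])
            elif target not in path:
                stack.append((target, path + [target]))
    return cycles, expanded


if __name__ == "__main__":
    # Tiny hypothetical link graph: page -> pages it links to.
    links = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a"]}
    for depth in (2, 3):
        found, cost = find_cycles(links, "a", depth)
        print(depth, sorted(found), cost)
    # Worst-case growth with 10 outlinks per page.
    for depth in (1, 3, 6):
        print(depth, tree_size(10, depth))
```

With 10 outlinks per page the worst case goes from 11 nodes at one level to 1,111 at three and 1,111,111 at six, which is why the depth limit is the only real knob here.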
Hank

On Mon, Jul 14, 2008 at 11:57 AM, Dennis Kubes <[EMAIL PROTECTED]> wrote:
> Does anybody know how to efficiently (non-exponentially) walk a web graph
> to detect cycles. This would be very useful in identifying spammy webpage
> and tight knit communities.
>
> I have a program that I will be releasing soon that does the detection
> through converting a webgraph into a tree and walking the tree nodes, but it
> is exponential in terms of intermediate map reduce output and computation.
>
> Dennis
>

--
blog: whydoeseverythingsuck.com
