I think you may have to implement the plugin yourself, since the use case is so specific.
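
For what it's worth, the transformation step Alex describes in the quoted message below can be tried outside of Nutch with the JDK's built-in XSLT support. This is only a rough sketch under stated assumptions: the class name SvnIndexToHtml, the sample XML (a simplified guess at what an SVN index page looks like) and the tiny stylesheet are all made up for illustration, and the regex link extraction merely stands in for whatever HTML parser runs afterwards. Wiring this into an actual Nutch parse plugin is not shown.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of Nutch: turns XML into HTML via XSLT,
// then pulls href values out of the generated HTML.
final class SvnIndexToHtml {

    // ASSUMPTION: simplified stand-in for the XML an SVN index page returns.
    static final String SAMPLE_XML =
        "<svn><index>"
        + "<dir name='trunk' href='trunk/'/>"
        + "<file name='README' href='README'/>"
        + "</index></svn>";

    // ASSUMPTION: minimal stylesheet; a real page references its own XSL file.
    static final String SAMPLE_XSL =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/'>"
        + "<html><body>"
        + "<xsl:for-each select='svn/index/dir | svn/index/file'>"
        + "<a href='{@href}'><xsl:value-of select='@name'/></a>"
        + "</xsl:for-each>"
        + "</body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Apply the stylesheet to the XML and return the serialized result.
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    // Naive href extraction, for illustration only; a crawler would hand
    // the HTML to a real HTML parser instead.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = Pattern.compile("href=\"([^\"]*)\"").matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = transform(SAMPLE_XML, SAMPLE_XSL);
        System.out.println(html);
        System.out.println(extractLinks(html));
    }
}
```

In a real plugin you would presumably fetch the stylesheet named in the page's xml-stylesheet processing instruction rather than hard-coding one, and pass the resulting HTML on to the existing HTML parser.
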
2008/11/13 Windflying <[EMAIL PROTECTED]>

> Hi all,
>
> I'd like to use Nutch to crawl my internal company SVN repository, but it
> does not work: I always get "0 urls".
>
> The structure is similar to the
> http://svn.macosforge.org/repository/macports/ site.
>
> Thanks a lot to Alex for his great help and clear explanation. I finally
> found the reason, as follows (in Alex's words):
>
> "Look at the source for the http://svn.macosforge.org/repository/macports/
> site. It contains SVN internal XML markup, and it is not HTML. When a
> browser downloads content from this page, it automatically applies the XSL
> stylesheet referenced from the XML, which produces HTML.
> Nutch cannot do this by default. When it downloads content, it tries to
> parse it with the HTML parser, and of course it doesn't see any <a> tags,
> so it doesn't produce new links.
> I am afraid you will have to develop a special plugin that applies the XSL
> stylesheet, and place it before the HTML parser."
>
> Does anybody know if there is a plugin that makes Nutch apply an XSL
> stylesheet? Any ideas?
>
> Thanks.
>
> Bryan
