[ https://issues.apache.org/jira/browse/NUTCH-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12474713 ]
Dennis Kubes commented on NUTCH-447:
------------------------------------

This tool is for people who need a defined category structure or want to grab all or part of the dmoz category structure without URLs. You could then use this list as the topic list in the DmozParserTool to crawl only under a certain category.

> Dmoz Structure Parser Tool
> --------------------------
>
> Key: NUTCH-447
> URL: https://issues.apache.org/jira/browse/NUTCH-447
> Project: Nutch
> Issue Type: New Feature
> Affects Versions: 0.9.0
> Environment: all platforms
> Reporter: Dennis Kubes
> Assigned To: Dennis Kubes
> Priority: Minor
> Attachments: dmoz-structure.patch
>
>
> This is a tool that will take the dmoz structure RDF file and return a
> listing of the categories. The categories returned can be limited by depth or
> by regular expression pattern. This tool borrows heavily from the DmozParser.
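
The depth and pattern limits described in the issue can be sketched roughly as follows. This is only an illustration in Python, not the code in dmoz-structure.patch: the function name list_categories, its parameters, the way depth is counted, and the assumption that each category appears in the structure file as a line of the form <Topic r:id="Top/Arts/Movies"> are all hypothetical choices made for the example.

```python
import re
from typing import Iterable, Iterator, Optional

# Assumption: each category appears on a line like <Topic r:id="Top/Arts/Movies">.
TOPIC_RE = re.compile(r'<Topic\s+r:id="([^"]*)"')


def list_categories(
    lines: Iterable[str],
    max_depth: Optional[int] = None,
    pattern: Optional[str] = None,
) -> Iterator[str]:
    """Yield category paths, optionally limited by depth and/or a regex.

    Depth is counted here as the number of path segments, so "Top" is
    depth 1 and "Top/Arts" is depth 2 (a convention chosen for this sketch).
    """
    matcher = re.compile(pattern) if pattern else None
    for line in lines:
        found = TOPIC_RE.search(line)
        if not found:
            continue  # not a topic line
        topic = found.group(1)
        if not topic:
            continue  # skip an empty id
        depth = topic.count("/") + 1
        if max_depth is not None and depth > max_depth:
            continue  # deeper than the requested limit
        if matcher is not None and not matcher.search(topic):
            continue  # does not match the requested pattern
        yield topic


if __name__ == "__main__":
    sample = [
        '<Topic r:id="Top">',
        '<Topic r:id="Top/Arts">',
        "  <catid>2</catid>",
        '<Topic r:id="Top/Arts/Movies">',
        '<Topic r:id="Top/Science">',
    ]
    for category in list_categories(sample, max_depth=2):
        print(category)
```

The resulting list could then be saved one category per line and used as the topic list mentioned in the comment above.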