[ https://issues.apache.org/jira/browse/NUTCH-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12474713 ]

Dennis Kubes commented on NUTCH-447:
------------------------------------

This tool is for people who need a defined category structure, or who want to
grab all or part of the dmoz category structure without the URLs.  You could
then use this list as the topic list in the DmozParserTool to crawl only under
a certain category.

> Dmoz Structure Parser Tool
> --------------------------
>
>                 Key: NUTCH-447
>                 URL: https://issues.apache.org/jira/browse/NUTCH-447
>             Project: Nutch
>          Issue Type: New Feature
>    Affects Versions: 0.9.0
>         Environment: all platforms
>            Reporter: Dennis Kubes
>         Assigned To: Dennis Kubes
>            Priority: Minor
>         Attachments: dmoz-structure.patch
>
>
> This is a tool that will take the dmoz structure RDF file and return a 
> listing of the categories.  The categories returned can be limited by depth or 
> by regular expression pattern.  This tool borrows heavily from the DmozParser.
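A rough Java sketch of the filtering described above follows. It is not the code in dmoz-structure.patch. It assumes the structure file lists each category as a `Topic` element carrying an `r:id` attribute such as `Top/Arts/Movies`. It also treats "depth" as the number of slash-separated path segments, so `Top/Arts` has depth 2. `DmozCategoryLister`, `listCategories` and that depth rule are hypothetical names and choices made only for illustration.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration only; not the DmozParser or the attached patch.
public class DmozCategoryLister {

  /**
   * Streams an RDF document and returns the id of every Topic element that
   * passes the optional filters. maxDepth <= 0 disables the depth limit;
   * a null pattern disables the regular-expression filter.
   */
  public static List<String> listCategories(Reader rdf, int maxDepth, Pattern pattern)
      throws XMLStreamException {
    XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(rdf);
    List<String> categories = new ArrayList<>();
    try {
      while (reader.hasNext()) {
        if (reader.next() == XMLStreamConstants.START_ELEMENT
            && "Topic".equals(reader.getLocalName())) {
          String id = topicId(reader);
          if (id != null && withinDepth(id, maxDepth)
              && (pattern == null || pattern.matcher(id).find())) {
            categories.add(id);
          }
        }
      }
    } finally {
      reader.close();
    }
    return categories;
  }

  // Assumption: the category path is the attribute with local name "id" (r:id).
  private static String topicId(XMLStreamReader reader) {
    for (int i = 0; i < reader.getAttributeCount(); i++) {
      if ("id".equals(reader.getAttributeLocalName(i))) {
        return reader.getAttributeValue(i);
      }
    }
    return null;
  }

  // Assumed depth rule: count '/'-separated segments, so "Top/Arts" has depth 2.
  private static boolean withinDepth(String id, int maxDepth) {
    return maxDepth <= 0 || id.split("/").length <= maxDepth;
  }

  public static void main(String[] args) throws XMLStreamException {
    // Simplified, made-up sample; a real structure file is far larger.
    String sample =
        "<RDF xmlns:r=\"http://www.w3.org/TR/RDF/\" xmlns=\"http://dmoz.org/rdf/\">"
            + "<Topic r:id=\"Top/Arts\"/>"
            + "<Topic r:id=\"Top/Arts/Movies\"/>"
            + "<Topic r:id=\"Top/Arts/Movies/Genres\"/>"
            + "<Topic r:id=\"Top/Science\"/>"
            + "</RDF>";
    // prints [Top/Arts, Top/Arts/Movies]
    System.out.println(
        listCategories(new StringReader(sample), 3, Pattern.compile("^Top/Arts")));
  }
}
```

A streaming StAX reader is used instead of a DOM so that memory use stays flat on a large RDF dump. Writing the returned list one category per line would give the kind of category list the comment above suggests reusing as a topic list.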

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
