I am trying to crawl and index URLs based on their depth. In my scenario, I am interested in two content types: HTML and images. For images, I need to index every image URL regardless of its depth. For HTML content, however, I only need to index pages that come directly from my seed list (depth 1).
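To make the rule concrete, here is a minimal sketch in plain Java, independent of the Nutch API. The class name DepthIndexPolicy and its methods are hypothetical, made up for illustration. It assumes the depth arrives as a numeric string, that seed URLs have depth 1, and that any content type other than HTML or images is skipped.

```java
import java.util.Locale;

/** Hypothetical helper (not part of Nutch): applies the depth rule described above. */
final class DepthIndexPolicy {

    /** Parses the depth string; returns -1 when it is missing or not a number. */
    static int parseDepth(String depthString) {
        if (depthString == null) {
            return -1;
        }
        try {
            return Integer.parseInt(depthString.trim());
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /**
     * Returns true if a document with this content type and depth should be indexed.
     * Assumption: seed URLs have depth 1; other content types are never indexed.
     */
    static boolean shouldIndex(String contentType, String depthString) {
        if (contentType == null) {
            return false;
        }
        String type = contentType.toLowerCase(Locale.ROOT);
        if (type.startsWith("image/")) {
            return true; // images are indexed at any depth
        }
        if (type.startsWith("text/html")) {
            return parseDepth(depthString) == 1; // HTML only from the seed list
        }
        return false;
    }
}
```

For example, shouldIndex("image/png", "4") accepts the image, while shouldIndex("text/html", "2") rejects the page. A missing or malformed depth is treated as "not a seed", so HTML pages without depth information are skipped rather than indexed by accident.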
I am thinking of writing a custom indexFilter plugin that returns an empty document when the parsed content does not meet the conditions above. However, I do not know how to get the depth of a URL. I looked into the scoring-depth plugin, and it seems I can get the depth using: String depthString = parseData.getMeta(DEPTH_KEY); Can I do that, or is there a better way?
Thanks in advance
--
View this message in context: http://lucene.472066.n3.nabble.com/how-to-get-the-depth-of-url-in-nutch-tp4152122.html
Sent from the Nutch - User mailing list archive at Nabble.com.

