Hi, thanks for the response, and sorry for not replying earlier.
I would just like to note that in Nutch 1.4 the default parser (this can probably be changed) is the "html parser", and its source code can be found under "apache-nutch-1.4-src\src\plugin\parse-html\src\java".

Best

--
View this message in context: http://lucene.472066.n3.nabble.com/Class-in-the-code-that-handles-parsing-of-html-files-and-selection-of-URLs-tp3890250p3950595.html
Sent from the Nutch - User mailing list archive at Nabble.com.

