Hi,

I have applied to Google Summer of Code this year with a proposal for the 
Apache Nutch project on adding support for the HTML5 specification. As you 
know, Nutch uses NekoHTML (by default), TagSoup, and Tika for parsing HTML 
pages. What I would like to know is to what extent Tika supports the HTML5 
specification, and which of Tika's parsers are applicable to it. Are there 
any relevant issues on the JIRA tracker? Kindly advise me. Any help will be 
appreciated.

Thanks in advance
