Hello - this is a good question and I probably agree. I've just read your and
Ken's conversation on the Tika list and the associated Jira issue, and will get
back to you tomorrow. The bottom line: if I missed script links in the patch,
it is my mistake and we should correct it.

M.

 
 
-----Original message-----
> From:Joseph Naegele <[email protected]>
> Sent: Tuesday 5th April 2016 21:45
> To: [email protected]
> Subject: collect script tags using parse-tika
> 
> Hi all,
> 
>  
> 
> I asked this on the Tika user list, but I want to bring it up here as well:
> 
>  
> 
> The parse-tika plugin is appealing because it offers the ability to use
> Boilerpipe, however it doesn't parse <script> tags as outlinks like
> parse-html does. Does anyone know of a good reason parse-tika *shouldn't*
> parse <script src="."> tags as outlinks? If not, I'll propose adding this
> functionality to Tika's LinkContentHandler.
> 
>  
> 
> Thanks,
> 
> Joe
> 
> 
