Hi,

I sometimes come across outlinks in the source that are intended as absolute 
but are written as relative, because the webmaster or CMS omitted the protocol 
scheme. Resolving them against the page URL produces repeated URI segments and 
junk URLs. 

Would an option that treats such URLs as absolute be a good idea? This problem 
is similar to the one in the other thread about relative URLs without a base. 
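
To illustrate what I mean, here is a rough Python sketch. It is not Nutch or 
Tika code: the function name resolve_outlink, the schemeless_as_absolute flag 
and the "starts with www." heuristic are all made up for this example, and a 
real option would need a more careful host check.

```python
import re
from urllib.parse import urljoin, urlsplit

# Hypothetical heuristic (an assumption for this sketch): only a first
# segment that starts with "www." and has further dotted labels is
# treated as a host name. File names such as "page.html" do not match.
_HOST_LIKE = re.compile(r"^www\.[a-z0-9-]+(\.[a-z0-9-]+)+$", re.IGNORECASE)


def resolve_outlink(base, href, schemeless_as_absolute=False):
    """Resolve href against base, optionally repairing a missing scheme."""
    href = href.strip()
    if schemeless_as_absolute and not urlsplit(href).scheme:
        # Take everything before the first "/", "?" or "#".
        first_segment = re.split(r"[/?#]", href, maxsplit=1)[0]
        if _HOST_LIKE.match(first_segment):
            # Reuse the scheme of the page the link was found on.
            scheme = urlsplit(base).scheme or "http"
            return scheme + "://" + href
    # Default behaviour: ordinary relative resolution.
    return urljoin(base, href)


base = "http://www.example.com/docs/page.html"
href = "www.example.com/docs/other.html"

# Plain resolution repeats the host and path segments.
print(resolve_outlink(base, href))
# With the hypothetical option the link is treated as absolute.
print(resolve_outlink(base, href, schemeless_as_absolute=True))
```

With the option off, the link resolves to 
http://www.example.com/docs/www.example.com/docs/other.html, which is the kind 
of URL I keep seeing. With it on, the result is 
http://www.example.com/docs/other.html, while ordinary relative links such as 
other.html are still resolved the usual way.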

The issue right now is that Tika already resolves the URLs as part of parsing, 
so we have no control over how they are resolved.

Thoughts?
Thanks
