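Hi, I sometimes come across outlinks in the page source that are meant to be absolute, but the webmaster or CMS has left out the protocol scheme, so they end up being treated as relative. Resolving them against the page URL produces repeated URI segments and junk URLs: for example, www.example.com/other.html linked from http://www.example.com/dir/page.html becomes http://www.example.com/dir/www.example.com/other.html.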
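To make this concrete, here is a rough Java sketch of what such an option might do. It is not Nutch or Tika code: the class, the method names and the treatAsAbsolute flag are hypothetical. It assumes a scheme-less href should be treated as absolute only when its first segment starts with "www." or equals the host of the page it was found on, and in that case it reuses the page's own scheme. Otherwise it falls back to normal relative resolution with java.net.URI, which is where the repeated segment comes from.

```java
import java.net.URI;

public class SchemelessLinks {

    // Hypothetical heuristic: the first segment of a scheme-less href looks like
    // a host name if it starts with "www." (plus another dot) or equals the base host.
    static boolean looksLikeSchemelessAbsolute(URI base, String href) {
        if (href.startsWith("/") || href.contains("://")) {
            return false; // root-relative, protocol-relative or already absolute
        }
        int slash = href.indexOf('/');
        String first = slash >= 0 ? href.substring(0, slash) : href;
        boolean wwwPrefix = first.regionMatches(true, 0, "www.", 0, 4)
                && first.indexOf('.', 4) > 4;
        // getHost() may be null; equalsIgnoreCase(null) simply returns false
        return wwwPrefix || first.equalsIgnoreCase(base.getHost());
    }

    // Resolve href against base. With the (hypothetical) option enabled, host-like
    // scheme-less hrefs get the base page's scheme prepended instead of being
    // resolved as relative paths.
    static String resolve(String base, String href, boolean treatAsAbsolute) {
        URI baseUri = URI.create(base);
        if (treatAsAbsolute && looksLikeSchemelessAbsolute(baseUri, href)) {
            return URI.create(baseUri.getScheme() + "://" + href).toString();
        }
        return baseUri.resolve(href).toString();
    }

    public static void main(String[] args) {
        String base = "http://www.example.com/dir/page.html";
        String href = "www.example.com/other.html";
        // Standard relative resolution repeats the host as a path segment
        System.out.println(resolve(base, href, false));
        // prints http://www.example.com/dir/www.example.com/other.html
        System.out.println(resolve(base, href, true));
        // prints http://www.example.com/other.html
        // An ordinary relative link resolves normally either way
        System.out.println(resolve(base, "other.html", true));
        // prints http://www.example.com/dir/other.html
    }
}
```

Restricting the check to a "www." prefix or the page's own host keeps ordinary relative links like other.html working; a looser rule such as "any first segment containing a dot" would misfire on them.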
Would an option that treats such URLs as absolute be a good idea? This problem is similar to the one in the other thread about relative URLs without a base. The catch is that Tika already resolves URLs as part of parsing, so right now we have no control over it. Thoughts? Thanks