Hi,

Apologies, I sent the mail too soon. Please see my reply below.

On Sun, Oct 20, 2013 at 4:17 PM, <[email protected]> wrote:

> Re: URL Encoding Issues in Apache Any23
>         110 by: S.L
>
> That is correct, that is the only discrepancy that I have noticed so far.
>

OK, so at least we are on the same page regarding the actual problem.


> I think what's happening here is that Any23 is encoding an already encoded
> URL. I have not found a way to avoid that in Java, i.e. avoid encoding an
> already encoded URL.
>

Possibly, yes; this seems to be what is happening. My hunch is that the
question we need to be asking (and addressing) is whether this problem
originates in the TikaEncodingDetector [0], and is hence attributable
directly to Tika, or whether it is something within Any23.



> Is there a way to do so? Does Any23 consider the possibility of the URL
> being already encoded?
>

Again, it looks to me like this may be a Tika question. You can try
debugging your code as it executes. I would suggest that you look around
line 561 of the SingleDocumentExtractor [1], as this seems to be where the
magic is happening. I would focus on this class for now until you can
pinpoint exactly where the URL encoding is happening.
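For what it's worth, on the question of avoiding double encoding in plain Java, here is a minimal sketch. It is not taken from Any23 or Tika: the EncodeOnce class and its encodeOnce helper are hypothetical names, and the approach is a heuristic that assumes the input is either fully encoded or not encoded at all. It relies on the single-argument java.net.URI constructor accepting an already encoded string and keeping its escapes, while the multi-argument URI constructors always quote '%'.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URLEncoder;

public class EncodeOnce {

    // Hypothetical helper (not part of Any23 or Tika): returns an ASCII,
    // percent-encoded form of the URL without re-encoding existing escapes.
    public static String encodeOnce(String raw)
            throws MalformedURLException, URISyntaxException {
        try {
            // The single-argument URI constructor only accepts legal URI
            // strings, so if parsing succeeds the existing escapes (e.g. %20)
            // are kept exactly as they are.
            return new URI(raw).toASCIIString();
        } catch (URISyntaxException notYetEncoded) {
            // Illegal characters such as raw spaces: split the string with
            // java.net.URL and let the multi-argument URI constructor quote
            // them. That constructor always quotes '%', so a string mixing
            // escapes and raw characters is still double-encoded here.
            URL url = new URL(raw);
            return new URI(url.getProtocol(), url.getAuthority(), url.getPath(),
                    url.getQuery(), url.getRef()).toASCIIString();
        }
    }

    public static void main(String[] args) throws Exception {
        // Naively encoding an already encoded value turns %20 into %2520.
        System.out.println(URLEncoder.encode("a%20b", "UTF-8"));            // a%2520b
        System.out.println(encodeOnce("http://example.com/a%20b"));         // http://example.com/a%20b
        System.out.println(encodeOnce("http://example.com/a b?q=c d"));     // http://example.com/a%20b?q=c%20d
    }
}
```

A string that mixes escapes with raw characters (e.g. "a%20b c") still ends up double-encoded by the fallback branch, so this only narrows the problem; it does not tell us where Any23 or Tika does the encoding.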

hth

[0] http://s.apache.org/ILT
[1] http://s.apache.org/DhK
