[jira] Updated: (NUTCH-110) OpenSearchServlet outputs illegal xml characters

2006-06-20 Thread [EMAIL PROTECTED] (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-110?page=all ] [EMAIL PROTECTED] updated NUTCH-110: Attachment: fixIllegalXmlChars08-v5.patch No, the double call to getLegalXml is not intentional. It's a mistake. Thanks for finding it. I've attached

[jira] Updated: (NUTCH-110) OpenSearchServlet outputs illegal xml characters

2006-06-19 Thread [EMAIL PROTECTED] (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-110?page=all ] [EMAIL PROTECTED] updated NUTCH-110: Attachment: fixIllegalXmlChars08-v4.patch v3 mistakenly included debugging code. Attached cleaned up v4. OpenSearchServlet outputs illegal xml

[jira] Updated: (NUTCH-110) OpenSearchServlet outputs illegal xml characters

2006-06-16 Thread [EMAIL PROTECTED] (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-110?page=all ] [EMAIL PROTECTED] updated NUTCH-110: Attachment: fixIllegalXmlChars08-v3.patch Version of patch that doesn't ...process the String twice if it contains some illegal characters!. Its name

[jira] Updated: (NUTCH-110) OpenSearchServlet outputs illegal xml characters

2006-06-16 Thread [EMAIL PROTECTED] (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-110?page=all ] [EMAIL PROTECTED] updated NUTCH-110: Version: 0.8-dev (was: 0.7) Was version 0.7. Changed 'Affects Version' to 0.8-dev. OpenSearchServlet outputs illegal xml characters

[jira] Updated: (NUTCH-110) OpenSearchServlet outputs illegal xml characters

2006-05-25 Thread Stefan Neufeind (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-110?page=all ] Stefan Neufeind updated NUTCH-110: -- Attachment: fixIllegalXmlChars08.patch Since original patch didn't cleanly apply for me on 0.8-dev (nightly-2006-05-20) I re-did it for 0.8 ... With this

[jira] Updated: (NUTCH-110) OpenSearchServlet outputs illegal xml characters

2005-10-14 Thread [EMAIL PROTECTED] (JIRA)
: (NUTCH-110) OpenSearchServlet outputs illegal xml characters ... So, will I amend the patch in NUTCH-110 so it uses XMLSerializerHelper#toValidXmlText in place of #getLegalXml method? Copy the method's contents. It doesn't really make sense to copy the entire class just for this method. Good luck

Re: [jira] Updated: (NUTCH-110) OpenSearchServlet outputs illegal xml characters

2005-10-14 Thread stack
Dawid Weiss wrote: ... So, will I amend the patch in NUTCH-110 so it uses XMLSerializerHelper#toValidXmlText in place of #getLegalXml method? Copy the method's contents. It doesn't really make sense to copy the entire class just for this method. Good luck. Thanks Dawid. I've just

Re: [jira] Updated: (NUTCH-110) OpenSearchServlet outputs illegal xml characters

2005-10-13 Thread Andrzej Bialecki
Chris Mattmann wrote: Hi, I'm not an XML expert by any means, but wouldn't it be simpler to just wrap any text where illegal chars are possible with a <![CDATA[ ]]> tag? That way, the offending characters won't be dropped and the process won't be lossy, no? If the CDATA method won't work,

Re: [jira] Updated: (NUTCH-110) OpenSearchServlet outputs illegal xml characters

2005-10-13 Thread Andrzej Bialecki
Dawid Weiss wrote: We should not drop the offending characters, but escape them. Either the Unicode entity (&#nn;) or CDATA way is ok (and CDATA way is simpler). This isn't entirely true, Andrzej -- escaping a character, or putting it in a CDATA section is just about different ways of
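
To illustrate the point under discussion: in XML 1.0, neither a CDATA section nor a numeric character reference makes an illegal character legal. The sketch below checks this with the JDK's built-in DOM parser. The class name CdataCheckSketch and the method parses are hypothetical names chosen for this example and are not part of Nutch or the NUTCH-110 patch.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only.
final class CdataCheckSketch {
    private CdataCheckSketch() {}

    /** Returns true if the JDK's XML parser accepts the document, false if it rejects it. */
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, e.g. an illegal character
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // CDATA protects markup characters such as '<' ...
        System.out.println(parses("<t><![CDATA[a<b]]></t>"));
        // ... but U+0001 is illegal in XML 1.0 even inside CDATA,
        System.out.println(parses("<t><![CDATA[a\u0001b]]></t>"));
        // and also when written as a numeric character reference.
        System.out.println(parses("<t>a&#1;b</t>"));
    }
}
```

So wrapping text in CDATA avoids escaping markup characters, but characters outside the XML 1.0 legal ranges still have to be dropped or replaced before output.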

Re: [jira] Updated: (NUTCH-110) OpenSearchServlet outputs illegal xml characters

2005-10-13 Thread Dawid Weiss
Right, I didn't think about this... somehow I thought this was all about special characters like ' . Oh, believe me: this knowledge came from sour experience not from book wisdom... I know for sure some XML parsers complain about invalid characters, while others don't. Then we should

Re: [jira] Updated: (NUTCH-110) OpenSearchServlet outputs illegal xml characters

2005-10-13 Thread stack
Andrzej Bialecki wrote: Then we should take the best of both worlds - escape valid characters, and replace invalid ones with '?' or space, or nothing. I know a place where we could find some inspiration (Carrot2 XMLSerializerHelper.java ... ;-) ) Thanks for the pointer. See starting
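
A minimal sketch of the approach described above (escape the legal markup characters, replace the illegal ones), assuming the character ranges of the XML 1.0 Char production. The class and method names are hypothetical; this is neither the NUTCH-110 patch nor Carrot2's XMLSerializerHelper.

```java
// Hypothetical helper for illustration only.
final class LegalXmlSketch {
    private LegalXmlSketch() {}

    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Escapes '<', '>' and '&', and substitutes the given replacement
     * (for example "?", " " or "") for every illegal character.
     */
    static String toLegalXmlText(String text, String replacement) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            // Work on code points so surrogate pairs stay together;
            // an unpaired surrogate comes back as its own (illegal) value.
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (!isLegalXmlChar(cp)) {
                out.append(replacement);
            } else if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '>') {
                out.append("&gt;");
            } else if (cp == '&') {
                out.append("&amp;");
            } else {
                out.appendCodePoint(cp);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toLegalXmlText("a\u0001b<c&d", "?"));
    }
}
```

Passing an empty replacement string gives the drop-silently behaviour; passing "?" or " " keeps a visible marker where a character was removed.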

Re: [jira] Updated: (NUTCH-110) OpenSearchServlet outputs illegal xml characters

2005-10-13 Thread Dawid Weiss
The differences between this method and the patch supplied in NUTCH-110 are: Take a closer look at the source code -- 1. XMLSerializerHelper#toValidXmlText throws an exception when it finds an invalid character, whereas NUTCH-110 just drops it. Not really, it is governed by a boolean flag. If

[jira] Updated: (NUTCH-110) OpenSearchServlet outputs illegal xml characters

2005-10-12 Thread [EMAIL PROTECTED] (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-110?page=all ] [EMAIL PROTECTED] updated NUTCH-110: Attachment: fixIllegalXmlChars.patch Attached patch runs all XML text through a check for bad XML characters. This patch is brutal, dropping silently

RE: [jira] Updated: (NUTCH-110) OpenSearchServlet outputs illegal xml characters

2005-10-12 Thread Chris Mattmann
, or the California Institute of Technology. -Original Message- From: [EMAIL PROTECTED] (JIRA) [mailto:[EMAIL PROTECTED] Sent: Wednesday, October 12, 2005 5:19 PM To: nutch-dev@incubator.apache.org Subject: [jira] Updated: (NUTCH-110) OpenSearchServlet outputs illegal xml characters