XML escaping is probably the best approach. Either surround the whole thing with "<![CDATA[" and "]]>", or use one of the many libraries out there that will escape the string for you.
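For what it's worth, here is a rough Python sketch of both options, using the standard library's xml.sax.saxutils. The helper names (url_field, url_field_cdata) and the example.com URL are made up for illustration; only the <field name="url"> shape comes from the thread. One caveat on the CDATA route: the content itself must not contain "]]>", so the sketch splits that sequence across two CDATA sections.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET


def url_field(name: str, url: str) -> str:
    """Build a <field> element, escaping &, < and > in the URL.

    Hypothetical helper, not part of Solr or any Solr client.
    """
    # quoteattr() returns the attribute value already wrapped in quotes.
    return f"<field name={quoteattr(name)}>{escape(url)}</field>"


def url_field_cdata(name: str, url: str) -> str:
    """Same element, but wrapping the URL in a CDATA section instead."""
    # "]]>" would end the CDATA section early, so split it across two sections.
    safe = url.replace("]]>", "]]]]><![CDATA[>")
    return f"<field name={quoteattr(name)}><![CDATA[{safe}]]></field>"


if __name__ == "__main__":
    url = "http://example.com/search?q=a&b=<c>"  # made-up URL
    for xml in (url_field("url", url), url_field_cdata("url", url)):
        print(xml)
        # Parsing the element back recovers the original URL unchanged.
        assert ET.fromstring(xml).text == url
```

Either way the original URL comes back intact once the XML is parsed, so the ID stays readable and reversible.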
While an MD5 is designed to be a cryptographically secure one-way function, it is NOT guaranteed to be a one-to-one (invertible) function. You could theoretically have two distinct URLs that have the same MD5.

> -----Original Message-----
> From: Nuno Leitao [mailto:[EMAIL PROTECTED]
> Sent: Monday, July 23, 2007 5:22 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Computing an md5 of a text field.
>
> Thanks Yonik,
>
> Basically, I am indexing a number of items where the unique
> ID is a URL. Because URL's can contain invalid XML
> characters, and I will be doing some XSLT postprocessing, I
> was thinking that a good way to solve the problem would be to
> store these unique ID's as md5's instead.
>
> I think I found another alternative - it follows the
> pre-processing avenue you suggested.
>
> Best Regards.
>
> --Nuno
>
> On 23 Jul 2007, at 18:25, Yonik Seeley wrote:
>
> > On 7/23/07, Nuno Leitao <[EMAIL PROTECTED]> wrote:
> >> I would like to be able to compute and store the MD5 sum for a given
> >> text in a field (in my case, I am talking about a URL string). For
> >> example, if I have a field called 'url' the following would happen:
> >>
> >> 'http://wiki.apache.org' -> 'cb4f7e6ca1a0c00b146894b75d9f98dc'
> >
> > First, what are you trying to achieve by this? If you give people the
> > higher level problem, they might be able to suggest a better way.
> >
> > Since you construct the XML document to send to Solr, simply compute
> > the MD5 and add that also:
> >
> > <field name="url">http://wiki.apache.org</field>
> > <field name="urlMD5">cb4f7e6ca1a0c00b146894b75d9f98dc</field>
> >
> > Or did you want to store the MD5 instead of the URL? Did you want it
> > searchable somehow?
> >
> > -Yonik