Hi, I've been working on adding some Solr integration to my current project, but have run into a problem with non-ASCII characters.
I send a document like the following:

---
<?xml version="1.0" encoding="UTF-8"?>
<add><doc>
<field name="question_id">228</field>
<field name="question_title">Vedhæft billede til min formular</field>
<field name="userid">26</field>
<field name="question_text">Jeg har lavet en side som skal info om værkstedet Badsetuen i Odense, som er under kraftig omlægning af kommunen - dvs nedskæring. Jeg har her oprettet en formular hvor brugere kan sende en tekst på email om deres håndværk udført på stedet. Jeg mangler et felt til at vedhæfte billede http://www.badstuen.dannyboyd.dk/ Nogle ideer ?</field>
<field name="question_date">2006-05-17T08:44:23Z</field>
<field name="question_tags">Upload</field>
<field name="question_tags">HTML</field>
<field name="question_tags">Email</field>
<field name="question_tags">Vedhæftning</field>
</doc></add>
---

But when I do a search like "/solr/select/?q=billede" (the default search field is "text", which is a multiValued copyField populated from question_title and question_text), I get the document back as:

---
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"> ... </lst>
<result name="response" numFound="1" start="0">
<doc>
<date name="question_date">2006-05-17T08:44:23Z</date>
<int name="question_id">228</int>
<arr name="question_tags"><str>Upload</str><str>HTML</str><str>Email</str> <str>Vedhæftning</str></arr>
<str name="question_text">Jeg har lavet en side som skal info om værkstedet Badsetuen i Odense, som er under kraftig omlægning af kommunen - dvs nedskæring. Jeg har her oprettet en formular hvor brugere kan sende en tekst pÃ¥ email om deres hÃ¥ndværk udført pÃ¥ stedet. Jeg mangler et felt til at vedhæfte billede http://www.badstuen.dannyboyd.dk/ Nogle ideer ?</str>
<str name="question_title">Vedhæft billede til min formular</str>
<str name="userid">26</str>
</doc>
</result>
</response>
---

This is basically the same text, but displayed as if it were ISO-8859-1. How can this be?
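For what it's worth, the garbling looks like what you get when UTF-8 bytes are decoded as ISO-8859-1. A minimal Python sketch (purely illustrative, not part of my project or of Solr) that reproduces the "pÃ¥" seen in the response above:

```python
# Illustration only: reproduce the garbling by decoding UTF-8 bytes
# as if they were ISO-8859-1 (Latin-1).
original = "på"

utf8_bytes = original.encode("utf-8")        # two bytes are used for "å"
garbled = utf8_bytes.decode("iso-8859-1")    # each byte becomes one character

# Undoing the wrong decode recovers the original text.
repaired = garbled.encode("iso-8859-1").decode("utf-8")

print(utf8_bytes, garbled, repaired)
```

This only shows what kind of mix-up would produce these characters; it does not tell which side (my client, the servlet container, or Solr) is doing the wrong decode.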
Do I have to send some header saying the data is UTF-8, or should I just send the data as UTF-8 (that produces the correct encoding in answers, but sounds like a silly way of doing it)? Any ideas?

Btw, the install script listed at http://wiki.apache.org/solr/SolrTomcat is a bit wrong. Should I just contribute the fixes (new solr dir and name to fetch) to the wiki, or would one of you rather do it yourselves?

Regards
-fangel