Re: Solrj Stats encoding problem
Yeah, that's right, I just set all the params in q param. Stupid mistake. Thanks, Chris. -- View this message in context: http://lucene.472066.n3.nabble.com/Solrj-Stats-encoding-problem-tp4068429p4069431.html Sent from the Solr - User mailing list archive at Nabble.com.
Solrj Stats encoding problem
Hi,

I've tested a query using the Solr admin web interface and it works fine. But when I try to execute the same search using SolrJ, the response doesn't include the Stats information. I've figured out that it's because my query is encoded.

The original query is like:

    q=eventTimestamp:[2013-06-01T12:00:00.000Z TO 2013-06-30T11:59:59.999Z]&stats=true&stats.field=numberOfBytes&stats.facet=eventType

The query in Java is like:

    q=eventTimestamp%3A%5B2013-06-01T12%3A00%3A00.000Z+TO+2013-06-30T11%3A59%3A59.999Z%5D%26stats%3Dtrue%26stats.field%3DnumberOfBytes%26stats.facet%3DeventType

If I copy this query to the browser address bar, it doesn't work, but it does if I replace the encoded :, & and = with their original values. What should I do to make it work through Java? The code is like the following:

    SolrQuery solrQuery = new SolrQuery();
    solrQuery.setQuery(queryBuilder.toString());
    QueryResponse query = getSolrServer().query(solrQuery);
Re: Solrj Stats encoding problem
Sounds like the Solr Admin UI is too aggressively encoding the query part of the URL for display. Each query parameter value needs to be encoded, not the entire URL query string as a whole.

-- Jack Krupansky

-----Original Message-----
From: ethereal
Sent: Wednesday, June 05, 2013 4:11 PM
To: solr-user@lucene.apache.org
Subject: Solrj Stats encoding problem
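The point about encoding each parameter value separately can be illustrated with plain JDK classes. This is only a sketch: the class name QueryStringDemo and its helper methods are made up for illustration and are not part of SolrJ, which performs this step itself when parameters are set individually on a SolrQuery.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of SolrJ: shows per-parameter encoding.
public class QueryStringDemo {

    // Percent-encode a single name or value. UTF-8 is always available,
    // so the checked exception is rethrown unchecked.
    public static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encode every name and value on its own, then join them with
    // literal '&' and '=' so the server can still split the parameters.
    public static String buildQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> p : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(enc(p.getKey())).append('=').append(enc(p.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "eventTimestamp:[2013-06-01T12:00:00.000Z TO 2013-06-30T11:59:59.999Z]");
        params.put("stats", "true");
        params.put("stats.field", "numberOfBytes");
        params.put("stats.facet", "eventType");

        // Separators stay literal; only the values are escaped.
        System.out.println(buildQueryString(params));

        // Encoding the whole string at once also escapes '&' and '=',
        // which collapses everything into one oversized q value.
        System.out.println(enc("q=a&stats=true"));
    }
}
```

Encoding the whole string in one call turns every '&' into %26 and every '=' into %3D, which matches the symptom in the original question.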
Re: Solrj Stats encoding problem
: I've tested a query using solr admin web interface and it works fine.
: But when I'm trying to execute the same search using solrj, it doesn't
: include Stats information.
: I've figured out that it's because my query is encoded.

I don't think you are understanding how to use SolrJ and the SolrQuery object.

: Original query is like q=eventTimestamp:[2013-06-01T12:00:00.000Z TO
: 2013-06-30T11:59:59.999Z]&stats=true&stats.field=numberOfBytes&stats.facet=eventType
: The query in java is like
: q=eventTimestamp%3A%5B2013-06-01T12%3A00%3A00.000Z+TO+2013-06-30T11%3A59%3A59.999Z%5D%26stats%3Dtrue%26stats.field%3DnumberOfBytes%26stats.facet%3DeventType
        ...
: SolrQuery solrQuery = new SolrQuery();
: solrQuery.setQuery(queryBuilder.toString());
: QueryResponse query = getSolrServer().query(solrQuery);

It looks like you are passing the setQuery method an entire URL-encoded set of params from a request you made in your browser. The setQuery method is syntactic sugar for specifying just the "q" param containing the query string, and it should not already be escaped (i.e. eventTimestamp:[2013-06-01T12:00:00.000Z TO 2013-06-30T11:59:59.999Z]). Other methods exist on the SolrQuery object to provide syntactic sugar for other things (e.g. specifying facet fields, enabling highlighting, etc.).

If you want to provide a list of params using explicit names (q, stats, stats.field, etc.) you can ignore the helper methods on SolrQuery and directly use the low-level methods it inherits from ModifiableSolrParams, like setParam ...

    SolrQuery query = new SolrQuery();
    query.setParam("q", "eventTimestamp:[2013-06-01T12:00:00.000Z TO 2013-06-30T11:59:59.999Z]");
    query.setParam("stats", true);
    query.setParam("stats.field", "numberOfBytes","eventType");
    QueryResponse response = getSolrServer().query(query);

-Hoss
Re: Solrj Stats encoding problem
On 6/5/2013 2:11 PM, ethereal wrote:
> Hi, I've tested a query using solr admin web interface and it works fine.
> But when I'm trying to execute the same search using solrj, it doesn't
> include Stats information. I've figured out that it's because my query is
> encoded. Original query is like
> q=eventTimestamp:[2013-06-01T12:00:00.000Z TO 2013-06-30T11:59:59.999Z]&stats=true&stats.field=numberOfBytes&stats.facet=eventType
> The query in java is like
> q=eventTimestamp%3A%5B2013-06-01T12%3A00%3A00.000Z+TO+2013-06-30T11%3A59%3A59.999Z%5D%26stats%3Dtrue%26stats.field%3DnumberOfBytes%26stats.facet%3DeventType
> If I copy this query to browser address bar, it doesn't work, but it does
> if I replace encoded :, & and = with original values. What should I do to
> make it work through java? The code is like the following:
>
> SolrQuery solrQuery = new SolrQuery();
> solrQuery.setQuery(queryBuilder.toString());
> QueryResponse query = getSolrServer().query(solrQuery);

The only QueryBuilder objects I can find are in the Lucene API, so I have no idea what that part of your code is doing. Here's how I would duplicate the query you reference in SolrJ. The query string is broken apart so that the lines won't wrap awkwardly:

    String url = "http://localhost:8983/solr/collection1";
    SolrServer server = new HttpSolrServer(url);

    String qs = "eventTimestamp:"
        + "[2013-06-01T12:00:00.000Z TO 2013-06-30T11:59:59.999Z]";

    SolrQuery query = new SolrQuery();
    query.setQuery(qs);
    query.set("stats", true);
    query.set("stats.field", "numberOfBytes");
    query.set("stats.facet", "eventType");

    QueryResponse rsp = server.query(query);

Thanks,
Shawn
Encoding problem while indexing
I am working on indexing Arabic documents containing Arabic diacritics and dotless characters (old Arabic characters). I am using the Apache Tomcat server, and a modified version of the aramorph analyzer as the Arabic analyzer.

In the development environment I managed to normalize the Arabic diacritics and dotless characters (same concept as solr.ArabicNormalizationFilterFactory), and I can verify that the analyzer is working fine and that I get the correct stem for Arabic words. The input text file for testing is UTF-8 encoded.

When I build the aramorph jar file and place it under the Solr lib directory, the diacritics and the dotless characters split the word. I made sure that server.xml contains URIEncoding="utf-8". I also made sure that the text being sent to Solr using SolrJ is UTF-8 encoded, for example:

    solr.addBean(new Doc(4, new String("حِباًَ".getBytes("UTF8"))));

but nothing is working. I tried the analysis page in the Solr admin for both indexing and querying, and both show that the Arabic word is split wherever a diacritic or dotless character is found. Do you have any idea what might be the problem?

Schema snippet:

    <fieldType name="text" class="solr.TextField">
      <analyzer type="index" class="gpl.pierrick.brihaye.aramorph.lucene.ArabicNormalizeStemmer"/>
      <analyzer type="query" class="gpl.pierrick.brihaye.aramorph.lucene.ArabicNormalizeStemmer"/>
    </fieldType>

I also added the following parameter to the JVM: -Dfile.encoding=UTF-8

Thanks,
engy
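One detail in the snippet above is worth a closer look: new String(bytes) without a charset decodes with the platform default, so the round trip through getBytes("UTF8") only returns the original text when that default happens to be UTF-8. A small sketch of the difference, using a hypothetical class RoundTripDemo and a short sample word written with Unicode escapes (ISO-8859-1 stands in for a non-UTF-8 default); whether this is related to the word splitting described above is not established by the message:

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: RoundTripDemo is not part of SolrJ or aramorph.
public class RoundTripDemo {

    // Encode and decode with the same explicit charset: always lossless.
    public static String roundTrip(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Simulates new String(bytes) on a platform whose default charset
    // is ISO-8859-1: every UTF-8 byte becomes its own character.
    public static String decodeAsLatin1(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // HAH, KASRA (a diacritic), BEH: three code points, six UTF-8 bytes.
        String word = "\u062D\u0650\u0628";

        System.out.println(roundTrip(word).equals(word));      // same text back
        System.out.println(decodeAsLatin1(word).equals(word)); // text is damaged
        System.out.println(decodeAsLatin1(word).length());     // one char per byte
    }
}
```

A Java String is already Unicode internally, so passing the literal straight to the bean avoids the conversion entirely.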
Re: Encoding problem with ExtractRequestHandler for HTML indexing
I suppose you mean Extract_ing_RequestHandler.

Out of curiosity, I sent in a Japanese HTML file in EUC-JP encoding, and it was converted to Unicode properly and the index has the correct Japanese words. Do your HTML files have a META tag for Content-Type whose value includes charset=? For example, this is what I have:

    <meta http-equiv="Content-Type" content="text/html; charset=EUC-JP" />

On Mar 21, 2010, at 9:45 AM, Ukyo Virgden wrote:
> Hi,
>
> I'm trying to index HTML documents with different encodings. My HTML files
> are either in win-12XX, ISO-8859-X or UTF8 encoding. The handler correctly
> parses all HTML in their respective encodings and indexes it. However, on
> the web interface I'm developing I enter query terms in UTF-8, which
> naturally does not match content in different encodings. Also, the results
> I see in my web app are not UTF-8 encoded as expected.
>
> My question: is there any filter I can use to convert all content extracted
> by the handler to UTF-8 prior to indexing? Does it make sense to write a
> filter which would convert tokens to UTF-8, or is it even possible with
> multiple encodings?
>
> Thanks in advance.
> Ukyo

Teruhiko "Kuro" Kurosaka
RLP + Lucene & Solr = powerful search for global contents
Encoding problem with ExtractRequestHandler for HTML indexing
Hi,

I'm trying to index HTML documents with different encodings. My HTML files are either in win-12XX, ISO-8859-X or UTF8 encoding. The handler correctly parses all HTML in their respective encodings and indexes it. However, on the web interface I'm developing I enter query terms in UTF-8, which naturally does not match content in different encodings. Also, the results I see in my web app are not UTF-8 encoded as expected.

My question: is there any filter I can use to convert all content extracted by the handler to UTF-8 prior to indexing? Does it make sense to write a filter which would convert tokens to UTF-8, or is it even possible with multiple encodings?

Thanks in advance.
Ukyo
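As background for the question above: once bytes have been decoded with the charset they were written in, the resulting Java String is plain Unicode and no longer carries a source encoding. A minimal sketch with a hypothetical class DecodeDemo, using ISO-8859-1 and UTF-8 as stand-ins for the encodings mentioned; it says nothing about what the extracting handler does internally:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: DecodeDemo is a made-up name.
public class DecodeDemo {

    // Decode raw bytes using the charset the document declares.
    public static String decode(byte[] raw, Charset declared) {
        return new String(raw, declared);
    }

    public static void main(String[] args) {
        String text = "caf\u00E9";

        // The same word stored in two different encodings.
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1); // 4 bytes
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);        // 5 bytes

        String fromLatin1 = decode(latin1, StandardCharsets.ISO_8859_1);
        String fromUtf8 = decode(utf8, StandardCharsets.UTF_8);

        // After correct decoding the two strings are identical, so a
        // query term typed in UTF-8 can match either document.
        System.out.println(fromLatin1.equals(fromUtf8));
    }
}
```

If indexed terms still fail to match, a likely place to look is whether each document was decoded with the charset it actually declares.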
RE: encoding problem
Finally resolved the problem! The solution was 3-pronged on my Windows PC:

1. Added to my.ini under [mysqld]:

       default-character-set=utf8
       collation_server=utf8_unicode_ci
       character_set_server=utf8
       skip-character-set-client-handshake

2. Added to the JAVA_OPTS environment variable:

       -Dfile.encoding=UTF-8

3. Added to the beginning of the Tomcat startup.bat (positioning is important!):

       set JAVA_OPTS=-Dfile.encoding=UTF-8

Thanks to everyone for their much appreciated help!

Bern

-----Original Message-----
From: Bernadette Houghton [mailto:bernadette.hough...@deakin.edu.au]
Sent: Monday, 31 August 2009 9:18 AM
To: 'solr-user@lucene.apache.org'
Subject: RE: encoding problem
RE: encoding problem
Still having a few issues with encoding, although I've been able to resolve the particular issue below by just re-editing the affected record.

The other encoding issue is with Greek characters. With Solr turned off in our user-facing application, Greek characters, e.g. α, ω (small alpha, small omega), display correctly. But with Solr turned on, garbage displays instead. If we enter the characters as decimal references (e.g. &#969;), everything displays OK with or without Solr.

Does this suggest anything to anyone?

TIA
bern

-----Original Message-----
From: Bernadette Houghton [mailto:bernadette.hough...@deakin.edu.au]
Sent: Friday, 28 August 2009 9:31 AM
To: 'solr-user@lucene.apache.org'; 'yo...@lucidimagination.com'
Subject: RE: encoding problem
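One possible reading of the observation about decimal references: a reference such as &#969; consists only of ASCII characters, so it passes unchanged through any step that mixes up UTF-8 and a single-byte encoding, while the literal character does not. The sketch below uses a hypothetical class EntityDemo to show this; it is an illustration of the mechanism, not a diagnosis of the application in this thread.

```java
// Illustrative only: EntityDemo is a made-up name.
public class EntityDemo {

    // Decode a decimal numeric character reference such as "&#969;".
    public static String decodeDecimalReference(String ref) {
        if (!ref.startsWith("&#") || !ref.endsWith(";")) {
            throw new IllegalArgumentException("not a decimal reference: " + ref);
        }
        int codePoint = Integer.parseInt(ref.substring(2, ref.length() - 1));
        return new String(Character.toChars(codePoint));
    }

    // True if every character is in the 7-bit ASCII range.
    public static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 128);
    }

    public static void main(String[] args) {
        String reference = "&#969;";
        String omega = "\u03C9"; // GREEK SMALL LETTER OMEGA

        System.out.println(decodeDecimalReference(reference).equals(omega));
        System.out.println(isAscii(reference)); // safe in any ASCII-compatible encoding
        System.out.println(isAscii(omega));     // needs a correctly declared charset
    }
}
```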
RE: encoding problem
Hi Shalin, strangely, things still aren't working. I've set JAVA_OPTS either through the GUI or in startup.bat, but with absolutely no impact. I have tried reindexing also, but still no impact. Results such as:

“My Universe is Here�

bern

-----Original Message-----
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: Wednesday, 26 August 2009 5:50 PM
To: solr-user@lucene.apache.org
Subject: Re: encoding problem
Re: encoding problem
Have you determined if the problem is on the indexing side or the query side? I don't see any reason you should have to set/change any encoding in the JVM.

-Yonik
http://www.lucidimagination.com

On Thu, Aug 27, 2009 at 7:03 PM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote:
> Hi Shalin, strangely, things still aren't working. [...]
RE: encoding problem
Shalin, the XML from solr admin for the relevant field is displaying as - str name=citation_ta title=Browse by Author Name for Moncrieff, Joan href=/fez/list/author/Moncrieff%2C+Joan/Moncrieff, Joan/a, a title=Browse by Author Name for Macauley, Peter href=/fez/list/author/Macauley%2C+Peter/Macauley, Peter/a and a title=Browse by Author Name for Epps, Janine href=/fez/list/author/Epps%2C+Janine/Epps, Janine/a a title=Browse by Year 2006 href=/fez/list/year/2006/2006/a, a title=Click to view Journal, Media Article: ldquo;My Universe is Hererdquo;: Implications For the Future of Academic Libraries From the Results of a Survey of Researchers href=/fez/view/changeme:156“My Universe is Here�: Implications For the Future of Academic Libraries From the Results of a Survey of Researchers/ai/i, vol. 38, no. 2, pp. 71-83./str The weird thing is that the title displays OK in one place, but not in the href bit. bern
RE: encoding problem
Hi Shalin, stupid question (I'm an Apache/Solr newbie), but how do I access the JVM?

Regards
Bern

-----Original Message-----
From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com]
Sent: Wednesday, 26 August 2009 5:10 PM
To: solr-user@lucene.apache.org
Subject: Re: encoding problem

On Wed, Aug 26, 2009 at 10:24 AM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote:
> We have an encoding problem with our solr application. That is, non-ASCII
> chars displaying fine in SOLR, but in gobbledegook in our application.
>
> Our tomcat server.xml file already contains URIEncoding="UTF-8" under the
> relevant connector.
>
> A google search reveals that I should set the encoding for the JVM, but
> have no idea how to do this. I'm running Windows, and there is no tomcat
> process in my Windows Services.

Add the following parameter to the JVM: -Dfile.encoding=UTF-8

--
Regards,
Shalin Shekhar Mangar.
Re: encoding problem
On Wed, Aug 26, 2009 at 12:42 PM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Hi Shalin, stupid question - I'm an apache/solr newbie - but how do I access the JVM??? When you execute the java executable, just add -Dfile.encoding=UTF-8 as a command line argument to the executable. How are you consuming Solr? You mentioned there is no tomcat, is your solr client a desktop java application? -- Regards, Shalin Shekhar Mangar.
RE: encoding problem
Thanks for your quick reply, Shalin.

Tomcat is running on my Windows machine, but does not appear in Windows Services (as I was expecting it should ... am I wrong?). I'm running it from a startup.bat on my desktop - see below. Do I add the -Dfile.encoding line to the startup.bat?

SOLR is part of the repository software that we are running.

Thanks!
BERN

Startup.bat -

    @echo off
    if "%OS%" == "Windows_NT" setlocal
    rem ---------------------------------------------------------------------------
    rem Start script for the CATALINA Server
    rem
    rem $Id: startup.bat 302918 2004-05-27 18:25:11Z yoavs $
    rem ---------------------------------------------------------------------------

    rem Guess CATALINA_HOME if not defined
    set CURRENT_DIR=%cd%
    if not "%CATALINA_HOME%" == "" goto gotHome
    set CATALINA_HOME=%CURRENT_DIR%
    if exist "%CATALINA_HOME%\bin\catalina.bat" goto okHome
    cd ..
    set CATALINA_HOME=%cd%
    cd %CURRENT_DIR%
    :gotHome
    if exist "%CATALINA_HOME%\bin\catalina.bat" goto okHome
    echo The CATALINA_HOME environment variable is not defined correctly
    echo This environment variable is needed to run this program
    goto end
    :okHome

    set EXECUTABLE=%CATALINA_HOME%\bin\catalina.bat

    rem Check that target executable exists
    if exist "%EXECUTABLE%" goto okExec
    echo Cannot find %EXECUTABLE%
    echo This file is needed to run this program
    goto end
    :okExec

    rem Get remaining unshifted command line arguments and save them in the
    set CMD_LINE_ARGS=
    :setArgs
    if ""%1""=="""" goto doneSetArgs
    set CMD_LINE_ARGS=%CMD_LINE_ARGS% %1
    shift
    goto setArgs
    :doneSetArgs

    call "%EXECUTABLE%" start %CMD_LINE_ARGS%

    :end
Re: encoding problem
On Wed, Aug 26, 2009 at 12:52 PM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Thanks for your quick reply, Shalin. Tomcat is running on my Windows machine, but does not appear in Windows Services (as I was expecting it should ... am I wrong?). I'm running it from a startup.bat on my desktop - see below. Do I add the Dfile line to the startup.bat? SOLR is part of the repository software that we are running. Tomcat respects an environment variable called JAVA_OPTS through which you can pass any jvm argument (e.g. heap size, file encoding). Set JAVA_OPTS=-Dfile.encoding=UTF-8 either through the GUI or by adding the following to startup.bat: set JAVA_OPTS=-Dfile.encoding=UTF-8 -- Regards, Shalin Shekhar Mangar.
RE: encoding problem
If you are complaining about a web application (other than SOLR, and probably behind Apache HTTPD) having an encoding problem, try to troubleshoot it with Mozilla Firefox + the Live HTTP Headers plugin. Look at the charset given in the Content-Type HTTP response header, and don't forget about the meta http-equiv... tag inside the HTML.

-Fuad
http://www.tokenizer.org

-----Original Message-----
From: Bernadette Houghton [mailto:bernadette.hough...@deakin.edu.au]
Sent: August-26-09 12:55 AM
To: 'solr-user@lucene.apache.org'
Subject: encoding problem
encoding problem
We have an encoding problem with our Solr application. That is, non-ASCII chars display fine in Solr, but as gobbledegook in our application.

Our Tomcat server.xml file already contains URIEncoding="UTF-8" under the relevant connector.

A Google search reveals that I should set the encoding for the JVM, but I have no idea how to do this. I'm running Windows, and there is no Tomcat process in my Windows Services.

TIA

Bernadette Houghton, Library Business Applications Developer
Deakin University Geelong Victoria 3217 Australia.
Phone: 03 5227 8230 International: +61 3 5227 8230
Fax: 03 5227 8000 International: +61 3 5227 8000
MSN: bern_hough...@hotmail.com
Email: bernadette.hough...@deakin.edu.au
Website: http://www.deakin.edu.au
Deakin University CRICOS Provider Code 00113B (Vic)
Re: Encoding problem
Thanks, I detected that same problem. I have CP-1252 as the system file encoding and was saving the data-config.xml file in UTF-8. DIH was reading it using the default encoding.

One possible workaround was using InputStream and OutputStream like DIH does, but then the files won't be in UTF-8 if the system has a different encoding (not really good for XML files). I will get the latest 1.4 build and keep the files in UTF-8.

On Fri, Mar 27, 2009 at 9:37 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:
> On Sat, Mar 28, 2009 at 12:51 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:
>> I see that you are specifying the topologyname's value in the query
>> itself. It might be a bug in DataImportHandler because it reads the
>> data-config as a string from an InputStream. If your default platform
>> encoding is not UTF-8, this may be the cause.
>
> I've opened SOLR-1090 to fix this issue.
> https://issues.apache.org/jira/browse/SOLR-1090
>
> --
> Regards,
> Shalin Shekhar Mangar.
Encoding problem
I'm having problems with encoding in responses from search queries. The encoding problem only occurs in the topologyname field; if an instancename has accents, it is returned correctly. In all my configurations I have UTF-8.

    <?xml version="1.0" encoding="UTF-8"?>
    <dataConfig>
      <document name="topologies">
        <entity query="SELECT DISTINCT '3141-' || Sub0.SUBID as id,
                       'Inventário' as topologyname,
                       3141 as topologyid,
                       Sub0.SUBID as instancekey,
                       Sub0.NAME as instancename
                       FROM ...
          <field column="INSTANCEKEY" name="instancekey"/>
          <field column="ID" name="id"/>
          <field column="TOPOLOGYID" name="topologyid"/>
          <field column="INSTANCENAME" name="instancename"/>
          <field column="TOPOLOGYNAME" name="topologyname"/>
          ...

As an example, I can have the following result in the response:

    <doc>
      <long name="instancekey">285</long>
      <str name="instancename">Informática</str>
      <long name="topologyid">3141</long>
      <str name="topologyname">Inventário</str>
    </doc>

Thanks in advance,
Rui Pereira
Re: Encoding problem
Hi,

I had the same problem with DataImportHandler: I have a UTF-8 MySQL database, but it seems that DIH imports the data as Latin. So I just use a Transformer to (re)encode my strings in UTF-8.

Rui Pereira-2 wrote:
> I'm having problems with encoding in responses from search queries. The
> encoding problem only occurs in the topologyname field, if a instancename
> has accents it is returned correctly. [...]
Re: Encoding problem
On Fri, Mar 27, 2009 at 8:41 PM, Rui Pereira ruipereira...@gmail.comwrote: I'm having problems with encoding in responses from search queries. The encoding problem only occurs in the topologyname field, if a instancename has accents it is returned correctly. In all my configurations I have UTF-8. ?xml version=1.0 encoding=UTF-8? dataConfig document name=topologies entity query=SELECT DISTINCT '3141-' || Sub0.SUBID as id, 'Inventário' as topologyname, 3141 as topologyid, Sub0.SUBID as instancekey, Sub0.NAME as instancename FROM ... field column=INSTANCEKEY name=instancekey/ field column=ID name=id/ field column=TOPOLOGYID name=topologyid/ field column=INSTANCENAME name=instancename/ field column=TOPOLOGYNAME name=topologyname/... As an example, I can have in the response the following result: doc long name=instancekey285/long str name=instancenameInformática/str long name=topologyid3141/long str name=topologynameInventário/str /doc I see that you are specifying the topologyname's value in the query itself. It might be a bug in DataImportHandler because it reads the data-config as a string from an InputStream. If your default platform encoding is not UTF-8, this may be the cause. Can you try running the Solr's (or your servlet-container's) java process with -Dfile.encoding=UTF-8 and see if that fixes the problem? -- Regards, Shalin Shekhar Mangar.
Re: Encoding problem
On Sat, Mar 28, 2009 at 12:51 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: I see that you are specifying the topologyname's value in the query itself. It might be a bug in DataImportHandler because it reads the data-config as a string from an InputStream. If your default platform encoding is not UTF-8, this may be the cause. I've opened SOLR-1090 to fix this issue. https://issues.apache.org/jira/browse/SOLR-1090 -- Regards, Shalin Shekhar Mangar.
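The failure mode described here, reading a UTF-8 file through the platform default encoding, can be reproduced with JDK classes alone. The sketch below is not the DataImportHandler code or the SOLR-1090 patch; ConfigReadDemo is a hypothetical class, and ISO-8859-1 stands in for a non-UTF-8 platform default such as CP-1252.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: not taken from DataImportHandler.
public class ConfigReadDemo {

    // Read a whole stream as text using an explicitly chosen charset.
    public static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A config value saved as UTF-8, as in the data-config.xml above.
        byte[] onDisk = "Invent\u00E1rio".getBytes(StandardCharsets.UTF_8);

        // Explicit UTF-8 gives the original text regardless of platform.
        System.out.println(readAll(new ByteArrayInputStream(onDisk), StandardCharsets.UTF_8));

        // A single-byte charset splits the accented letter into two characters.
        System.out.println(readAll(new ByteArrayInputStream(onDisk), StandardCharsets.ISO_8859_1));
    }
}
```

Naming the charset at the point where bytes become characters makes the result independent of -Dfile.encoding.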
UTF-8 encoding problem on one of two Solr setups
Hi all,

I have set up an identical Solr 1.1 on two different machines. One works fine, the other one has a UTF-8 encoding problem.

#1 is my local Windows XP machine. Solr is running basically in a configuration like in the tutorial example with Jetty/5.1.11RC0 (Windows XP/5.1 x86 java/1.6.0). Everything works fine here as expected.

#2 is a Linux machine with Solr running inside Tomcat 6. The problem happens here. This is the place where Solr will finally be running.

To rule out all problems in my PHP and Java code, I tested the problem with the Solr admin page and it happens there as well. (Tested with Firefox 2 with the site's char encoding set to UTF-8.)

When entering an arbitrary search string containing UTF-8 chars I get a correct response from the local Windows Solr setup:

    <?xml version="1.0" encoding="UTF-8"?>
    <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">0</int>
      <lst name="params">
        <str name="indent">on</str>
        <str name="start">0</str>
        <str name="q">München</str>     <-- sample string containing a German umlaut-u
        <str name="rows">10</str>
        <str name="version">2.2</str>
      </lst>
    </lst>
    [...]

When I do exactly the same, just on the admin page of the other Solr setup (but from exactly the same browser), I get the following response:

    [...]
    <str name="q">item$searchstring_de:MÃ¼nchen</str>
    [...]

Obviously the umlaut-u UTF-8 bytes 0xC3 0xBC had been interpreted as two 8-bit chars instead of one UTF-8 char.

Unfortunately I am pretty new to Solr, Tomcat and related topics, so I was not able to find the problem yet. My guess is that it is outside of Solr, maybe in the Tomcat configuration, but so far I have spent the entire day without a further clue.

But apart from that Solr really rocks. Indexing tons of content and searching works just fine and fast, and it was pretty easy to get into everything. Now I am changing all data to UTF-8 and ran into my first serious obstacle... after a few weeks of Solr usage!

Any hint/help appreciated. Thank you very much.

Mario
Re: UTF-8 encoding problem on one of two Solr setups
This may be your problem. The docs below are for the HTTP connector; similar configuration can be made for the AJP and other connectors.

See http://tomcat.apache.org/tomcat-6.0-doc/config/http.html

URIEncoding
    This specifies the character encoding used to decode the URI bytes, after %xx decoding the URL. If not specified, ISO-8859-1 will be used.

-Sean
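The effect of the connector's default can be reproduced with the JDK's URLDecoder. This is a sketch of the decoding step only, not Tomcat code: UriDecodingDemo is a hypothetical class, and the percent-encoded sample is the UTF-8 form of the German word from the original report.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Illustrative only: mimics the charset choice, not Tomcat's implementation.
public class UriDecodingDemo {

    // Percent-decode a URI component with the given charset name.
    public static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // What a browser sends for the u-umlaut word when the page is UTF-8:
        // the letter becomes the two bytes 0xC3 0xBC.
        String raw = "M%C3%BCnchen";

        // Decoded as UTF-8, the two bytes form one character again.
        System.out.println(decode(raw, "UTF-8"));

        // Decoded as ISO-8859-1, each byte becomes its own character,
        // which is the garbled form seen on the Tomcat machine.
        System.out.println(decode(raw, "ISO-8859-1"));
    }
}
```

Setting URIEncoding="UTF-8" on the Connector element in server.xml selects the first behaviour for request URIs.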
RE: UTF-8 encoding problem on one of two Solr setups
You might want to check out this page: http://wiki.apache.org/solr/SolrTomcat

Tomcat needs a small config change out of the box to properly support UTF-8.

Thanks,
Charlie

-----Original Message-----
From: Mario Knezovic [mailto:[EMAIL PROTECTED]
Sent: Friday, August 17, 2007 12:58 PM
To: solr-user@lucene.apache.org
Subject: UTF-8 encoding problem on one of two Solr setups