Re: Solrj Stats encoding problem

2013-06-10 Thread ethereal
Yeah, that's right, I just set all the params in q param. Stupid mistake.
Thanks, Chris.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solrj-Stats-encoding-problem-tp4068429p4069431.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solrj Stats encoding problem

2013-06-05 Thread ethereal
Hi,

I've tested a query using solr admin web interface and it works fine.
But when I'm trying to execute the same search using solrj, it doesn't
include Stats information.
I've figured out that it's because my query is encoded.
Original query is like q=eventTimestamp:[2013-06-01T12:00:00.000Z TO
2013-06-30T11:59:59.999Z]&stats=true&stats.field=numberOfBytes&stats.facet=eventType
The query in java is like
q=eventTimestamp%3A%5B2013-06-01T12%3A00%3A00.000Z+TO+2013-06-30T11%3A59%3A59.999Z%5D%26stats%3Dtrue%26stats.field%3DnumberOfBytes%26stats.facet%3DeventType
If I copy this query to the browser address bar, it doesn't work, but it does if
I replace the encoded :, = and & with the original characters. What should I do
to make it work through java?
The code is like the following:

SolrQuery solrQuery = new SolrQuery();
solrQuery.setQuery(queryBuilder.toString());
QueryResponse query = getSolrServer().query(solrQuery);



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solrj-Stats-encoding-problem-tp4068429.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solrj Stats encoding problem

2013-06-05 Thread Jack Krupansky
Sounds like the Solr Admin UI is too-aggressively encoding the query part of 
the URL for display. Each query parameter value needs to be encoded, not the 
entire URL query string as a whole.
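
To illustrate the point about encoding each value on its own, here is a minimal sketch using only the JDK. The class and method names (QueryStringSketch, buildQueryString) are made up for this example and are not part of SolrJ; SolrJ does this encoding for you when each parameter is set separately on the SolrQuery object.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Solr or SolrJ.
class QueryStringSketch {

    // Encode every name and every value separately, then join the pairs
    // with literal '=' and '&' so the separators stay unencoded.
    static String buildQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                  .append('=')
                  .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "eventTimestamp:[2013-06-01T12:00:00.000Z TO 2013-06-30T11:59:59.999Z]");
        params.put("stats", "true");
        params.put("stats.field", "numberOfBytes");
        params.put("stats.facet", "eventType");
        System.out.println(buildQueryString(params));
    }
}
```

Encoding the whole string in one call instead would turn the & and = separators into %26 and %3D, which is what the encoded query in the original post shows.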


-- Jack Krupansky




Re: Solrj Stats encoding problem

2013-06-05 Thread Chris Hostetter

: I've tested a query using solr admin web interface and it works fine.
: But when I'm trying to execute the same search using solrj, it doesn't
: include Stats information.
: I've figured out that it's because my query is encoded.

I don't think you are understanding how to use SolrJ and the SolrQuery 
object.

: Original query is like q=eventTimestamp:[2013-06-01T12:00:00.000Z TO
: 2013-06-30T11:59:59.999Z]&stats=true&stats.field=numberOfBytes&stats.facet=eventType
: The query in java is like
: q=eventTimestamp%3A%5B2013-06-01T12%3A00%3A00.000Z+TO+2013-06-30T11%3A59%3A59.999Z%5D%26stats%3Dtrue%26stats.field%3DnumberOfBytes%26stats.facet%3DeventType

...

: SolrQuery solrQuery = new SolrQuery();
: solrQuery.setQuery(queryBuilder.toString());
: QueryResponse query = getSolrServer().query(solrQuery);

It looks like you are passing the setQuery method an entire URL-encoded 
set of params from a request you made in your browser.  The setQuery 
method is syntactic sugar for specifying just the q param containing 
the query string, and it should not already 
be escaped (ie: eventTimestamp:[2013-06-01T12:00:00.000Z TO 
2013-06-30T11:59:59.999Z]).  Other methods exist on the SolrQuery 
object to provide syntactic sugar for other things (ie: specifying facet 
fields, enabling highlighting, etc...)

If you want to provide a list of params using explicit names (q, stats, 
stats.field, etc...) you can ignore the helper methods on SolrQuery and 
just directly use the low level methods it inherits from 
ModifiableSolrParams like setParam ...


SolrQuery query = new SolrQuery();
query.setParam("q", "eventTimestamp:[2013-06-01T12:00:00.000Z TO "
  + "2013-06-30T11:59:59.999Z]");
query.setParam("stats", true);
query.setParam("stats.field", "numberOfBytes", "eventType");
QueryResponse response = getSolrServer().query(query);


-Hoss


Re: Solrj Stats encoding problem

2013-06-05 Thread Shawn Heisey

The only QueryBuilder objects I can find are in the Lucene API, so I 
have no idea what that part of your code is doing.  Here's how I would 
duplicate the query you reference in SolrJ.  The query string is broken 
apart so that the lines won't wrap awkwardly:


String url = "http://localhost:8983/solr/collection1";
SolrServer server = new HttpSolrServer(url);


String qs = "eventTimestamp:"
  + "[2013-06-01T12:00:00.000Z TO 2013-06-30T11:59:59.999Z]";
SolrQuery query = new SolrQuery();
query.setQuery(qs);
query.set("stats", true);
query.set("stats.field", "numberOfBytes");
query.set("stats.facet", "eventType");

QueryResponse rsp = server.query(query);


Thanks,
Shawn



Encoding problem while indexing

2011-06-29 Thread Engy Morsy
I am working on indexing Arabic documents containing Arabic diacritics and 
dotless characters (old Arabic characters). I am using the Apache Tomcat server, 
and I am using my modified version of the aramorph analyzer as the Arabic 
analyzer. In the development environment I managed to normalize the Arabic 
diacritics and dotless characters (same concept as in the 
solr.ArabicNormalizationFilterFactory), and I can verify that the analyzer is 
working fine and that I get the correct stem for Arabic words. The input text 
file for testing is UTF-8 encoded.

When I build the aramorph jar file and place it under the Solr lib directory, 
the diacritics and the dotless characters split the word. I made sure that 
server.xml contains URIEncoding="UTF-8".

I also made sure that the text being sent to Solr using SolrJ is UTF-8 encoded,
for example: solr.addBean(new Doc(4, new String("حِباًَ".getBytes("UTF8"))));

But nothing is working.

I tried the analysis page in the Solr admin for both indexing and querying, 
and both show that the Arabic word is split wherever a diacritic or dotless 
character is found.

Do you have any idea what the problem might be?


schema snippet:

<fieldType name="text" class="solr.TextField">
  <analyzer type="index" 
    class="gpl.pierrick.brihaye.aramorph.lucene.ArabicNormalizeStemmer"/>
  <analyzer type="query" 
    class="gpl.pierrick.brihaye.aramorph.lucene.ArabicNormalizeStemmer"/>
</fieldType>

I also added the following parameter to the JVM: -Dfile.encoding=UTF-8

Thanks,
engy


Re: Encoding problem with ExtractRequestHandler for HTML indexing

2010-03-24 Thread Teruhiko Kurosaka
I suppose you mean Extract_ing_RequestHandler.

Out of curiosity, I sent in a Japanese HTML file of EUC-JP encoding,
and it converted to Unicode properly and the index has correct
Japanese words.

Do your HTML files have a META tag for Content-Type whose value
includes charset= ? For example, this is what I have:
<meta http-equiv="Content-Type" content="text/html; charset=EUC-JP" />




Teruhiko Kuro Kurosaka
RLP + Lucene & Solr = powerful search for global contents



Encoding problem with ExtractRequestHandler for HTML indexing

2010-03-21 Thread Ukyo Virgden
Hi,

I'm trying to index HTML documents with different encodings. My HTML files are
either in win-12XX, ISO-8859-X or UTF-8 encoding. The handler correctly parses
all the HTML in its respective encoding and indexes it. However, in the web
interface I'm developing I enter query terms in UTF-8, which naturally do
not match content in different encodings. Also, the results I see in
my web app are not UTF-8 encoded as expected.

My question: is there any filter I can use to convert all content extracted
by the handler to UTF-8 prior to indexing?

Does it make sense to write a filter which would convert tokens to UTF-8, or
is it even possible with multiple encodings?

Thanks in advance.
Ukyo


RE: encoding problem

2009-09-01 Thread Bernadette Houghton
Finally resolved the problem! The solution was 3-pronged on my windows PC-

Added to my.ini under mysqld-
default-character-set=utf8
collation_server=utf8_unicode_ci
character_set_server=utf8
skip-character-set-client-handshake

Added to JAVA_OPTS environmental variable –
-Dfile.encoding=UTF-8

Added to beginning of tomcat startup.bat (positioning is important!)
set JAVA_OPTS=-Dfile.encoding=UTF-8  

Thanks to everyone for their much appreciated help!

Bern



RE: encoding problem

2009-08-30 Thread Bernadette Houghton
Still having a few issues with encoding, although I've been able to resolve the 
particular issue below by just re-editing the affected record. 

The other encoding issue is with Greek characters. With solr turned off in our 
user-facing application, greek characters e.g. α,ω (small alpha, small omega) 
display correctly. But with solr turned on, garbage displays instead. If we 
enter the characters as decimal references (e.g. &#969;), all displays OK with 
or without solr. Does this suggest anything to anyone??

TIA
bern



RE: encoding problem

2009-08-27 Thread Bernadette Houghton
Hi Shalin, strangely, things still aren't working. I've set the JAVA_OPTS 
through either the GUI or to startup.bat, but absolutely no impact. Have tried 
reindexing also, but still no impact - results such as -

“My Universe is Here�

bern



Re: encoding problem

2009-08-27 Thread Yonik Seeley
Have you determined if the problem is on the indexing side or the
query side?  I don't see any reason you should have to set/change any
encoding in the JVM.

-Yonik
http://www.lucidimagination.com






RE: encoding problem

2009-08-27 Thread Bernadette Houghton
Shalin, the XML from solr admin for the relevant field is displaying as -

<str name="citation_t"><a title="Browse by Author Name for Moncrieff, Joan" 
href="/fez/list/author/Moncrieff%2C+Joan/">Moncrieff, Joan</a>, <a 
title="Browse by Author Name for Macauley, Peter" 
href="/fez/list/author/Macauley%2C+Peter/">Macauley, Peter</a> and <a 
title="Browse by Author Name for Epps, Janine" 
href="/fez/list/author/Epps%2C+Janine/">Epps, Janine</a> <a title="Browse by 
Year 2006" href="/fez/list/year/2006/">2006</a>, <a title="Click to view 
Journal, Media Article: &ldquo;My Universe is Here&rdquo;: Implications For the 
Future of Academic Libraries From the Results of a Survey of Researchers" 
href="/fez/view/changeme:156">“My Universe is Here�: Implications 
For the Future of Academic Libraries From the Results of a Survey of 
Researchers</a><i></i>, vol. 38, no. 2, pp. 71-83.</str>


The weird thing is that the title displays OK in one place, but not in the 
href bit.

bern


RE: encoding problem

2009-08-26 Thread Bernadette Houghton
Hi Shalin, stupid question - I'm an apache/solr newbie - but how do I access 
the JVM???

Regards
Bern




Re: encoding problem

2009-08-26 Thread Shalin Shekhar Mangar
On Wed, Aug 26, 2009 at 12:42 PM, Bernadette Houghton 
bernadette.hough...@deakin.edu.au wrote:

 Hi Shalin, stupid question - I'm an apache/solr newbie - but how do I
 access the JVM???


When you execute the java executable, just add -Dfile.encoding=UTF-8 as a
command line argument to the executable.

How are you consuming Solr? You mentioned there is no tomcat, is your solr
client a desktop java application?

-- 
Regards,
Shalin Shekhar Mangar.


RE: encoding problem

2009-08-26 Thread Bernadette Houghton
Thanks for your quick reply, Shalin.

Tomcat is running on my Windows machine, but does not appear in Windows 
Services (as I was expecting it should ... am I wrong?). I'm running it from a 
startup.bat on my desktop - see below. Do I add the Dfile line to the 
startup.bat?

SOLR is part of the repository software that we are running.

Thanks!

BERN

Startup.bat -
@echo off
if "%OS%" == "Windows_NT" setlocal
rem ---
rem Start script for the CATALINA Server
rem
rem $Id: startup.bat 302918 2004-05-27 18:25:11Z yoavs $
rem ---

rem Guess CATALINA_HOME if not defined
set CURRENT_DIR=%cd%
if not "%CATALINA_HOME%" == "" goto gotHome
set CATALINA_HOME=%CURRENT_DIR%
if exist "%CATALINA_HOME%\bin\catalina.bat" goto okHome
cd ..
set CATALINA_HOME=%cd%
cd %CURRENT_DIR%
:gotHome
if exist "%CATALINA_HOME%\bin\catalina.bat" goto okHome
echo The CATALINA_HOME environment variable is not defined correctly
echo This environment variable is needed to run this program
goto end
:okHome

set EXECUTABLE=%CATALINA_HOME%\bin\catalina.bat

rem Check that target executable exists
if exist "%EXECUTABLE%" goto okExec
echo Cannot find %EXECUTABLE%
echo This file is needed to run this program
goto end
:okExec

rem Get remaining unshifted command line arguments and save them in the
set CMD_LINE_ARGS=
:setArgs
if ""%1""=="""" goto doneSetArgs
set CMD_LINE_ARGS=%CMD_LINE_ARGS% %1
shift
goto setArgs
:doneSetArgs

call "%EXECUTABLE%" start %CMD_LINE_ARGS%

:end





Re: encoding problem

2009-08-26 Thread Shalin Shekhar Mangar
On Wed, Aug 26, 2009 at 12:52 PM, Bernadette Houghton 
bernadette.hough...@deakin.edu.au wrote:

 Thanks for your quick reply, Shalin.

 Tomcat is running on my Windows machine, but does not appear in Windows
 Services (as I was expecting it should ... am I wrong?). I'm running it from
 a startup.bat on my desktop - see below. Do I add the Dfile line to the
 startup.bat?

 SOLR is part of the repository software that we are running.


Tomcat respects an environment variable called JAVA_OPTS through which you
can pass any jvm argument (e.g. heap size, file encoding). Set
JAVA_OPTS=-Dfile.encoding=UTF-8 either through the GUI or by adding the
following to startup.bat:

set JAVA_OPTS=-Dfile.encoding=UTF-8

-- 
Regards,
Shalin Shekhar Mangar.


RE: encoding problem

2009-08-26 Thread Fuad Efendi
If you are reporting that a Web Application (other than SOLR, and probably
behind Apache HTTPD) has the encoding problem, try to troubleshoot it
with Mozilla Firefox + the Live HTTP Headers plugin.


Look at the charset in the Content-Type HTTP response header, and don't forget
about the <meta http-equiv=...> tag inside the HTML.


-Fuad
http://www.tokenizer.org








encoding problem

2009-08-25 Thread Bernadette Houghton
We have an encoding problem with our solr application. That is, non-ASCII chars 
display fine in SOLR, but as gobbledegook in our application.

Our tomcat server.xml file already contains URIEncoding="UTF-8" under the 
relevant connector.

A google search reveals that I should set the encoding for the JVM, but have no 
idea how to do this. I'm running Windows, and there is no tomcat process in my 
Windows Services.

TIA

Bernadette Houghton, Library Business Applications Developer
Deakin University Geelong Victoria 3217 Australia.
Phone: 03 5227 8230 International: +61 3 5227 8230
Fax: 03 5227 8000 International: +61 3 5227 8000
MSN: bern_hough...@hotmail.com
Email: bernadette.hough...@deakin.edu.au
Website: http://www.deakin.edu.au
Deakin University CRICOS Provider Code 00113B (Vic)




Re: Encoding problem

2009-04-01 Thread Rui Pereira
Thanks, I detected that same problem.
My system file encoding is CP 1252 and I was saving the data-config.xml file
in UTF-8. DIH was reading it using the default encoding.
One possible workaround was to use InputStream and OutputStream like DIH,
but then the files won't be in UTF-8 if the system has a different encoding
(not really good for XML files).
I will get the latest 1.4 build and keep the files in UTF-8.




Encoding problem

2009-03-27 Thread Rui Pereira
I'm having problems with encoding in responses from search queries. The
encoding problem only occurs in the topologyname field; if an instancename
has accents it is returned correctly. In all my configurations I have UTF-8.

<?xml version="1.0" encoding="UTF-8"?>
<dataConfig>
<document name="topologies">
<entity query="SELECT DISTINCT '3141-' || Sub0.SUBID as id, 'Inventário' as
topologyname, 3141 as topologyid, Sub0.SUBID as instancekey, Sub0.NAME as
instancename FROM ...">
  <field column="INSTANCEKEY" name="instancekey"/>
  <field column="ID" name="id"/>
  <field column="TOPOLOGYID" name="topologyid"/>
  <field column="INSTANCENAME" name="instancename"/>
  <field column="TOPOLOGYNAME" name="topologyname"/>...


As an example, I can have in the response the following result:

<doc>
<long name="instancekey">285</long>
<str name="instancename">Informática</str>
<long name="topologyid">3141</long>
<str name="topologyname">Inventário</str>
</doc>


Thanks in advance,
   Rui Pereira


Re: Encoding problem

2009-03-27 Thread aerox7

Hi,
I had the same problem with DataImportHandler: I have a UTF-8 MySQL
database but it seems that DIH imports the data as Latin... So I just use a
Transformer to (re)encode my strings in UTF-8.



-- 
View this message in context: 
http://www.nabble.com/Encoding-problem-tp22743698p22745133.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Encoding problem

2009-03-27 Thread Shalin Shekhar Mangar

I see that you are specifying the topologyname's value in the query itself.
It might be a bug in DataImportHandler because it reads the data-config as a
string from an InputStream. If your default platform encoding is not UTF-8,
this may be the cause.

Can you try running the Solr's (or your servlet-container's) java process
with -Dfile.encoding=UTF-8 and see if that fixes the problem?
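
To make the suspected cause concrete, here is a small JDK-only sketch. It is not the DataImportHandler code; the class and method names (ConfigReadSketch, readAll) are invented for illustration. It reads the same UTF-8 bytes once with an explicit UTF-8 charset and once with windows-1252, which stands in for a non-UTF-8 platform default.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not taken from Solr.
class ConfigReadSketch {

    // Read the whole stream as text using the given charset.
    static String readAll(InputStream in, Charset charset) {
        try (Reader reader = new InputStreamReader(in, charset)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "Inventario" with an accented a (U+00E1), stored as UTF-8 bytes,
        // the way a data-config.xml saved in UTF-8 would contain it.
        byte[] utf8Bytes = "Invent\u00E1rio".getBytes(StandardCharsets.UTF_8);

        // Explicit UTF-8: the accented character survives.
        String good = readAll(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8);

        // Wrong charset: the two UTF-8 bytes become two separate characters.
        String bad = readAll(new ByteArrayInputStream(utf8Bytes), Charset.forName("windows-1252"));

        System.out.println(good.length()); // one char for the accented letter
        System.out.println(bad.length());  // one char more than above
    }
}
```

Passing the charset explicitly when wrapping the InputStream removes the dependency on -Dfile.encoding, which is why the literal in the query is affected while values coming from the database are not.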

-- 
Regards,
Shalin Shekhar Mangar.


Re: Encoding problem

2009-03-27 Thread Shalin Shekhar Mangar
On Sat, Mar 28, 2009 at 12:51 AM, Shalin Shekhar Mangar 
shalinman...@gmail.com wrote:


 I see that you are specifying the topologyname's value in the query itself.
 It might be a bug in DataImportHandler because it reads the data-config as a
 string from an InputStream. If your default platform encoding is not UTF-8,
 this may be the cause.


I've opened SOLR-1090 to fix this issue.

https://issues.apache.org/jira/browse/SOLR-1090

-- 
Regards,
Shalin Shekhar Mangar.


UTF-8 encoding problem on one of two Solr setups

2007-08-17 Thread Mario Knezovic
Hi all,

I have set up an identical Solr 1.1 on two different machines. One works
fine, the other one has a UTF-8 encoding problem.

#1 is my local Windows XP machine. Solr is running basically in a
configuration like in the tutorial example with Jetty/5.1.11RC0 (Windows
XP/5.1 x86 java/1.6.0). Everything works fine here as expected.

#2 is a Linux machine with Solr running inside Tomcat 6. The problem happens
here. This is the place where Solr will be running finally.

To rule out all problems in my PHP and Java code, I tested the problem with
the Solr admin page and it happens there as well. (Tested with Firefox 2
with site's char encoding UTF-8.)

When entering an arbitrary search string containing UTF-8 chars I get a
correct response from the local Windows Solr setup:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">0</int>
 <lst name="params">
  <str name="indent">on</str>
  <str name="start">0</str>
  <str name="q">München</str>  <-- sample string containing a German
umlaut-u
  <str name="rows">10</str>
  <str name="version">2.2</str>
 </lst>
</lst>
[...]

When I do exactly the same, just on the admin page of the other Solr setup
(but from exactly the same browser), I get the following response:

[...]
<str name="q">item$searchstring_de:München</str>
[...]

Obviously the umlaut-u UTF-8 bytes 0xC3 0xBC had been interpreted as two
8-bit chars instead of one UTF-8 char.

Unfortunately I am pretty new to Solr, Tomcat and related topics, so I was
not able to find the problem yet. My guess is that it is outside of Solr,
maybe in the Tomcat configuration, but so far I spent the entire day without
a further clue.

But apart from that Solr really rocks. Indexing tons of content and
searching works just fine and fast and it was pretty easy to get into
everything. Now I am changing all data to UTF-8 and ran into my first
serious obstacle... after a few weeks of Solr usage!

Any hint/help appreciated. Thank you very much.

Mario



Re: UTF-8 encoding problem on one of two Solr setups

2007-08-17 Thread Sean Timm




This may be your problem. The docs below are for the HTTP connector;
a similar configuration can be made for the AJP and other connectors.

See
http://tomcat.apache.org/tomcat-6.0-doc/config/http.html

URIEncoding
This specifies the character encoding used to decode the URI bytes,
after %xx decoding the URL. If not specified, ISO-8859-1 will be used.
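
The effect of that default can be reproduced outside Tomcat. The sketch below uses only the JDK's URLDecoder and is not Tomcat code; the class and method names (UriDecodingSketch, decode) are made up for this example. It decodes the percent-encoded form of the umlaut-u query once as UTF-8 and once as ISO-8859-1.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical illustration, not taken from Tomcat.
class UriDecodingSketch {

    // Percent-decode a URI component, interpreting the bytes in the given charset.
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // A browser sending UTF-8 encodes u-umlaut (U+00FC) as %C3%BC.
        String raw = "M%C3%BCnchen";

        // Decoded as UTF-8: the two bytes form one character.
        String asUtf8 = decode(raw, "UTF-8");

        // Decoded as ISO-8859-1: each byte becomes its own character
        // (U+00C3 followed by U+00BC), which is the garbled form.
        String asLatin1 = decode(raw, "ISO-8859-1");

        System.out.println(asUtf8.length());   // 7
        System.out.println(asLatin1.length()); // 8
    }
}
```

Setting URIEncoding="UTF-8" on the connector corresponds to the first call; leaving it unset corresponds to the second.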


-Sean






RE: UTF-8 encoding problem on one of two Solr setups

2007-08-17 Thread Charlie Jackson
You might want to check out this page
http://wiki.apache.org/solr/SolrTomcat

Tomcat needs a small config change out of the box to properly support UTF-8. 


Thanks,
Charlie

