Hi,

After getting search and the content-type display working, I have now run into the problem that my web pages, which are encoded in ISO-8859-1, are not "properly" indexed: the summaries and titles are missing the "non UTF-8" characters.

I tried specifying the property
*******************************************************************
<property>
   <name>parser.character.encoding.default</name>
   <value>ISO-8859-1</value>
</property>
*******************************************************************
but it made no difference.
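
To illustrate the kind of symptom I mean, here is a minimal sketch in plain Java. It is not Nutch code, and I am only assuming that a charset mismatch is the cause. The class name, the helper method and the sample title are made up for illustration. It encodes a string as ISO-8859-1 and then decodes the bytes once with the right charset and once as UTF-8:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    // Encode the text as ISO-8859-1 bytes (as my pages are stored),
    // then decode those bytes with the charset given by the caller.
    static String roundTrip(String text, Charset decodeWith) {
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1Bytes, decodeWith);
    }

    public static void main(String[] args) {
        // Hypothetical sample title containing one accented letter.
        String title = "Informaci\u00f3";

        // Decoded with the matching charset: the accented letter survives.
        System.out.println(roundTrip(title, StandardCharsets.ISO_8859_1));

        // Decoded as UTF-8: the lone byte 0xF3 is not valid UTF-8, so Java
        // substitutes the replacement character U+FFFD for it.
        System.out.println(roundTrip(title, StandardCharsets.UTF_8));
    }
}
```

If something similar happens while my pages are parsed, it would explain why only the accented characters go missing while the rest of the text is fine.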

On a related note, my documents are correctly identified by the "language-identifier" plugin, and I can see the "lang" detail on the hits. However, when I try to limit a search to documents in one given language, I cannot get the query to recognise which language I mean. I tried the same syntax that is used to search documents from one site ("site:my.site.com criteria"), with lang, language, Language... as the field name, but nothing works. This is what I see in the logs:
**************
061211 145357 10 query: lang:ca CRUE
061211 145357 10 Language: null <<<<<<< here it should read ca??
061211 145357 10 searching for 20 raw hits
**************
I browsed the documentation and searched the web, but I could not find explicit information on how to build a query that makes use of that field, now that I know the documents are properly indexed.

Any hints on either of these issues?

Thanks in advance,
D.
