Re: lucene web-app russian language

2002-03-02 Thread Andrew C. Oliver

Hi, 

Sorry, Lucene supports other languages, but the webapp was written for
English.  Swap out the analyzer.  If you can adapt the webapp to make the
analyzer configurable, I'd be happy to update the getting started guide
and commit the changes.
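
One way to make the analyzer configurable is to read its class name from
the webapp's configuration and instantiate it by reflection. The sketch
below is only an illustration under assumptions: the property key
"analyzer.class" and the class AnalyzerConfigSketch are hypothetical, and
java.lang.StringBuilder stands in for an analyzer class so the example runs
without Lucene on the classpath. In the webapp, the expected type would be
Lucene's Analyzer class instead.

```java
import java.util.Properties;

final class AnalyzerConfigSketch {

    // Creates an instance of className through its no-argument constructor
    // and verifies that it is of the expected type.
    static <T> T create(String className, Class<T> expectedType) {
        try {
            Object instance =
                Class.forName(className).getDeclaredConstructor().newInstance();
            return expectedType.cast(instance);
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("Cannot create " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        // Hypothetical key; a real webapp would load this from its own config.
        config.setProperty("analyzer.class", "java.lang.StringBuilder");

        // StringBuilder is a stand-in for an analyzer implementation.
        CharSequence created =
            create(config.getProperty("analyzer.class"), CharSequence.class);
        System.out.println(created.getClass().getName()); // java.lang.StringBuilder
    }
}
```

Whatever class is configured has to be used for both indexing and query
parsing, otherwise the two sides will not agree on the terms.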

Thanks,

Andy

-- 
http://www.superlinksoftware.com
http://jakarta.apache.org - port of Excel/Word/OLE 2 Compound Document 
format to java
http://developer.java.sun.com/developer/bugParade/bugs/4487555.html 
- fix java generics!
The avalanche has already started. It is too late for the pebbles to
vote.
-Ambassador Kosh


--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]




Re: lucene web-app russian language

2002-03-02 Thread Philipp Chudinov

OK, I'll try to write the Russian analyzer and report back to you in 2-3
days, hopefully with success. But if I fail, I'll report anyway :)






Re: lucene web-app russian language

2002-03-01 Thread Ype Kingma

Philipp,

Did you read the FAQ on the use of the StandardAnalyzer during indexing
and query parsing? You might need to replace it with a RussianAnalyzer,
which you'll have to write yourself if no one has done so before
you. Have a look at the GermanAnalyzer for some inspiration.
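
The point about using the same analyzer for indexing and for query parsing
can be shown with a toy example. This is a minimal sketch and not Lucene's
API: the class SameAnalysisSketch and its methods are hypothetical, and the
"analyzer" here only lower-cases text and splits it on non-alphanumeric
characters. Because index() and search() both go through analyze(), a
Cyrillic query matches the Cyrillic term that was indexed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class SameAnalysisSketch {

    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Toy analysis: lower-case and split on anything that is not a letter
    // or digit. Character methods are Unicode-aware, so Cyrillic works.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    void index(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(docId);
        }
    }

    // Returns the documents that contain every term of the query.
    Set<Integer> search(String query) {
        Set<Integer> result = null;
        for (String term : analyze(query)) {
            Set<Integer> docs = postings.getOrDefault(term, new HashSet<>());
            if (result == null) {
                result = new HashSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new HashSet<>() : result;
    }

    public static void main(String[] args) {
        SameAnalysisSketch idx = new SameAnalysisSketch();
        // "Hello" followed by a capitalised Russian word, written as escapes.
        idx.index(1, "Hello \u041f\u0440\u0438\u0432\u0435\u0442");
        System.out.println(idx.search("\u043f\u0440\u0438\u0432\u0435\u0442")); // [1]
        System.out.println(idx.search("HELLO"));                               // [1]
    }
}
```

If the query side used a different analysis than the indexing side (for
example, one that drops or mangles Cyrillic characters), the lookup would
find nothing even though the term is in the index.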

Good luck,
Ype





lucene web-app russian language

2002-02-28 Thread Philipp Chudinov

Hi! I was trying the Lucene web-app (lucene-1.2-rc5-dev.jar). I've created
and indexed a simple HTML document with both English and Russian words. It
was ANSI encoded. If I check _3.fdt from the created index, I can see my
document indexed, with both Russian and English terms (it opens in UTF
encoding, I suppose). But the problem starts when searching. If I search
with a Russian word, it returns nothing. If I search with an English word,
it returns a result, but all Russian words are returned as ? signs. I've
changed the .jsp content types to return UTF-8 encoding, but the result is
still the same.

So, finally, does Lucene do multilingual search or not? What am I doing
wrong? I have been trying to make it work with Russian docs since version
1.0, but still no idea and no results :((
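
The "?" symptom described above is typical of text being pushed through a
charset that cannot represent Cyrillic. The following is a minimal sketch
using only the JDK; it assumes that "ANSI" on a Russian Windows system
means windows-1251, and the class name EncodingSketch is hypothetical. It
shows that a round trip through windows-1251 or UTF-8 preserves a Russian
word, while a round trip through ISO-8859-1 replaces every letter with "?".

```java
import java.nio.charset.Charset;

final class EncodingSketch {

    // Decodes raw bytes (for example, file contents) with an explicit charset
    // instead of relying on the platform default.
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    // Encodes text to bytes and decodes it again with the same charset.
    // Characters the charset cannot represent come back as '?'.
    static String roundTrip(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // A six-letter Russian word, written as escapes so the source file
        // itself does not depend on any encoding.
        String word = "\u043f\u0440\u0438\u0432\u0435\u0442";

        System.out.println(roundTrip(word, "windows-1251").equals(word)); // true
        System.out.println(roundTrip(word, "UTF-8").equals(word));        // true
        System.out.println(roundTrip(word, "ISO-8859-1"));                // ??????

        // Bytes written as windows-1251 must also be read as windows-1251.
        byte[] ansiBytes = word.getBytes(Charset.forName("windows-1251"));
        System.out.println(decode(ansiBytes, "windows-1251").equals(word)); // true
        System.out.println(decode(ansiBytes, "UTF-8").equals(word));        // false
    }
}
```

The same reasoning applies at each step of the chain: reading the source
file, decoding the query parameters from the request, and writing the
response all need a charset that matches the data.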

