Re: Are the lucene index server support for other language, like chinese?

2004-05-06 Thread Otis Gospodnetic
Yes, there is.
I've added the answer to the Lucene FAQ at jGuru.com.
It will show up there shortly.

Otis



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Scoring documents by Click Count

2004-05-06 Thread Otis Gospodnetic
Sure.
On click, get document Id (not internal docId, but something you use as
a surrogate primary key) of the clicked document.  Retrieve the
document.  Pull out the value of 'clickCount' field.  +1 it.  Delete
the document, and re-add it (there is no 'update(Document)' method).

Otis
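
The read / increment / delete / re-add cycle described above can be sketched as follows. This is only an illustration of the control flow: a plain HashMap stands in for the index, and ClickCountStore, add, get and recordClick are hypothetical names, not Lucene API.

```java
import java.util.HashMap;
import java.util.Map;

class ClickCountStore {
    // Stand-in for the index: surrogate key -> stored fields of one document.
    private final Map<String, Map<String, String>> docs = new HashMap<>();

    void add(String key, Map<String, String> fields) {
        docs.put(key, new HashMap<>(fields));
    }

    Map<String, String> get(String key) {
        return docs.get(key);
    }

    // Retrieve the document, bump its counter, delete it, then re-add it.
    void recordClick(String key) {
        Map<String, String> old = docs.get(key);
        if (old == null) {
            throw new IllegalArgumentException("unknown document: " + key);
        }
        Map<String, String> updated = new HashMap<>(old);
        int clicks = Integer.parseInt(updated.getOrDefault("clickCount", "0"));
        updated.put("clickCount", Integer.toString(clicks + 1));
        docs.remove(key);        // delete the old copy
        docs.put(key, updated);  // re-add the modified copy
    }
}
```

In a real index the re-add step re-indexes the document, so any field that was indexed but not stored would have to be supplied again from the original source.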


--- Centaur zeus [EMAIL PROTECTED] wrote:
> Hi all,
>
> I want to integrate Lucene into my web app. I would like to increase
> the score of the document when more people click on it. Could I
> implement that in Lucene?
>
> Thanks.
>
> Perseus
 



Range searches for numbers

2004-05-06 Thread Reece . 1247688
Hi,

What's the best way to store numbers for range searching?  If someone
has some info about this I'd love to see it.

This is my current plan:

When I convert the number to a string I will zero-pad it so range searches
work.  The conversions will be like this for integers:

   1 to 101
   2 to 102
1000 to 1001000

I'm just adding a 1 to the start of the string (or adding 10).  This is so
negative numbers work too!  They will just be subtracted from a long (10):

   -1 to 099
   -2 to 098
-1000 to 0999000

This works great for range searches.  But how do I convert negative longs?
I can't subtract 100 from a long, can I?  It's too big to fit in another long.

Any advice is appreciated!

-Reece
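
On the overflow question: one way around it is to do the offset arithmetic in java.math.BigInteger rather than in a long. The sketch below is not the padding scheme from the message above; the class name OffsetLongEncoder, the 2^63 offset and the 20-digit width are assumptions picked for illustration.

```java
import java.math.BigInteger;

class OffsetLongEncoder {
    // 2^63: adding it maps every long into the range 0 .. 2^64-1.
    private static final BigInteger OFFSET = BigInteger.ONE.shiftLeft(63);
    // 2^64-1 has 20 decimal digits, so 20 is the fixed term width.
    private static final int WIDTH = 20;

    // Encode a long as a zero-padded decimal string whose lexicographic
    // order matches the numeric order of the original values.
    static String encode(long value) {
        String digits = BigInteger.valueOf(value).add(OFFSET).toString();
        StringBuilder sb = new StringBuilder(WIDTH);
        for (int i = digits.length(); i < WIDTH; i++) {
            sb.append('0');
        }
        return sb.append(digits).toString();
    }

    // Reverse of encode(): parse the digits and remove the offset.
    static long decode(String term) {
        return new BigInteger(term).subtract(OFFSET).longValue();
    }
}
```

Because every encoded term has the same width, a plain string range over the encoded values selects the same documents as a numeric range over the original longs, negatives included.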




Are the lucene index server support for other language, like chinese?

2004-05-06 Thread Alex Aw Seat Kiong
Hi!

Does the Lucene index server support other languages, such as Chinese?
What additional work is needed to support them?


Thanks,
Alex




----- Original Message -----
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Thursday, May 06, 2004 3:37 AM
Subject: Re: Where does the name lucene come from?

> Funny, earlier today I started to reply to this message, and then
> decided not to answer this question any more.  It is a FAQ entry now:
> http://www.jguru.com/faq/Lucene
>
> Otis
>
> --- Steven Rowe [EMAIL PROTECTED] wrote:
> > Til Schneider wrote:
> > > Hi,
> > >
> > > Working now for a few months with this really great search engine,
> > > I was wondering where the name Lucene comes from? What does it mean?
> > > Is there any deeper sense?
> >
> > Doug Cutting's response:
> > URL:http://tinyurl.com/2hh5c
> >
> > (full original URL:
> > URL:http://issues.apache.org/eyebrowse/[EMAIL PROTECTED]
> > .apache.orgmsgId=961817
> > )
> >
> > Otis, shouldn't this be an FAQ?
> >
> > Steve
 



Re: Scoring documents by Click Count

2004-05-06 Thread David Spencer
Otis Gospodnetic wrote:

> Sure.
> On click, get document Id (not internal docId, but something you use as
> a surrogate primary key) of the clicked document.  Retrieve the
> document.  Pull out the value of 'clickCount' field.  +1 it.  Delete
> the document, and re-add it (there is no 'update(Document)' method).

Yeah, but isn't the essence of it that Lucene is really not set up for
dynamically adjusting the *score*?
Also, above, to clarify, I think you're implying there are 2 entries for a
given doc - one Document for the indexed content, and one for the
clickCount, as (from memory) I didn't think you could even re-add a doc
w/o reindexing it...

> Otis
>
> --- Centaur zeus [EMAIL PROTECTED] wrote:
>
> > Hi all,
> >
> > I want to integrate Lucene into my web app. I would like to increase
> > the score of the document when more people click on it. Could I
> > implement that in Lucene?
> >
> > Thanks.
> >
> > Perseus



Re: Scoring documents by Click Count

2004-05-06 Thread Otis Gospodnetic
Oh, I completely misunderstood the original question.  I thought the
person was asking about sorting by click-count.

Otis




Re: Range searches for numbers

2004-05-06 Thread Stephane James Vaucher
Quick reference:

http://wiki.apache.org/jakarta-lucene/SearchNumericalFields

If you are stuck, you can always encode the long in a string format (the
date formatter in Lucene might do this already). You could even
treat it like a date and use your long with a date filter.

HTH,
sv




Re: Range searches for numbers

2004-05-06 Thread Reece . 1247688
Thanks for the info.  The Date formatter doesn't work because it can only
deal with positive longs.  My problem was how to handle the negatives but
I got it figured out.



Thanks!






Re: Scoring documents by Click Count

2004-05-06 Thread Ype Kingma
On Thursday 06 May 2004 18:11, David Spencer wrote:
> Otis Gospodnetic wrote:
> > Sure.
> > On click, get document Id (not internal docId, but something you use as
> > a surrogate primary key) of the clicked document.  Retrieve the
> > document.  Pull out the value of 'clickCount' field.  +1 it.  Delete
> > the document, and re-add it (there is no 'update(Document)' method).
>
> Yeah but isn't the essence of it that Lucene is really not set up for
> dynamically adjusting the *score*?
> Also, above, to clarify, I think you're implying there are 2 entries for
> given doc - one Document for the indexed content, and one for the
> clickCount, as (from memory) I didn't think you could even re-add a doc
> w/o reindexing it...
>
> > Otis
> >
> > --- Centaur zeus [EMAIL PROTECTED] wrote:
> > > Hi all,
> > >
> > > I want to integrate Lucene into my web app. I would like to increase
> > > the score of the document when more people click on it. Could I
> > > implement that in Lucene?

Changing the click count this way is ok, but along with that you could
change the (field) norm for the document to increase its score
in subsequent queries.
You can use Document.setBoost() and/or Field.setBoost() just before
IndexWriter.addDocument() to do this.

Regards,
Ype
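
How a click count is turned into a boost value is left open above. One possible mapping, purely as an illustration, is a log-damped factor so that heavily clicked documents do not swamp text relevance. ClickBoost and boostFor are hypothetical names; the returned float is what one would hand to the setBoost() calls mentioned above.

```java
class ClickBoost {
    // Hypothetical mapping: 0 clicks gives a neutral boost of 1.0, and the
    // boost then grows logarithmically with the number of clicks.
    static float boostFor(int clickCount) {
        if (clickCount < 0) {
            throw new IllegalArgumentException("clickCount must be >= 0");
        }
        return 1.0f + (float) Math.log1p(clickCount);
    }
}
```

The logarithm is a design choice, not a requirement: a linear or capped mapping would work the same way, as long as the value is recomputed each time the document is re-added.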








Lucene 1.4

2004-05-06 Thread Scott Smith
We are currently using Lucene 1.3 on a production web server.  For the
most part, it runs great.  However, once in a while we see some problems
which I suspect are the infamous "running out of file handles" bugs.  I
would claim that we are doing everything right (famous last words), so it
would be nice if someone could explain the proper methods for using the
Searcher object to avoid this problem.  I should probably mention that
I'm adding new items to the index once per minute, though I close the
IndexWriter each time.  I suspect the problem is that I can't close the
Searcher object because the hits list needs it to get at the documents.

At any rate, that brings me to the real question.  I believe I've read
that 1.4 has changes to largely eliminate these problems.  I know that
RC2 is out.  My question is: has anyone tried RC2?  Is it stable?
Obviously, I'm trying to decide whether to move to
1.4 RC2 or stay with 1.3.  Comments would be appreciated.
 

Scott



Re: Lucene 1.4

2004-05-06 Thread Erik Hatcher
Scott,

Lucene 1.3 added a compound file format which may solve your out of 
file handles issue.  Look at the new method on IndexWriter to use this 
mode.

	Erik



sorting

2004-05-06 Thread Ryan Sonnek
I've been searching around for information on how to sort a Lucene search.
Could someone point me in the right direction?

Ryan



Re: sorting

2004-05-06 Thread Erik Hatcher
On May 6, 2004, at 4:34 PM, Ryan Sonnek wrote:
> I've been searching around for information on how to sort a Lucene
> search.  Could someone point me in the right direction?

Sorting is only available in the latest CVS builds of Lucene and the
1.4 RC releases.

The best source of information on how Lucene works is Lucene's test
suite.  Look at the sorting test cases and you will have immediate
cut-and-paste examples of how to use it.

	Erik



Re: Scoring documents by Click Count

2004-05-06 Thread Boris Goldowsky
On Thu, 2004-05-06 at 13:58, Ype Kingma wrote:

> Changing the click count this way is ok, but along with that you could
> change the (field) norm for the document to increase its score
> in subsequent queries.
> You can use Document.setBoost() and/or Field.setBoost() just before
> IndexWriter.addDocument() to do this.

There may be workable ways to do this, but the one time I tried
adjusting boosts of already-indexed documents I found it didn't work
quite as I expected.  The documentation has a warning which explains
why:

    getBoost
    Returns the boost factor for hits on any field of this document.
    [...]
    Note: This value is not stored directly with the document in the
    index. Documents returned from IndexReader.document(int) and
    Hits.doc(int) may thus not have the same value present as when
    this document was indexed.

So be cautious and test carefully if you try this -- and let us on the
list know how it goes!

Boris






Re: Scoring documents by Click Count

2004-05-06 Thread Ype Kingma
On Thursday 06 May 2004 23:26, Boris Goldowsky wrote:
> On Thu, 2004-05-06 at 13:58, Ype Kingma wrote:
> > Changing the click count this way is ok, but along with that you could
> > change the (field) norm for the document to increase its score
> > in subsequent queries.
> > You can use Document.setBoost() and/or Field.setBoost() just before
> > IndexWriter.addDocument() to do this.
>
> There may be workable ways to do this, but the one time I tried
> adjusting boosts of already-indexed documents I found it didn't work
> quite as I expected.  The documentation has a warning which explains
> why:
>
>     getBoost
>     Returns the boost factor for hits on any field of this document.
>     [...]
>     Note: This value is not stored directly with the document in the
>     index. Documents returned from IndexReader.document(int) and
>     Hits.doc(int) may thus not have the same value present as when
>     this document was indexed.
>
> So be cautious and test carefully if you try this -- and let us on the
> list know how it goes!

Then use Field.setBoost() only. Note that this is stored with only
3 bits of precision in the index. So when you know which field is queried for
your scores, multiply the previous value by at least 9/8 to be
sure to increase the score. Or use the score of the next higher
scoring doc as a baseline.

That may not be enough. I seem to recall that the inverse square root
of the boost is stored as the norm, so you may have to multiply
by at least 81/64 (that is, 9/8 squared) to see a guaranteed difference
in the score.

At the moment I have no time left to check the details. Please have a look at
the default Similarity implementation for more.

Good night,
Ype
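
To see why a small boost change can vanish at 3 bits of precision, here is a simplified stand-in. It is not Lucene's actual norm encoding (the default Similarity implementation is the place to check that); CoarseFloat and quantize are hypothetical names, and the scheme simply truncates a float to its top 3 mantissa bits.

```java
class CoarseFloat {
    // Keep the sign bit, the 8 exponent bits and the top 3 mantissa bits of
    // an IEEE-754 float; the remaining 20 mantissa bits are truncated.
    static float quantize(float f) {
        return Float.intBitsToFloat(Float.floatToIntBits(f) & 0xFFF00000);
    }
}
```

Under this truncating scheme, multiplying a stored value of 1.0 by 1.05 quantizes back to 1.0, while multiplying by 9/8 lands on the next representable step, which is the effect the 9/8 rule above is guarding against.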





Re: Range searches for numbers

2004-05-06 Thread Matt Quail
Reece,

> What's the best way to store numbers for range searching?  If someone
> has some info about this I'd love to see it.

I implemented a LongField that encodes any +ve or -ve long into a
string that sorts correctly. I posted that class here:

http://www.mail-archive.com/[EMAIL PROTECTED]/msg04790.html

Doing a String range search should be fairly straightforward from
there. Let me know if you have any problems.

=Matt
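
The LongField class itself is not reproduced in this message, so the following is not that code. It is a minimal sketch of one common way to get a sortable string from a signed long: flip the sign bit and print the result as fixed-width hex. SortableLong, encode and decode are hypothetical names, and Long.parseUnsignedLong assumes Java 8 or later.

```java
class SortableLong {
    // Flipping the sign bit maps Long.MIN_VALUE..Long.MAX_VALUE onto the
    // unsigned range 0..2^64-1 while preserving order.
    static String encode(long value) {
        String hex = Long.toHexString(value ^ Long.MIN_VALUE);
        StringBuilder sb = new StringBuilder(16);
        for (int i = hex.length(); i < 16; i++) {
            sb.append('0'); // left-pad to a fixed width of 16 hex digits
        }
        return sb.append(hex).toString();
    }

    // Reverse of encode(): parse as unsigned hex and flip the sign bit back.
    static long decode(String term) {
        return Long.parseUnsignedLong(term, 16) ^ Long.MIN_VALUE;
    }
}
```

The encoded terms compare lexicographically in the same order as the original longs, so the lower and upper bounds of a string range query can be produced with encode() directly.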





Query performance on a 315 Million document index (1TB)

2004-05-06 Thread Will Allen
Hi,
I am considering a project that would index 315+ million documents. I am
comfortable that the indexing will work well in creating an index ~800GB in size, but
am concerned about the query performance. (Is this a bad assumption?)

What are the bottlenecks of performance as an index scales?  Memory?  Cost is not a
concern, so what would be the shortcomings of a theoretical machine with 16GB of
ram, 4-16 cpus and 1-2 terabytes of space?  Would it be better to cluster machines
to break apart the query?

Thank you for your serious responses,
Will Allen