Re: Are the lucene index server support for other language, like chinese?
Yes, there is. I've added the answer to the Lucene FAQ at jGuru.com. It will show up there shortly.

Otis

--- Alex Aw Seat Kiong [EMAIL PROTECTED] wrote:
Hi! Does the Lucene index server support other languages, like Chinese? What additional work needs to be done to support it? Thanks, Alex

- Original Message - From: Otis Gospodnetic [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Thursday, May 06, 2004 3:37 AM Subject: Re: Where does the name lucene come from?

Funny, earlier today I started to reply to this message, and then decided not to answer this question any more. It is a FAQ entry now: http://www.jguru.com/faq/Lucene

Otis

--- Steven Rowe [EMAIL PROTECTED] wrote:
Til Schneider wrote: Hi, Working now for a few months with this really great search engine, I was wondering where the name Lucene comes from? What does it mean? Is there any deeper sense?

Doug Cutting's response: URL:http://tinyurl.com/2hh5c (full original URL: URL:http://issues.apache.org/eyebrowse/[EMAIL PROTECTED] .apache.orgmsgId=961817 ) Otis, shouldn't this be an FAQ? Steve

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Scoring documents by Click Count
Sure. On click, get the document ID (not the internal docId, but something you use as a surrogate primary key) of the clicked document. Retrieve the document. Pull out the value of the 'clickCount' field. +1 it. Delete the document, and re-add it (there is no 'update(Document)' method).

Otis

--- Centaur zeus [EMAIL PROTECTED] wrote:
Hi all, I want to integrate Lucene into my web app. I would like to increase the score of a document when more people click on it. Could I implement that in Lucene? Thanks. Perseus
Range searches for numbers
Hi, What's the best way to store numbers for range searching? If someone has some info about this I'd love to see it.

This is my current plan: when I convert the number to a string I will zero-pad it so range searches work. The conversions will be like this for integers:

1 to 101
2 to 102
1000 to 1001000

I'm just adding a 1 to the start of the string (or adding 10). This is so negative numbers work too! They will just be subtracted from a long (10):

-1 to 099
-2 to 098
-1000 to 0999000

This works great for range searches. But how do I convert negative longs? I can't subtract 100 from a long, can I? It's too big to fit in another long. Any advice is appreciated!

-Reece
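A minimal sketch of the prefix scheme described above, assuming 32-bit ints and a fixed width of 10 digits after the prefix (the digit counts in the examples above look garbled, so the width is an assumption, and the class name PrefixIntEncoder is hypothetical). Non-negative values get "1" plus the zero-padded value; negative values get "0" plus the zero-padded value of 10^10 + x, so plain string comparison matches numeric order.

```java
// Hypothetical helper sketching the "1"/"0" prefix scheme for 32-bit ints.
// Assumption: 10 digits after the prefix, since |Integer.MIN_VALUE| = 2147483648 < 10^10.
public class PrefixIntEncoder {
    private static final long OFFSET = 10_000_000_000L; // 10^10

    /** "1" + 10-digit x for x >= 0; "0" + 10-digit (10^10 + x) for x < 0. */
    public static String encode(int x) {
        if (x >= 0) {
            return "1" + String.format("%010d", (long) x);
        }
        // 10^10 + x lies in [10^10 - 2^31, 10^10 - 1], so it always fits in 10 digits
        return "0" + String.format("%010d", OFFSET + x);
    }

    public static void main(String[] args) {
        int[] values = {-1000, -2, -1, 0, 1, 2, 1000};
        for (int v : values) {
            System.out.println(v + " -> " + encode(v));
        }
    }
}
```

The same trick does not carry over to longs directly: the offset would have to exceed 2^63, and 10^19 is larger than Long.MAX_VALUE, which is exactly the overflow problem raised above.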
Are the lucene index server support for other language, like chinese?
Hi! Does the Lucene index server support other languages, like Chinese? What additional work needs to be done to support it? Thanks, Alex

- Original Message - From: Otis Gospodnetic [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Thursday, May 06, 2004 3:37 AM Subject: Re: Where does the name lucene come from?

Funny, earlier today I started to reply to this message, and then decided not to answer this question any more. It is a FAQ entry now: http://www.jguru.com/faq/Lucene

Otis

--- Steven Rowe [EMAIL PROTECTED] wrote:
Til Schneider wrote: Hi, Working now for a few months with this really great search engine, I was wondering where the name Lucene comes from? What does it mean? Is there any deeper sense?

Doug Cutting's response: URL:http://tinyurl.com/2hh5c (full original URL: URL:http://issues.apache.org/eyebrowse/[EMAIL PROTECTED] .apache.orgmsgId=961817 ) Otis, shouldn't this be an FAQ? Steve
Re: Scoring documents by Click Count
Otis Gospodnetic wrote:
Sure. On click, get the document ID (not the internal docId, but something you use as a surrogate primary key) of the clicked document. Retrieve the document. Pull out the value of the 'clickCount' field. +1 it. Delete the document, and re-add it (there is no 'update(Document)' method).

Yeah, but isn't the essence of it that Lucene is really not set up for dynamically adjusting the *score*? Also, above, to clarify, I think you're implying there are 2 entries for a given doc - one Document for the indexed content, and one for the clickCount, as (from memory) I didn't think you could even re-add a doc w/o reindexing it...

Otis

--- Centaur zeus [EMAIL PROTECTED] wrote:
Hi all, I want to integrate Lucene into my web app. I would like to increase the score of a document when more people click on it. Could I implement that in Lucene? Thanks. Perseus
Re: Scoring documents by Click Count
Oh, I completely misunderstood the original question. I thought the person was asking about sorting by click count.

Otis

--- David Spencer [EMAIL PROTECTED] wrote:
Otis Gospodnetic wrote: Sure. On click, get the document ID (not the internal docId, but something you use as a surrogate primary key) of the clicked document. Retrieve the document. Pull out the value of the 'clickCount' field. +1 it. Delete the document, and re-add it (there is no 'update(Document)' method).

Yeah, but isn't the essence of it that Lucene is really not set up for dynamically adjusting the *score*? Also, above, to clarify, I think you're implying there are 2 entries for a given doc - one Document for the indexed content, and one for the clickCount, as (from memory) I didn't think you could even re-add a doc w/o reindexing it...

Otis --- Centaur zeus [EMAIL PROTECTED] wrote: Hi all, I want to integrate Lucene into my web app. I would like to increase the score of a document when more people click on it. Could I implement that in Lucene? Thanks. Perseus
Re: Range searches for numbers
Quick reference: http://wiki.apache.org/jakarta-lucene/SearchNumericalFields

If you are stuck, you can always encode the long in a string format (the date formatter in Lucene might do this already). Or you could also treat it like a date and use your long like a date filter.

HTH,
sv

On 6 May 2004 [EMAIL PROTECTED] wrote:
Hi, What's the best way to store numbers for range searching? If someone has some info about this I'd love to see it. This is my current plan: When I convert the number to a string I will zero-pad it so range searches work. The conversions will be like this for integers: 1 to 101 2 to 102 1000 to 1001000 I'm just adding a 1 to the start of the string (or adding 10). This is so negative numbers work too! They will just be subtracted from a long (10): -1 to 099 -2 to 098 -1000 to 0999000 This works great for range searches. But how do I convert negative longs? I can't subtract 100 from a long, can I? It's too big to fit in another long. Any advice is appreciated! -Reece
Re: Range searches for numbers
Thanks for the info. The date formatter doesn't work because it can only deal with positive longs. My problem was how to handle the negatives, but I got it figured out. Thanks!

--- Lucene Users List [EMAIL PROTECTED] wrote:
Quick reference: http://wiki.apache.org/jakarta-lucene/SearchNumericalFields If you are stuck, you can always encode the long in a string format (the date formatter in Lucene might do this already). Or you could also treat it like a date and use your long like a date filter. HTH, sv

On 6 May 2004 [EMAIL PROTECTED] wrote: Hi, What's the best way to store numbers for range searching? If someone has some info about this I'd love to see it. This is my current plan: When I convert the number to a string I will zero-pad it so range searches work. The conversions will be like this for integers: 1 to 101 2 to 102 1000 to 1001000 I'm just adding a 1 to the start of the string (or adding 10). This is so negative numbers work too! They will just be subtracted from a long (10): -1 to 099 -2 to 098 -1000 to 0999000 This works great for range searches. But how do I convert negative longs? I can't subtract 100 from a long, can I? It's too big to fit in another long. Any advice is appreciated! -Reece
Re: Scoring documents by Click Count
On Thursday 06 May 2004 18:11, David Spencer wrote:
Otis Gospodnetic wrote: Sure. On click, get the document ID (not the internal docId, but something you use as a surrogate primary key) of the clicked document. Retrieve the document. Pull out the value of the 'clickCount' field. +1 it. Delete the document, and re-add it (there is no 'update(Document)' method).

Yeah, but isn't the essence of it that Lucene is really not set up for dynamically adjusting the *score*? Also, above, to clarify, I think you're implying there are 2 entries for a given doc - one Document for the indexed content, and one for the clickCount, as (from memory) I didn't think you could even re-add a doc w/o reindexing it...

Otis --- Centaur zeus [EMAIL PROTECTED] wrote: Hi all, I want to integrate Lucene into my web app. I would like to increase the score of a document when more people click on it. Could I implement that in Lucene?

Changing the click count this way is OK, but along with that you could change the (field) norm for the document to increase its score in subsequent queries. You can use Document.setBoost() and/or Field.setBoost() just before IndexWriter.addDocument() to do this.

Regards,
Ype
Lucene 1.4
We are currently using Lucene 1.3 on a production web server. For the most part, it runs great. However, once in a while we see some problems which I suspect are the infamous "running out of file handles" bugs. I would claim that we are doing everything right (famous last words), so it would be nice if someone could explain the proper way to use the Searcher object to avoid this problem. I should probably mention that I'm adding new items to the index once per minute, though I close the IndexWriter each time. I suspect the problem is that I can't close the Searcher object because the hits list needs it to get at the documents.

At any rate, that brings me to the real question. I believe I've read that 1.4 has changes that largely eliminate these problems. I know that RC2 is out. My question is: has anyone tried RC2? Is it stable? Obviously, I'm trying to decide whether to move to 1.4RC2 or stay with 1.3. Comments would be appreciated.

Scott
Re: Lucene 1.4
Scott,

Lucene 1.3 added a compound file format which may solve your out-of-file-handles issue. Look at the new method on IndexWriter to use this mode.

Erik

On May 6, 2004, at 2:11 PM, Scott Smith wrote:
We are currently using Lucene 1.3 on a production web server. For the most part, it runs great. However, once in a while we see some problems which I suspect are the infamous "running out of file handles" bugs. I would claim that we are doing everything right (famous last words), so it would be nice if someone could explain the proper way to use the Searcher object to avoid this problem. I should probably mention that I'm adding new items to the index once per minute, though I close the IndexWriter each time. I suspect the problem is that I can't close the Searcher object because the hits list needs it to get at the documents. At any rate, that brings me to the real question. I believe I've read that 1.4 has changes that largely eliminate these problems. I know that RC2 is out. My question is: has anyone tried RC2? Is it stable? Obviously, I'm trying to decide whether to move to 1.4RC2 or stay with 1.3. Comments would be appreciated. Scott
sorting
I've been searching around for information on how to sort a Lucene search. Could someone point me in the right direction? Ryan
Re: sorting
On May 6, 2004, at 4:34 PM, Ryan Sonnek wrote:
I've been searching around for information on how to sort a Lucene search. Could someone point me in the right direction?

Sorting is only available in the latest CVS builds of Lucene and the 1.4 RC releases. The best source of information on how Lucene works is Lucene's test suite. Look at the sorting test cases and you will have immediate cut-and-paste examples of how to use it.

Erik
Re: Scoring documents by Click Count
On Thu, 2004-05-06 at 13:58, Ype Kingma wrote:
Changing the click count this way is OK, but along with that you could change the (field) norm for the document to increase its score in subsequent queries. You can use Document.setBoost() and/or Field.setBoost() just before IndexWriter.addDocument() to do this.

There may be workable ways to do this, but the one time I tried adjusting boosts of already-indexed documents I found it didn't work quite as I expected. The documentation has a warning which explains why:

getBoost
Returns the boost factor for hits on any field of this document. [...]
Note: This value is not stored directly with the document in the index. Documents returned from IndexReader.document(int) and Hits.doc(int) may thus not have the same value present as when this document was indexed.

So be cautious and test carefully if you try this -- and let us on the list know how it goes!

Boris
Re: Scoring documents by Click Count
On Thursday 06 May 2004 23:26, Boris Goldowsky wrote:
On Thu, 2004-05-06 at 13:58, Ype Kingma wrote: Changing the click count this way is OK, but along with that you could change the (field) norm for the document to increase its score in subsequent queries. You can use Document.setBoost() and/or Field.setBoost() just before IndexWriter.addDocument() to do this.

There may be workable ways to do this, but the one time I tried adjusting boosts of already-indexed documents I found it didn't work quite as I expected. The documentation has a warning which explains why: getBoost Returns the boost factor for hits on any field of this document. [...] Note: This value is not stored directly with the document in the index. Documents returned from IndexReader.document(int) and Hits.doc(int) may thus not have the same value present as when this document was indexed. So be cautious and test carefully if you try this -- and let us on the list know how it goes!

Then use Field.setBoost() only. Note that this is stored with only 3 bits of precision in the index. So when you know which field is queried for your scores, multiply the previous value by at least 9/8 to be sure to increase the score. Or use the score of the next higher-scoring doc as a baseline.

That may not be enough. I seem to recall that the inverse square root of the boost is stored as the norm, so you may have to multiply by at least 91/64 to see a guaranteed difference in the score. At the moment I have no time left to check the details. Please have a look at the default Similarity implementation for more.

Good night,
Ype
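To illustrate why a factor of at least 9/8 is suggested above, here is a minimal sketch of what keeping only 3 mantissa bits of a float does. It is an illustration under stated assumptions, not Lucene's actual norm encoder: it assumes plain truncation of the low mantissa bits, and the class name ThreeBitMantissa is hypothetical. For the real encoding, check the norm encoding in Lucene's Similarity class.

```java
// Hypothetical illustration: keep only the top bits of a float's 23-bit mantissa.
// Assumption: truncation (not rounding); this is NOT Lucene's own norm encoding.
public class ThreeBitMantissa {
    /** Clears all but the top `keptBits` mantissa bits of a positive, normal float. */
    public static float truncate(float f, int keptBits) {
        int bits = Float.floatToIntBits(f);
        int dropped = 23 - keptBits;      // number of low mantissa bits to clear
        bits &= ~((1 << dropped) - 1);
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        float stored = truncate(1.0f, 3);
        // A 10% increase is lost at this point: 1.1 truncates back to 1.0
        System.out.println(truncate(stored * 1.1f, 3));    // prints 1.0
        // A 9/8 increase reaches the next representable step
        System.out.println(truncate(stored * 9f / 8f, 3)); // prints 1.125
    }
}
```

With 3 mantissa bits the representable values are 2^e(1 + m/8), so the relative gap between neighbours is at most 1/8. Multiplying an already-quantized value by 9/8 therefore always lands on or past the next step, while smaller factors may be truncated away.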
Re: Range searches for numbers
Reece,

What's the best way to store numbers for range searching? If someone has some info about this I'd love to see it.

I implemented a LongField that encodes any positive or negative long into a string that sorts correctly. I posted that class here: http://www.mail-archive.com/[EMAIL PROTECTED]/msg04790.html

Doing a String range search should be fairly straightforward from there. Let me know if you have any problems.

=Matt
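For the negative-long question, one common technique (swapped in here for illustration; it is not necessarily how the LongField class linked above works) is to flip the sign bit and write the result as fixed-width, zero-padded hex. XOR with Long.MIN_VALUE maps signed order onto unsigned order, and fixed-width lowercase hex preserves unsigned order under plain String comparison, so no value ever needs to exceed a long. A minimal sketch, with the class name SortableLongEncoder hypothetical:

```java
// Hypothetical encoder: signed long -> 16-char hex string that sorts like the number.
public class SortableLongEncoder {
    private static final int WIDTH = 16; // 64 bits = 16 hex digits

    /** Flip the sign bit, then left-pad the unsigned hex form to a fixed width. */
    public static String encode(long x) {
        String hex = Long.toHexString(x ^ Long.MIN_VALUE); // formatted as unsigned
        StringBuilder sb = new StringBuilder(WIDTH);
        for (int i = hex.length(); i < WIDTH; i++) {
            sb.append('0');
        }
        return sb.append(hex).toString();
    }

    /** Inverse of encode; parseUnsignedLong (Java 8+) accepts values above Long.MAX_VALUE. */
    public static long decode(String s) {
        return Long.parseUnsignedLong(s, 16) ^ Long.MIN_VALUE;
    }

    public static void main(String[] args) {
        long[] values = {Long.MIN_VALUE, -1000L, -1L, 0L, 1000L, Long.MAX_VALUE};
        for (long v : values) {
            System.out.println(v + " -> " + encode(v));
        }
    }
}
```

The encoded strings would be indexed as untokenized terms, and range bounds would be encoded the same way before building the range query.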
Query performance on a 315 Million document index (1TB)
Hi, I am considering a project that would index 315+ million documents. I am comfortable that the indexing will work well in creating an index ~800GB in size, but am concerned about the query performance. (Is this a bad assumption?) What are the bottlenecks of performance as an index scales? Memory? Cost is not a concern, so what would be the shortcomings of a theoretical machine with 16GB of RAM, 4-16 CPUs and 1-2 terabytes of space? Would it be better to cluster machines to break apart the query?

Thank you for your serious responses,
Will Allen