Re: lucene query (sql kind)

2005-01-28 Thread jian chen
I like your idea and think you are quite right. I see quite a few people using Lucene to the extreme, to the point of replacing relational database functionality with it. However, storing everything in Lucene and using it as a relational database would be a kind of reinventing the wheel.

google mini? who needs it when Lucene is there

2005-01-27 Thread jian chen
Hi, I was searching on Google and just found that there is a new feature called Google Mini. Initially I thought it was another free service for small companies. Then I realized that it costs quite a lot of money ($4,995) for the hardware and software. (I guess the proprietary software costs a

Re: google mini? who needs it when Lucene is there

2005-01-27 Thread jian chen
to text? Is it true that Lucene's index is about 500 times the original text size (not including image size)? I don't have one installed, so I cannot measure. Best, Sharon. jian chen [EMAIL PROTECTED] wrote: Hi, I was searching using Google and just found that there was a new

Re: Suggestions for documentation or LIA

2005-01-26 Thread jian chen
Hi, just to continue this discussion: I think Lucene's retrieval algorithm is currently based purely on the Vector Space Model, which is simple and efficient. However, there may be cases where folks like me want to use a completely different set of ranking algorithms, those which do not even

Re: Suggestions for documentation or LIA

2005-01-26 Thread jian chen
[EMAIL PROTECTED] wrote: jian chen [EMAIL PROTECTED] writes: Just to continue this discussion. I think right now Lucene's retrieval algorithm is based purely on Vector Space Model, which is simple and efficient. As I understand it, it's indeed a tf-idf vector space approach, except
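
As a rough illustration of the "tf-idf vector space approach" mentioned above, here is a minimal textbook sketch in plain Java. It is not Lucene's actual scoring formula, which applies its own tf, idf, normalization and boost factors. Each document and the query become sparse vectors with weight(t) = tf(t) * ln(N / df(t)), and documents are ranked by cosine similarity to the query. The class name `TfIdfSketch` and the whitespace tokenizer are hypothetical simplifications.

```java
import java.util.*;

/** Minimal tf-idf vector space scorer (textbook formulation, not Lucene's exact formula). */
public class TfIdfSketch {

    /** Counts raw term frequencies in a lowercased, whitespace-tokenized text. */
    static Map<String, Integer> termFreqs(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : text.toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) tf.merge(t, 1, Integer::sum);
        }
        return tf;
    }

    /** Builds a tf-idf weight vector: weight(t) = tf(t) * ln(N / df(t)). */
    static Map<String, Double> weights(Map<String, Integer> tf, Map<String, Integer> df, int numDocs) {
        Map<String, Double> w = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int d = df.getOrDefault(e.getKey(), 0);
            if (d == 0) continue; // term never seen in the collection: contributes nothing
            w.put(e.getKey(), e.getValue() * Math.log((double) numDocs / d));
        }
        return w;
    }

    /** Cosine similarity between two sparse vectors. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            na += e.getValue() * e.getValue();
        }
        for (double v : b.values()) nb += v * v;
        return (na == 0 || nb == 0) ? 0.0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /** Scores every document against the query; scores are returned in document order. */
    static double[] score(List<String> docs, String query) {
        List<Map<String, Integer>> tfs = new ArrayList<>();
        Map<String, Integer> df = new HashMap<>();
        for (String doc : docs) {
            Map<String, Integer> tf = termFreqs(doc);
            tfs.add(tf);
            for (String t : tf.keySet()) df.merge(t, 1, Integer::sum); // document frequency
        }
        Map<String, Double> q = weights(termFreqs(query), df, docs.size());
        double[] scores = new double[docs.size()];
        for (int i = 0; i < docs.size(); i++) {
            scores[i] = cosine(weights(tfs.get(i), df, docs.size()), q);
        }
        return scores;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "lucene is a search library",
                "a relational database stores rows",
                "lucene index search search");
        System.out.println(Arrays.toString(score(docs, "lucene search")));
    }
}
```

In this toy collection the second document shares no terms with the query and scores 0. The third document ranks above the first because it repeats "search" and is shorter.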

Re: How to give recent documents a boost?

2005-01-25 Thread jian chen
Hi, I think boosting recent documents is tricky. There is no clear-cut way to get the boost value right other than trial and error. Could you let the user specify a date range and sort the documents within that range by relevance? This way, the users get exactly what they specified, and
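
The alternative suggested above replaces date boosting with a hard date range followed by a relevance sort. A minimal sketch in plain Java follows; it does not use Lucene's own query or sort API. The `ScoredDoc` record and its fields are hypothetical stand-ins for search hits that already carry a relevance score and a date.

```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class DateRangeThenRelevance {

    /** Hypothetical search hit: document id, document date, and relevance score. */
    record ScoredDoc(String id, LocalDate date, double score) {}

    /** Keeps hits dated within [from, to] (inclusive), then sorts them by descending score. */
    static List<ScoredDoc> filterAndRank(List<ScoredDoc> hits, LocalDate from, LocalDate to) {
        return hits.stream()
                .filter(h -> !h.date().isBefore(from) && !h.date().isAfter(to))
                .sorted(Comparator.comparingDouble(ScoredDoc::score).reversed())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<ScoredDoc> hits = List.of(
                new ScoredDoc("old-but-relevant", LocalDate.of(2004, 6, 1), 0.9),
                new ScoredDoc("recent-a", LocalDate.of(2005, 1, 20), 0.4),
                new ScoredDoc("recent-b", LocalDate.of(2005, 1, 24), 0.7));
        // Only the January 2005 hits survive; they are printed best-first.
        for (ScoredDoc d : filterAndRank(hits, LocalDate.of(2005, 1, 1), LocalDate.of(2005, 1, 31))) {
            System.out.println(d.id() + " " + d.score());
        }
    }
}
```

The appeal of this design is that no boost value has to be tuned. The older but highly relevant hit is excluded by the user's explicit choice rather than by a guessed decay factor.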

Re: Opening up one large index takes 940M of memory?

2005-01-22 Thread jian chen
Hi, if it is really the case that every 128th term is loaded into memory, could you use a relational database or a B-tree to index the terms instead? Even if you create another level of indexing on top of the .tii file, it is just a hack and would not scale well. I
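
The memory point above can be made concrete with a small model. Only every 128th term of the sorted term dictionary is kept in memory, so memory grows with (number of terms / 128). A lookup binary-searches that in-memory sample, then scans forward through at most 128 entries of the full dictionary. This is a simplified plain-Java sketch, not Lucene's actual .tii/.tis file format. The class name `SparseTermIndex` and the array standing in for the on-disk dictionary are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical model of a sparse term index: every INTERVAL-th term is kept "in memory". */
public class SparseTermIndex {
    static final int INTERVAL = 128;

    private final String[] allTerms;   // stands in for the full sorted dictionary on disk
    private final String[] indexTerms; // sampled terms held in memory

    SparseTermIndex(String[] sortedTerms) {
        this.allTerms = sortedTerms;
        int n = (sortedTerms.length + INTERVAL - 1) / INTERVAL; // ceil(length / INTERVAL)
        this.indexTerms = new String[n];
        for (int i = 0; i < n; i++) indexTerms[i] = sortedTerms[i * INTERVAL];
    }

    int indexSize() { return indexTerms.length; }

    /** Returns the position of term in the full dictionary, or -1 if absent. */
    int lookup(String term) {
        int pos = Arrays.binarySearch(indexTerms, term);
        // When the key is absent, binarySearch returns -(insertionPoint) - 1;
        // the block that could hold the term starts at the sampled term just before it.
        int block = pos >= 0 ? pos : -pos - 2;
        if (block < 0) return -1; // term sorts before the first term in the dictionary
        int start = block * INTERVAL;
        int end = Math.min(start + INTERVAL, allTerms.length);
        for (int i = start; i < end; i++) { // sequential scan of at most INTERVAL entries
            int c = allTerms[i].compareTo(term);
            if (c == 0) return i;
            if (c > 0) break; // passed where the term would be: not present
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] terms = new String[1000];
        for (int i = 0; i < terms.length; i++) terms[i] = String.format("term%05d", i);
        SparseTermIndex idx = new SparseTermIndex(terms);
        System.out.println(idx.indexSize());         // 8 sampled terms for 1000 terms
        System.out.println(idx.lookup("term00500")); // 500
        System.out.println(idx.lookup("zzz"));       // -1
    }
}
```

With hundreds of millions of distinct terms, even 1/128 of the dictionary is a large in-memory array. That is the scaling concern raised in the post.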

Re: Lucene in Action

2005-01-22 Thread jian chen
Hi, I am not sure. However, I see that the book has an electronic version you can buy online... Cheers, Jian On Sun, 23 Jan 2005 10:30:24 +0800, ansi [EMAIL PROTECTED] wrote: Hi all, does anyone know how to buy Lucene in Action in China? Ansi

Re: Newbie: Human Readable Stemming, Lucene Architecture, etc!

2005-01-20 Thread jian chen
Hi, one thing to point out: I don't think Lucene uses LSI as its underlying retrieval model. It uses the vector space model along with proximity-based retrieval. Personally, I don't know much about LSI, and I don't think fancy techniques like LSI are workable in industry. I believe we are far away from