Re: google mini? who needs it when Lucene is there

2005-01-27 Thread Otis Gospodnetic
500 times the original data? Not true! :) Otis --- Xiaohong Yang (Sharon) [EMAIL PROTECTED] wrote: Hi, I agree that Google mini is quite expensive. It might be similar to the desktop version in quality. Anyone knows google's ratio of index to text? Is it true that Lucene's index is

Re: total number of (unique) terms in the index

2005-01-28 Thread Otis Gospodnetic
I don't think there is a direct way to get the number of (unique) terms in the index, so yes, I think you'll have to loop through TermEnum and count. Otis --- Jonathan Lasko [EMAIL PROTECTED] wrote: I'm looking for the total number of unique terms in the index. I see that I can get a
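The loop-and-count approach from the reply can be sketched roughly as below. This is a stdlib-only illustration, not Lucene code: `TermCursor` is a hypothetical stand-in that only mimics the "call next() until it returns false" shape of a term enumeration, and the sample terms are made up. With a real index the cursor would come from the index reader, and since a term there is a field plus its text, the same word in two fields would presumably be counted twice.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class UniqueTermCount {

    /** Hypothetical stand-in for a term enumeration: call next() until it returns false. */
    public interface TermCursor {
        boolean next();
    }

    /** Wraps a list of already-unique terms in a cursor (for illustration only). */
    public static TermCursor cursorOver(List<String> uniqueTerms) {
        final Iterator<String> it = uniqueTerms.iterator();
        return new TermCursor() {
            public boolean next() {
                if (!it.hasNext()) {
                    return false;
                }
                it.next(); // advance to the next term
                return true;
            }
        };
    }

    /** Counts terms by walking the cursor to its end. */
    public static int countTerms(TermCursor cursor) {
        int count = 0;
        while (cursor.next()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up "field:text" terms standing in for an index's term dictionary.
        TermCursor cursor = cursorOver(Arrays.asList("genre:jazz", "genre:rock", "title:lucene"));
        System.out.println(countTerms(cursor));
    }
}
```

The count costs one full pass over the term dictionary, so for a large index it may be worth caching the result rather than recomputing it per request.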

Re: Loading a large index

2005-01-28 Thread Otis Gospodnetic
Edwin, --- Edwin Tang [EMAIL PROTECTED] wrote: I have three indices really that I search via ParallelMultiSearcher. All three are being updated constantly. We would like to be able to perform a search on the indices and have the results reflect the latest documents indexed. However, that

Re: Disk space used by optimize

2005-01-28 Thread Otis Gospodnetic
Morus, that description of 3 sets of index files is what I was imagining, too. I'll have to test and add to the book errata, it seems. Thanks for the info, Otis --- Morus Walter [EMAIL PROTECTED] wrote: Otis Gospodnetic writes: Hello, Yes, that is how optimize works - copies all

Re: Lucene in Action hits desk in UK

2005-01-28 Thread Otis Gospodnetic
: Is Lucene-in-Action being sold anywhere in Singapore? thanks! Otis Gospodnetic [EMAIL PROTECTED] wrote: Gospodnetić sounds like Gospodnetich and Eric is Erik :) Otis --- John Haxby wrote: Otis Gospodnetic wrote: I contacted both the US and UK Amazon sites and asked them

RE: carrot2 question too - Re: Fun with the Wikipedia

2005-01-31 Thread Otis Gospodnetic
Adam, Dawid posted some code that lets you use Carrot2 locally with Lucene, without the componentized pipeline system described on the Carrot2 site. Otis --- Adam Saltiel [EMAIL PROTECTED] wrote: David, Hi, Would you be able to comment on coincidentally recent thread RE: - Grouping Search

Re: which HTML parser is better?

2005-02-02 Thread Otis Gospodnetic
If you are not married to Java: http://search.cpan.org/~kilinrax/HTML-Strip-1.04/Strip.pm Otis --- sergiu gordea [EMAIL PROTECTED] wrote: Karl Koch wrote: I am in control of the html, which means it is well-formatted HTML. I use only HTML files which I have transformed from XML. No

Re: Numbers in the Query String

2005-02-03 Thread Otis Gospodnetic
Using different analyzers for indexing and searching is not recommended. Your numbers are not even in the index because you are using StandardAnalyzer. Use Luke to look at your index. Otis --- Hetan Shah [EMAIL PROTECTED] wrote: Hello, How can one search for a document based on the query

Re: Optimize not deleting all files

2005-02-04 Thread Otis Gospodnetic
Get and try Lucene 1.4.3. One of the older versions had a bug that was not deleting old index files. Otis --- [EMAIL PROTECTED] wrote: Hi, When I run an optimize in our production environment, old index are left in the directory and are not deleted. My understanding is that an

Re: behavioral differences between Field.Keyword and Field.UnStored

2005-02-11 Thread Otis Gospodnetic
The QueryParser is analyzing your Field.Keyword (genre field) fields, because it doesn't know that genre is a Keyword field and should not be analyzed. Check section 4.4. here: http://www.lucenebook.com/search?query=queryparser+keyword Otis --- Mike Rose [EMAIL PROTECTED] wrote: Perhaps

Re: What does [] do to a query and what's up with lucene.apache.org?

2005-02-14 Thread Otis Gospodnetic
Hi, lucene.apache.org seems to work now. Here is the query syntax: http://lucene.apache.org/queryparsersyntax.html [] is used as [BEGIN-RANGE-STRING TO END-RANGE-STRING] Otis --- Jim Lynch [EMAIL PROTECTED] wrote: First I'm getting a The requested URL could not be retrieved
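As a rough illustration of what an inclusive `[BEGIN-RANGE-STRING TO END-RANGE-STRING]` range means, the sketch below tests whether a term falls between two bounds using plain string comparison. It assumes terms are compared lexicographically as strings; the class and method names are hypothetical, and this is not Lucene's own implementation.

```java
public class InclusiveRange {

    /** Returns true if term lies between begin and end, both bounds included. */
    public static boolean inRange(String term, String begin, String end) {
        return term.compareTo(begin) >= 0 && term.compareTo(end) <= 0;
    }

    public static void main(String[] args) {
        // Date-like strings of equal length sort the same way as the dates they encode.
        System.out.println(inRange("20050115", "20050101", "20050131")); // true
        System.out.println(inRange("20050201", "20050101", "20050131")); // false
        // Unpadded numbers do not: "9" sorts after "20" as a string.
        System.out.println(inRange("9", "10", "20")); // false
    }
}
```

Under that assumption, numeric or date values would need a fixed-width, zero-padded form for a range to behave as expected.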

Re: Concurrent searching re-indexing

2005-02-16 Thread Otis Gospodnetic
Hi Paul, If I understand your setup correctly, it looks like you are running multiple threads that create an IndexWriter for the same directory. That's a no-no. This section (first hit) describes the various concurrency issues with regard to adds, updates, optimization, and searches:

Re: Lucene vs. in-DB-full-text-searching

2005-02-18 Thread Otis Gospodnetic
The most obvious answer is that the full-text indexing features of RDBMS's are not as good (as fast) as Lucene. MySQL, PostgreSQL, Oracle, MS SQL Server etc. all have full-text indexing/searching features, but I always hear people complaining about the speed. A person from a well-known online

Re: Search Performance

2005-02-18 Thread Otis Gospodnetic
Or you could just open a new IndexSearcher, forget the old one, and have GC collect it when everyone is done with it. Otis --- Chris Lamprecht [EMAIL PROTECTED] wrote: I should have mentioned, the reason for not doing this the obvious, simple way (just close the Searcher and reopen it if a

Re: Document comparison

2005-02-18 Thread Otis Gospodnetic
Matt, Erik and I have some code for this in Lucene in Action, but David Spencer did this since the book was published: http://www.lucenebook.com/blog/announcements/more_like_this.html Otis --- Matt Chaput [EMAIL PROTECTED] wrote: Is there a simple, efficient way to compute similarity of

Re: Search Performance

2005-02-18 Thread Otis Gospodnetic
this leave open file handles? I had a problem where there were lots of open file handles for deleted index files, because the old searchers were not being closed. On Fri, 18 Feb 2005 13:41:37 -0800 (PST), Otis Gospodnetic [EMAIL PROTECTED] wrote: Or you could just open a new IndexSearcher

Re: Ranking Terms

2005-02-26 Thread Otis Gospodnetic
Make sure you are not indexing your documents using the compound index format (default in the newer versions of Lucene). Then you will see the .frq file. Here is an example from one of Simpy's Lucene indices: -rw-r--r-- 1 simpy simpy 629073 Feb 26 13:14 _1ao.frq Otis --

Re: Multiple indexes

2005-03-01 Thread Otis Gospodnetic
Ben, You do need to use a separate instance of those 3 classes for each index yes. But this is really something like: IndexWriter writer = new IndexWriter(); So it's normal code-writing process you don't really have to create anything new, just use existing Lucene API. As for locking,
