Re: rackmount lucene/nutch - Re: google mini? who needs it when Lucene is there

2005-01-28 Thread mark harwood
You would also need http://jcifs.samba.org/ so you can spider Windows file shares. That project also has a very nice servlet filter that provides automatic authentication of Windows clients using the NTLM protocol.

lucene query (sql kind)

2005-01-28 Thread sunil goyal
Hello all, I want to run dynamic queries against the Lucene index. Is there any native syntax for Lucene that would let me first generate the query in, say, an XML- or SQL-like format (and cache it), and then run that query over the Lucene index? e.g. So a lucene query syntax in

Re: lucene query (sql kind)

2005-01-28 Thread PA
On Jan 28, 2005, at 12:40, sunil goyal wrote: I want to run dynamic queries against the Lucene index. Is there any native syntax for Lucene that would let me first generate the query in, say, an XML- or SQL-like format (and cache it), and then run that query over the Lucene index?

Re: lucene query (sql kind)

2005-01-28 Thread David Escuer
Hello, To build queries, you can generate a query string like (text:house OR text:car) AND (keywords:building), and then parse it with the QueryParser.parse method to get the Lucene query. It is not 100% SQL-like syntax, but it's clearer than the Lucene syntax. Hope it helps. David sunil
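
The query string described above can be generated programmatically before it is cached or parsed. The following is only a minimal sketch using plain JDK classes: `QueryStringBuilder`, `anyOf` and `allOf` are hypothetical names introduced for illustration, not part of Lucene, and the values are assumed to contain no characters that need escaping in the query syntax. The resulting string would then be passed to the QueryParser.parse method mentioned in the message.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Lucene): composes a query string in the
// "(field:a OR field:b) AND (...)" form so it can be cached and parsed later.
final class QueryStringBuilder {

    // Builds "(field:v1 OR field:v2 ...)" from one field and its values.
    // Assumption: values need no escaping for the query syntax.
    static String anyOf(String field, List<String> values) {
        return values.stream()
                .map(v -> field + ":" + v)
                .collect(Collectors.joining(" OR ", "(", ")"));
    }

    // Joins already-built groups with AND.
    static String allOf(List<String> groups) {
        return String.join(" AND ", groups);
    }

    public static void main(String[] args) {
        String query = allOf(List.of(
                anyOf("text", List.of("house", "car")),
                anyOf("keywords", List.of("building"))));
        // Prints: (text:house OR text:car) AND (keywords:building)
        System.out.println(query);
    }
}
```

Because the output is an ordinary string, it can be stored and reused as-is, which is roughly what the original question asked for.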

RE: rackmount lucene/nutch - Re: google mini? who needs it when Lucene is there

2005-01-28 Thread Cocula Remi
In addition to this discussion I would like to mention my efforts in creating a wrapper around Lucene with the LuceneServer project (http://sourceforge.net/projects/luceneserver/). It uses RMI to make indexes available over a network and includes automation tasks. I am currently working on a

Re: lucene query (sql kind)

2005-01-28 Thread David Escuer
I've merged several different fields in one query, passing the name of one of these fields as the second parameter of the static method, and it worked fine. Also, you can write a small query parser of your own and build the queries with BooleanQuery. David sunil goyal wrote: Hello, I was just trying that...

Re: lucene query (sql kind)

2005-01-28 Thread sunil goyal
Hello, Thanks, It works fine. The field parameter simply defines the default field for all queries without an explicit field specification (field:). Using 'field AND field' as default field does not make sense but does not hurt as long as the default field is not used. I'm not sure why you

Re: lucene query (sql kind)

2005-01-28 Thread mark harwood
I've added some user-defined Lucene functions to HSQLDB and I've been able to run queries like the following one: select top 10 lucene_highlight(adText) from ads where pricePounds < 200 and lucene_query('bass guitar drums',id) > 0 order by lucene_score(id) DESC. I've had similar success with Derby

Re: Search results excerpt similar to Google

2005-01-28 Thread Erik Hatcher
On Jan 28, 2005, at 1:46 AM, Jason Polites wrote: I think they do a proximity result based on keyword matches. So... If you search for lucene and the document returned has this word at the very start and the very end of the document, then you will see the two sentences (sequences of words)
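
The idea in the quoted message, one short sequence of words around each keyword match, can be sketched in plain Java. This is an illustration under stated assumptions, not how Google or the Lucene highlighter actually works: `ExcerptSketch`, `excerpts` and the `window` parameter are hypothetical names, matching is by whole whitespace-separated word (so a word with attached punctuation will not match), and overlapping windows are not merged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical keyword-in-context sketch (not the Lucene highlighter).
final class ExcerptSketch {

    // Returns one fragment per occurrence of keyword, each holding up to
    // `window` words on either side of the matching word.
    static List<String> excerpts(String text, String keyword, int window) {
        String[] words = text.trim().split("\\s+");
        List<String> fragments = new ArrayList<>();
        for (int i = 0; i < words.length; i++) {
            if (!words[i].equalsIgnoreCase(keyword)) {
                continue;
            }
            int from = Math.max(0, i - window);
            int to = Math.min(words.length, i + window + 1);
            fragments.add(String.join(" ", Arrays.copyOfRange(words, from, to)));
        }
        return fragments;
    }

    public static void main(String[] args) {
        String doc = "Lucene is a search library and many people like lucene";
        // The keyword appears at the very start and the very end of the
        // text, so two separate fragments come back.
        System.out.println(excerpts(doc, "lucene", 2));
    }
}
```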

Re: Search results excerpt similar to Google

2005-01-28 Thread Maik Schreiber
Storing in the index has some performance benefits in the CVS version of Lucene, as you can store term position offset information and avoid having to re-analyze for highlighting. Speaking of which, is there a planned release date for a version that contains this feature? -- Maik Schreiber *

Re: carrot2 question too - Re: Fun with the Wikipedia

2005-01-28 Thread Akmal Sarhan
Hello, we have been experimenting with carrot2 and are very pleased so far. Only one issue: there is no release, not even an alpha one, and the dependencies seem to be patched (jama). Are there any plans for a release in the near future? Thanks, Akmal On Monday, 17.01.2005, 10:15

Re: lucene query (sql kind)

2005-01-28 Thread jian chen
I like your idea and think you are quite right. I see quite a few people using Lucene to the extreme, so that relational database functionality is replaced by Lucene. However, storing everything in Lucene and using it as a relational type of database would be kind of reinventing the wheel.

Loading a large index

2005-01-28 Thread Edwin Tang
I have three indices really that I search via ParallelMultiSearcher. All three are being updated constantly. We would like to be able to perform a search on the indices and have the results reflect the latest documents indexed. However, that would mean I need to refresh my searcher. Because of the

total number of (unique) terms in the index

2005-01-28 Thread Jonathan Lasko
I'm looking for the total number of unique terms in the index. I see that I can get a TermEnum of all the terms in the index, but what is the fastest way to get the total number of terms? Jonathan - To unsubscribe, e-mail:

RE: lucene query (sql kind)

2005-01-28 Thread Ross Rankin
I agree. My site is all dynamic pages created from the database. Right now, I have to have a process create dummy pages, index them with Lucene, then translate the Lucene results into meaningful links. It actually works better than it sounds, however it could be easier. If I could just give

Re: lucene query (sql kind)

2005-01-28 Thread Erik Hatcher
Ross - I'm really perplexed by your message. You create HTML from a database so that you can index it with Lucene, yet wish you could simply index the data in your database tied to a primary key directly, right? Well, you're in luck - you already can do this! What are you using for indexing?

document numbers

2005-01-28 Thread Jonathan Lasko
Yet another burning question :-). Can someone explain how the document numbers in Lucene documents work? For example, the TermDocs.doc() method returns the current doc number. How can I get this doc number if I just have a Document? Here's the context. I'm working on implementing Justin

Re: carrot2 question too - Re: Fun with the Wikipedia

2005-01-28 Thread Owen Densmore
I looked at the Carrot2 docs which mentioned dimension reduction via singular value decomposition (SVD) .. and other forms too I think. Question: Does anyone have pointers to successful clustering techniques used with lucene? I'm particularly interested in 2D and 3D graphics as well, possibly

Re: total number of (unique) terms in the index

2005-01-28 Thread Otis Gospodnetic
I don't think there is a direct way to get the number of (unique) terms in the index, so yes, I think you'll have to loop through TermEnum and count. Otis --- Jonathan Lasko [EMAIL PROTECTED] wrote: I'm looking for the total number of unique terms in the index. I see that I can get a
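
The loop-and-count approach suggested above can be sketched as follows. To keep the example runnable without Lucene on the classpath, `TermCursor` is a hypothetical stand-in exposing only a `next()` method of the kind TermEnum offers (advance, return false when exhausted); `TermCounter` and `over` are likewise illustrative names. With a real index, the enumeration from the IndexReader would presumably be walked the same way and closed afterwards.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical stand-in for the single TermEnum method the loop relies on.
interface TermCursor {
    // Advances to the next term; returns false once the terms are exhausted.
    boolean next();
}

final class TermCounter {

    // Walks the whole enumeration once and counts the steps taken.
    static long countTerms(TermCursor terms) {
        long count = 0;
        while (terms.next()) {
            count++;
        }
        return count;
    }

    // Demo cursor over a fixed list, standing in for an index's terms.
    static TermCursor over(String... terms) {
        Iterator<String> it = Arrays.asList(terms).iterator();
        return () -> {
            if (!it.hasNext()) {
                return false;
            }
            it.next();
            return true;
        };
    }

    public static void main(String[] args) {
        System.out.println(countTerms(over("bass", "drums", "guitar")));
    }
}
```

The cost is one pass over every term, so for a large index the count is probably worth caching rather than recomputing per request.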

Re: Loading a large index

2005-01-28 Thread Otis Gospodnetic
Edwin, --- Edwin Tang [EMAIL PROTECTED] wrote: I have three indices really that I search via ParallelMultiSearcher. All three are being updated constantly. We would like to be able to perform a search on the indices and have the results reflect the latest documents indexed. However, that

Re: Disk space used by optimize

2005-01-28 Thread Otis Gospodnetic
Morus, that description of 3 sets of index files is what I was imagining, too. I'll have to test and add to the book errata, it seems. Thanks for the info, Otis --- Morus Walter [EMAIL PROTECTED] wrote: Otis Gospodnetic writes: Hello, Yes, that is how optimize works - copies all

Re: Lucene in Action hits desk in UK

2005-01-28 Thread Otis Gospodnetic
Hello, I asked the publisher (http://www.manning.com) yesterday. I don't know about the exact stores, but apparently they do have a distributor in Singapore, so you should be able to find Lucene in Action there soon. Otis --- jac jac [EMAIL PROTECTED] wrote: Just wondering: Is

Re: query term frequency

2005-01-28 Thread Grant Ingersoll
I implemented a Query version of the TermVector: org.apache.lucene.search.QueryTermVector. It works off of an array of Strings or a String and an Analyzer. Is this what you are looking for? [EMAIL PROTECTED] 1/28/2005 6:33:18 AM On Jan 27, 2005, at 10:24 PM, Jonathan Lasko wrote: No, the number

Re: query term frequency

2005-01-28 Thread markharw00d
This method from the highlighter package will give you the IDF: WeightedTerm[] QueryTermExtractor.getIdfWeightedTerms(Query query, IndexReader reader, String fieldName)

Penalty for storing unrelated field?

2005-01-28 Thread Bill Tschumy
I have an index containing a lot of documents with common fields. Is there any speed/space penalty for adding an unrelated document with a totally unrelated field? I want to store a version number and maybe a few other bits of meta-info in the index. I just want to make sure that adding the

Re: Penalty for storing unrelated field?

2005-01-28 Thread Andy Goodell
You should be fine. On Fri, 28 Jan 2005 15:21:50 -0600, Bill Tschumy [EMAIL PROTECTED] wrote: I just want to make sure that adding the unrelated field to a single doc won't cause all the other documents to increase their storage space. -- I have lots of fields that only occur in one

Simple question about concurrency

2005-01-28 Thread Peter Kim
Hi, I'm still mostly a beginner, both with Java and Lucene, so I apologize if these are dumb questions. Is making index-modifying operations safe as simple as just doing the following? synchronized (writer) { while (IndexReader.isLocked(directory)) wait();