sub search

2006-03-07 Thread Anton Potehin
Is it possible to search within the results of a previous search? For example, I ran a search: Searcher searcher = ... Query query = ... Hits hits = searcher.search(query); After that, I do not want to run a new search; I want to search among the results already found.

Re: sub search

2006-03-07 Thread hu andy
2006/3/7, Anton Potehin <[EMAIL PROTECTED]>: > Is it possible to make search among results of previous search? > For example: I made search: > Searcher searcher =... > Query query = ... > Hits hits = > hits = Searcher.search(query); > After it

Re: Distributed Lucene..

2006-03-07 Thread Andrzej Bialecki
Prasenjit Mukherjee wrote: I think nutch has a distributed lucene implementation. I could have used nutch straightaway, but I have a different crawler, and also don't want to use NDFS (which is used by nutch). What I have proposed earlier is basically based on the MapReduce paradigm, which is used b

RE: sub search

2006-03-07 Thread anton
As far as I understand, that will run a new search over the whole index. But what is the difference between that and the search described below: TermQuery termQuery = new TermQuery( BooleanQuery bq = .. bq.add(termQuery,true,false); bq.add(query,true,false); hits = Searcher.search(bq,queryFilter);

about lucene 1.9

2006-03-07 Thread Haritha_Parvatham
Hi, I have downloaded the latest release, Lucene 1.9, and deployed it in Tomcat. When I search from the front end, it gives me the message below. Please tell me how to use Lucene 1.9. Welcome to the Lucene Template application. (This is the header) Document Summary null

Writing terms/freq pairs directly to the inverted file

2006-03-07 Thread Murat Yakici
Hi, I would like to bypass the IndexWriter and directly write the terms and their frequencies to the index (and maybe proximity info later on). I might have missed a previous discussion of this. As far as I know, the high level API in Lucene only allows you to add documents (which are populated

Re: MultiPhraseQuery

2006-03-07 Thread Erik Hatcher
On Mar 7, 2006, at 2:35 AM, Eric Jain wrote: Daniel Naber wrote: Please try to add this to MultiPhraseQuery and let us know if it helps: public List getTerms() { return termArrays; } That is indeed all I need (the list wouldn't have to be mutable though). Any chance this could be c

Re: sub search

2006-03-07 Thread hu andy
It uses a caching mechanism. The details are described in the book Lucene in Action. Maybe you can test it to decide which is faster. 2006/3/7, [EMAIL PROTECTED] <[EMAIL PROTECTED]>: > > As far as I understood that will make new search throughout the index. But > what the difference between that and sear

Question

2006-03-07 Thread Thomas Papke
Hello, has anyone implemented the "Google Suggest" feature using Lucene? The frontend is clear, but I need a very fast way to retrieve matching terms. For example: the user typed "Ab" and I want to give him a list of 10 possible words in term "name" starting with "Ab*". So I don't need the whole do

RE: Question

2006-03-07 Thread Pasha Bizhan
Hi, > From: Thomas Papke [mailto:[EMAIL PROTECTED] > anyone implement the "Google Suggest" Feature using Lucene? > The Frontend is clear - but i need a very fast way to > retrieve matching terms. For > example: The user typed "Ab" and i want to give him a list of > 10 possible words in term

Re: Question

2006-03-07 Thread gekkokid
Would Lucene even have to be accessed? Couldn't you save the queries when submitted and search that via a SQL database? _gk - Original Message - From: "Thomas Papke" <[EMAIL PROTECTED]> To: Sent: Tuesday, March 07, 2006 12:11 PM Subject: Question Hello, anyone implement the "Google

Re: Question

2006-03-07 Thread Leon Chaddock
Hi, I am very interested in this as well, as I wish to display related searches for users. Does anyone know if this work is open source and whether there is an API available? Thanks Leon - Original Message - From: "Pasha Bizhan" <[EMAIL PROTECTED]> To: Sent: Tuesday, March 07, 2006 12:39 PM

Lucene version 1.9

2006-03-07 Thread WATHELET Thomas
I've created an index with Lucene version 1.9, and when I try to open this index I always get this error message: java.lang.ArrayIndexOutOfBoundsException. If I use an index built with Lucene version 1.4.3, it works. What's wrong?

RE: Question

2006-03-07 Thread Pasha Bizhan
Hi, > From: Leon Chaddock [mailto:[EMAIL PROTECTED] > I am very interested in this aswell, as I wish to display > related searches for users. What does "related" mean? > Does anyone know if this work is open source and is there an > api available? Ask David or use web.archive: http://web.

Re: Question

2006-03-07 Thread Jeff Rodenburg
We've done this, and it's not that complex. (Sorry, client won't allow me to release the code.) It's AJAX on the front end, so that background call is simply executing a search against an index that consists of the aggregated search terms. We do wildcard queries to get the results we want. For u
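
The prefix lookup discussed in this thread can be sketched without Lucene at all. The following is only an illustration under stated assumptions: it swaps the index of aggregated search terms for an in-memory sorted set, and the class and method names (PrefixSuggestSketch, add, suggest) are hypothetical, not part of any code posted here. Matching is case-sensitive in this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch: prefix suggestions from a sorted set of previously seen terms. */
final class PrefixSuggestSketch {
    // Sorted storage lets us jump to the first candidate and scan forward.
    private final TreeSet<String> terms = new TreeSet<>();

    /** Records a term (for example, a submitted search term). */
    void add(String term) {
        terms.add(term);
    }

    /** Returns up to 'limit' stored terms that start with 'prefix', in sorted order. */
    List<String> suggest(String prefix, int limit) {
        List<String> out = new ArrayList<>();
        // tailSet(prefix, true) starts at the first term >= prefix.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix) || out.size() >= limit) {
                break; // past the prefix range, or enough suggestions collected
            }
            out.add(term);
        }
        return out;
    }
}
```

Because the terms are sorted, all matches for a prefix are contiguous, so the scan stops as soon as a non-matching term is reached.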

Get only count

2006-03-07 Thread Anton Potehin
Currently I run a new search just to get the number of results. For example: IndexSearcher is = ... Query q = ... numberOfResults = is.search(q).length(); Can I speed this up? And how?

RE: Search for synonyms - implemenetation for review

2006-03-07 Thread Ziv Gome
Hi all, I have few more remarks to Andrew's already thorough mail... I fear though Andrew gave me too much credit, for a cooperative, brain-storming work we both did. 1. How are the results? We have not conducted a real research on the results we got, in terms of recall and precision measurements

Re: sub search

2006-03-07 Thread Erik Hatcher
On Mar 7, 2006, at 7:03 AM, hu andy wrote: It uses cache mechanism. The detail is described in the book Lucene in Action. Maybe you can test it to decide which is faster Major caveat here is that the caching QueryFilter employs really only works if you use the same instance of QueryFilter for

Re: Get only count

2006-03-07 Thread Eric Jain
Anton Potehin wrote: Now I create new search for get number of results. For example: IndexSearcher is = ... Query q = ... numberOfResults = Is.search(q).length(); Can I accelerate this example ? And how ? Perhaps something like: class CountingHitCollector implements HitCollector { pu

Re: sub search

2006-03-07 Thread Eric Jain
Anton Potehin wrote: After it I want to not make a new search, > I want to make search among found results... Perhaps something like this would work: final BitSet results = toBitSet(Hits); searcher.search(newQuery, new Filter() { public BitSet bits(IndexReader reader) { return results;
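
The idea of restricting a second search to earlier results can be sketched with plain java.util.BitSet, independent of Lucene. This is a minimal illustration, assuming document ids are small non-negative integers; the names SubSearchSketch, toBitSet and searchWithin are hypothetical, and toBitSet here takes an int array rather than a Hits object.

```java
import java.util.BitSet;

/** Hypothetical sketch: narrowing a new result set to documents found by an earlier search. */
final class SubSearchSketch {
    /** Marks each document id in a BitSet. */
    static BitSet toBitSet(int[] docIds) {
        BitSet bits = new BitSet();
        for (int id : docIds) {
            bits.set(id);
        }
        return bits;
    }

    /** Returns the documents matched by the new query that were also in the previous results. */
    static BitSet searchWithin(BitSet previousResults, BitSet newQueryMatches) {
        BitSet narrowed = (BitSet) newQueryMatches.clone(); // leave both inputs untouched
        narrowed.and(previousResults);
        return narrowed;
    }
}
```

The intersection is what a filter built from the earlier results would effectively compute: only documents set in both bit sets survive.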

Unreported IOException received for SpanTermQuery class

2006-03-07 Thread Murat Yakici
Hi, I was building the Lucene 1.9.1 source code. I have received the following error msg: "Unreported exceptions: java.io.IOException must be caught or declared to be thrown. " in class SpanOrQuery, line number 154. Any ideas how to resolve it? Regards, Murat --

RE: Get only count

2006-03-07 Thread anton
Why did you add "if (score > 0.0f)"? The Javadoc contains the line "HitCollector.collect(int,float) is called for every non-zero scoring". -Original Message- From: Eric Jain [mailto:[EMAIL PROTECTED] Sent: Tuesday, March 07, 2006 5:08 PM To: java-user@lucene.apache.org Subject: Re: Get only count

Re: Lucene version 1.9

2006-03-07 Thread Paul Elschot
Thomas, On Tuesday 07 March 2006 13:57, WATHELET Thomas wrote: > I've created an index with the Lucene version 1.9 and when I try to open > this index I have always this error mesage: > java.lang.ArrayIndexOutOfBoundsException. > if I use an index built with the lucene version 1.4.3 it's working.

Re: Get only count

2006-03-07 Thread Yonik Seeley
On 3/7/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > While you added "if (score > 0.0f)". Javadoc contain lines > "HitCollector.collect(int,float) is called for every non-zero scoring". That should probably read "is called for every matching document". -Yonik ---

Re: Unreported IOException received for SpanTermQuery class

2006-03-07 Thread Paul Elschot
On Tuesday 07 March 2006 15:35, Murat Yakici wrote: > Hi, > I was building the Lucene 1.9.1 source code. I have received the > following error msg: > > "Unreported exceptions: java.io.IOException must be caught or declared > to be thrown. " in class SpanOrQuery, line number 154. > > Any ideas h

RE: Get only count

2006-03-07 Thread anton
Can a matching document have a score equal to zero? -Original Message- From: Yonik Seeley [mailto:[EMAIL PROTECTED] Sent: Tuesday, March 07, 2006 6:20 PM To: java-user@lucene.apache.org Subject: Re: Get only count Importance: High On 3/7/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > Whil

indexing problems

2006-03-07 Thread Apache Lucene
Hi, I am using Lucene 1.9.1 to index the files. The index writer created the following files: (1) segment file "segments" (2) deletable file "deletable" (3) compound file "cfs". None of the other files, such as term info, frequency, etc., were created. Is there something obvious I am doing wrong?

Re: Unreported IOException received for SpanTermQuery class

2006-03-07 Thread Murat Yakici
The compiler is Sun Java 1.4.2_08. Paul Elschot wrote: On Tuesday 07 March 2006 15:35, Murat Yakici wrote: Hi, I was building the Lucene 1.9.1 source code. I have received the following error msg: "Unreported exceptions: java.io.IOException must be caught or declared to be thrown. " in cl

Re: indexing problems

2006-03-07 Thread Yonik Seeley
You are using the compound file format (the default since 1.4) and the .cfs file contains all those individual parts. -Yonik On 3/7/06, Apache Lucene <[EMAIL PROTECTED]> wrote: > Hi, >I am using Lucene 1.9.1 to index the files. The index writer created > the following files > (1) segment

Re: Get only count

2006-03-07 Thread Yonik Seeley
On 3/7/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > Can have matching document score equals zero ? Yes. Scorers don't generally use "score" to determine if a document matched the query. Scores <= 0.0f are currently screened out at the top level search functions, but not when you use a HitCo
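
The counting approach discussed in this thread, and the effect of screening out non-positive scores, can be sketched as follows. This is an illustration only: DocCollector is a hypothetical stand-in for a collect(doc, score) callback, not the Lucene HitCollector class, and CountingCollector is an assumed name.

```java
/** Hypothetical stand-in for a per-document callback; not the Lucene class itself. */
interface DocCollector {
    void collect(int doc, float score);
}

/** Counts collected documents without storing them. */
final class CountingCollector implements DocCollector {
    private final boolean positiveScoresOnly;
    private int count;

    /** @param positiveScoresOnly if true, documents with score <= 0 are not counted */
    CountingCollector(boolean positiveScoresOnly) {
        this.positiveScoresOnly = positiveScoresOnly;
    }

    @Override
    public void collect(int doc, float score) {
        if (!positiveScoresOnly || score > 0.0f) {
            count++;
        }
    }

    int getCount() {
        return count;
    }
}
```

With positiveScoresOnly set to true the count mimics a search path that drops scores <= 0.0f; with false it counts every document passed to collect, so the two totals can differ for the same query.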

Re: indexing problems

2006-03-07 Thread Apache Lucene
Is it advisable to use compound file format? or should I revert it back to simple file format? How do I revert it back? thanks, lucenenator On 3/7/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: > > You are using the compound file format (the default since 1.4) and the > .cfs file contains all thos

Re: Unreported IOException received for SpanTermQuery class

2006-03-07 Thread Paul Elschot
On Tuesday 07 March 2006 16:34, Murat Yakici wrote: > The compiler is Sun Java 1.4.2_08. I'm using sun javac 1.5.0_01 and this compiles the current trunk without any problems, so I cannot reproduce the error msg. The common-build.xml file uses source and target 1.4 for javac, (in the compile macro

Re: indexing problems

2006-03-07 Thread Erik Hatcher
On Mar 7, 2006, at 10:41 AM, Apache Lucene wrote: Is it advisable to use compound file format? or should I revert it back to simple file format? How do I revert it back? There is a setter on IndexWriter to set it back if you like. The compound format avoids the issues that cropped up a

Re: Unreported IOException received for SpanTermQuery class

2006-03-07 Thread Murat Yakici
Yeah, I know, sorry for that. The reason is, first I tried to solve the problem by wrapping the line with a try-catch block. Then, the next build gave the same error for SpanTermQuery and some other classes. I will try to compile that on 1.5.0_01. Thanks, Murat Paul Elschot wrote: On Tuesd

Classification / Change Scoring during search

2006-03-07 Thread Rainer Dollinger
Hello, I want to use Lucene to get similar documents based on a Boolean Query (similar metadata with OR clauses) and ratings of the user for already searched documents. I intend to implement a Naive Bayes classifier to categorize documents into liked/disliked classes and would do this by using a

Re: indexing problems

2006-03-07 Thread Apache Lucene
This line is throwing a null pointer exception for the index I created as I mentioned in my previous emails. searcher = new IndexSearcher(IndexReader.open(indexPath) ); Any ideas? I made sure the indexPath is a valid path. thanks, lucenenator On 3/7/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:

RE: Using NOT queries inside parentheses

2006-03-07 Thread Satuluri, Venu_Madhav
> Query at = new TermQuery(new Term("alwaysTrueField","true")); > Query user = queryParser.parse(userInput); > if (user instanceof BooleanQuery) { > BooleanQuery bq = (BooleanQuery)user; > if (! usableBooleanQuery(bq)) { > bq.add(at, true, false); /* add 'always true' clause

Re: indexing problems

2006-03-07 Thread Apache Lucene
BTW, I could access that index using Luke. It works fine. On 3/7/06, Apache Lucene <[EMAIL PROTECTED]> wrote: > > This line is throwing a null pointer exception for the index I created as > I mentioned in my previous emails. > > searcher = new IndexSearcher(IndexReader.open(indexPath) ); > > Any

Scoring with FunctionQueries?

2006-03-07 Thread Sebastian Marius Kirsch
Hello, I have been trying out Yonik's excellent FunctionQuery (from Solr), but am having some problems regarding the scoring of FunctionQueries in conjunction with other queries. I am currently researching a data fusion approach, where you have several separate scores for a document and combine t

Re: Lucene version 1.9

2006-03-07 Thread Doug Cutting
WATHELET Thomas wrote: I've created an index with the Lucene version 1.9 and when I try to open this index I have always this error mesage: java.lang.ArrayIndexOutOfBoundsException. if I use an index built with the lucene version 1.4.3 it's working. Wath's wrong? Are you perhaps trying to open

Re: Throughput doesn't increase when using more concurrent threads

2006-03-07 Thread Peter Keegan
I ran a query performance tester against 8-cpu and 16-cpu Xeon servers (16/32 cpu hyperthreaded) on Linux. Here are the results: 8-cpu: 275 qps 16-cpu: 305 qps (the dual-core Opteron servers are still faster) Here is the stack trace of 8 of the 16 query threads during the test: at org.

Re: Throughput doesn't increase when using more concurrent threads

2006-03-07 Thread Doug Cutting
Peter Keegan wrote: I ran a query performance tester against 8-cpu and 16-cpu Xeon servers (16/32 cpu hyperthreaded). on Linux. Here are the results: 8-cpu: 275 qps 16-cpu: 305 qps (the dual-core Opteron servers are still faster) Here is the stack trace of 8 of the 16 query threads during the

Re: Distributed Lucene..

2006-03-07 Thread Otis Gospodnetic
Hi, Just curious about this: > We hacked :-) IndexWriter of Lucene to start all segment names with a > prefix unique for each small index part. > Then, when adding it to the actual index, we simply copy the new segment > to the folder with the other segments, and add it in such a way so that > the

Re: Scoring with FunctionQueries?

2006-03-07 Thread Chris Hostetter
: but I want only the score of the full-text query to be multiplied by : the query norm. The function queries should be added to the final : query as they are (the factors a, b, ... could be set using a query : boost.) : : How do I achieve that? I'm rather lost in the forest of Scorer, : Similarit

ReIndex or rework query

2006-03-07 Thread Jennifer Sears
We've built an index that has 8 stored, tokenized text fields. For optimizing search results, should we: 1. build the query programmatically and try to determine which field the searchTerm might fit in (i.e. Terms that would match in City, country, would not match in award or amenity) 2. Do a mult

Lucene 1.9.1 and timeToString() apparent incompatibility with 1.4.3

2006-03-07 Thread George Washington
I recently converted from Lucene 1.4.3 to 1.9.1 and in the process replaced all deprecated classes with the new ones as recommended (for forward compatibility with Lucene 2.0). This however seems to introduce an incompatibility when the new timeToString() and stringToTime() methods are used. Us

Re: BooleanQuery$TooManyClauses with 1.9.1 when Number RangeQuery

2006-03-07 Thread Youngho Cho
Hello - Original Message - From: "Chris Hostetter" <[EMAIL PROTECTED]> To: "Lucene Users" Sent: Tuesday, March 07, 2006 3:49 PM Subject: Re: BooleanQuery$TooManyClauses with 1.9.1 when Number RangeQuery > > : I upgade to 1.9.1 and reindexing > : I used NumberTool when I index the numb

Weighted Terms Per Document

2006-03-07 Thread Matthew O'Connor
Hello, I'm using Lucene 1.9 to replace an in-house search engine where all of the documents to be searched are also created in-house. One of the features of the search engine is something called 'xtras' which are associated with the documents. I am wondering how best to model this feature using

Re: Scoring with FunctionQueries?

2006-03-07 Thread Sebastian Marius Kirsch
Dear Chris, thanks very much for your quick answer. I tried both approaches, and both don't seem to do what I want. Perhaps I did not understand you properly. I generated a small in-memory index (six documents) for testing your suggestions, with some text in field "content" and a numeric score i

Re: Using NOT queries inside parentheses

2006-03-07 Thread Daniel Noll
Satuluri, Venu_Madhav wrote: If you want this to work, the most elegant way I've found is to override the getBooleanQuery(Vector) method in QueryParser and insert a MatchAllDocsQuery into the boolean query if every clause is prohibited. Daniel I tried this, but it looks like the overridden

Lucene 1.9.1 and timeToString() apparent incompatibility with 1.4.3

2006-03-07 Thread Victor Negrin
I recently converted from Lucene 1.4.3 to 1.9.1 and in the process replaced all deprecated classes with the new ones as recommended (for forward compatibility with Lucene 2.0). This however seems to introduce an incompatibility when the new timeToString() and stringToTime() methods are used. Using

Re: sub search

2006-03-07 Thread Daniel Noll
Anton Potehin wrote: Is it possible to make search among results of previous search? After it I want to not make a new search, I want to make search among found results... Simple. Create a new BooleanQuery and put the original query into it, along with the new query. Daniel -- Daniel

Re: BooleanQuery$TooManyClauses with 1.9.1 when Number RangeQuery

2006-03-07 Thread Youngho Cho
Hello, > > > > : I upgade to 1.9.1 and reindexing > > : I used NumberTool when I index the number. > > : > > : after upgrade I got following error when number range query. > > : with query > > > > The possibility of a TooManyClauses exception has always existed with > > RangeQuery and numbers, e

Re: BooleanQuery$TooManyClauses with 1.9.1 when Number RangeQuery

2006-03-07 Thread Chris Hostetter
: > You mean Theoritically : > RangeQuery should be forbidden because it always has potential time bomb ? : > Should we comment it in javadoc ? In my opinion, the only reason to use RangeQuery is if you are dealing with very controlled ranges, where you know the number of terms it will expand to
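
The concern about range expansion can be made concrete with a small stand-alone sketch. It assumes that a range query expands to one clause per distinct indexed term inside the range and that the clause limit is 1024 (the BooleanQuery default); the sorted set stands in for the index's term dictionary, and RangeExpansionSketch and its methods are hypothetical names.

```java
import java.util.TreeSet;

/** Hypothetical sketch: estimating how many terms a range would expand to. */
final class RangeExpansionSketch {
    /** Assumed clause limit for this illustration. */
    static final int ASSUMED_MAX_CLAUSES = 1024;

    /** Counts the distinct indexed terms in [lower, upper], both bounds inclusive. */
    static int termsInRange(TreeSet<String> indexedTerms, String lower, String upper) {
        return indexedTerms.subSet(lower, true, upper, true).size();
    }

    /** True if expanding the range would need more clauses than the assumed limit. */
    static boolean wouldExceedLimit(TreeSet<String> indexedTerms, String lower, String upper) {
        return termsInRange(indexedTerms, lower, upper) > ASSUMED_MAX_CLAUSES;
    }
}
```

The point of the sketch is that the cost depends on how many distinct terms the index holds inside the range, not on how the range looks in the query, which is why a numeric range over a large index can fail while the same query works on a small one.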

Re: Lucene 1.9.1 and timeToString() apparent incompatibility with 1.4.3

2006-03-07 Thread Chris Hostetter
: timeToString() and stringToTime() classes are used. Using an index created : with 1.4.3 and searched with 1.9.1 I now receive the following errors: As the deprecation comment in DateField says... If you build a new index, use DateTools instead. For existing indices you can co

Re: Scoring with FunctionQueries?

2006-03-07 Thread Chris Hostetter
: I tried both approaches, and both don't seem to do what I : want. Perhaps I did not understand you properly. From what I can tell it looks like you understood me perfectly, I too am baffled by the results you are getting. I have a couple of thoughts: 1) check the raw core you get from these

Re: Lucene 1.9.1 and timeToString() apparent incompatibility with 1.4.3

2006-03-07 Thread George Washington
Thanks Chris for making it clear, I had read the comment but I had not understood that it implied incompatibility. But will the code be preserved in Lucene 2.0, in light of the comment contained in the Lucene 1.9.1 announcement ? QUOTE Applications must compile against 1.9 without deprecation w

RE: Distributed Lucene..

2006-03-07 Thread Andrew Schetinin
Hi, Sure not. We created another IndexWriter class and modified its function addIndexes (if I remember the function name correctly) so that it does not call optimize at the end - that's all. Having unique segment names was necessary because the segment file name is used inside the file itself, and c