Re: newbie: about grouping field

2004-08-12 Thread Ype Kingma
Fernando, On Thursday 12 August 2004 17:44, Wermus Fernando wrote: Luceners I have to search a string in 30 fields. I know how to do it in a long way. I wanna know if exists a shorter way. String for searching: what's your name? Long way: +firstname:what's your name? OR +lastname: what's

Re: Search Hit Score

2004-07-07 Thread Ype Kingma
On Wednesday 07 July 2004 08:25, Ype Kingma wrote: For a single term query, one can iterate through IndexReader.termDocs(Term) and store the document numbers by TermDocs.docFreq(). That should be TermDocs.freq() Oops, Ype
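The correction above (the per-document frequency comes from TermDocs.freq()) can be sketched without a Lucene jar on the classpath. This is a minimal illustration only, assuming a hypothetical stand-in interface SketchTermDocs that mirrors just the next(), doc() and freq() methods of Lucene's TermDocs; the class names and the array-backed postings are invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in mirroring the three TermDocs methods used here.
interface SketchTermDocs {
    boolean next(); // advance to the next document containing the term
    int doc();      // current document number
    int freq();     // frequency of the term within the current document
}

// Array-backed postings list, purely for illustration.
class ArrayTermDocs implements SketchTermDocs {
    private final int[] docs;
    private final int[] freqs;
    private int pos = -1;

    ArrayTermDocs(int[] docs, int[] freqs) {
        this.docs = docs;
        this.freqs = freqs;
    }

    public boolean next() { return ++pos < docs.length; }
    public int doc() { return docs[pos]; }
    public int freq() { return freqs[pos]; }
}

class TermFreqSketch {
    // Collect document number -> within-document term frequency.
    static Map<Integer, Integer> freqByDoc(SketchTermDocs termDocs) {
        Map<Integer, Integer> result = new LinkedHashMap<>();
        while (termDocs.next()) {
            result.put(termDocs.doc(), termDocs.freq());
        }
        return result;
    }
}
```

With a real index, the same loop would presumably run over the object returned by IndexReader.termDocs(Term), as the message describes.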

Re: Range Query Sombody HELP please

2004-06-03 Thread Ype Kingma
On Thursday 03 June 2004 07:10, Karthik N S wrote: Hey Ype the Query of range +button +shirt +filename:[b10181_p100 TO b10181_p200] did not work for me but on other way around +(button OR shirt) +filename:[b10181_p100 TO b10181_p200] resulted to me in 2 hits with either one

Re: Range Query Sombody HELP please

2004-06-02 Thread Ype Kingma
On Wednesday 02 June 2004 14:46, Erik Hatcher wrote: On Jun 2, 2004, at 6:20 AM, Karthik N S wrote: ... I still have 3 small Questions. 1)While creating the Range Query Is it possible for Lucene to do somthing similar.. +(button AND shirt) +filename:[b10181_p100 TO b10181_p200]

Re: Range Query Sombody HELP please

2004-05-31 Thread Ype Kingma
Karthik, On Monday 31 May 2004 06:12, Karthik N S wrote: Hey Ype ... My Question now is, If I want to Use Range Query to get search hits between fileName B10181_P702 and B10181_P355 only Instead of all the 67 hits , In this case there is no need to override range query, just use

Re: Range Query Sombody HELP please

2004-05-31 Thread Ype Kingma
On Monday 31 May 2004 11:09, Karthik N S wrote: ... I re indexed my folder 10181 [Seem's to be corrupted] Was the index writer closed? Now I am getting the hits as D:\JAVA\lucene\src\demojava org.lucene.src.indexer.search.SearchFiles Search Keyword : +button+filename:[B10181_P702 TO

Re: Range Query Sombody HELP please

2004-05-31 Thread Ype Kingma
Karthik, On Monday 31 May 2004 13:47, Karthik N S wrote: Hey Ype... 1) I switched off the multi search scenario. 2) Changing the Field type from Text to Keyword will fail when I search for the Field type filename, so I still maintained it to be Text Just make sure the file name

Re: Range Query Sombody HELP please

2004-05-28 Thread Ype Kingma
-Original Message- From: Ype Kingma [mailto:[EMAIL PROTECTED] Sent: Thursday, May 27, 2004 11:03 PM To: [EMAIL PROTECTED] Subject: Re: Range Query Sombody HELP please On Thursday 27 May 2004 09:37, Karthik N S wrote: Hi Lucene -Developer My main intention was Search for an word hit

Re: Range Query Sombody HELP please

2004-05-28 Thread Ype Kingma
On Friday 28 May 2004 10:54, Karthik N S wrote: Hey ype Thx for the advice but still I need to get the exact situation working , 1) I have a unique Field [ called filename ] which is indexed of type Text. It accepts the name of the HTML files as the indexing parameter , Also there

Re: Range Query Sombody HELP please

2004-05-27 Thread Ype Kingma
On Thursday 27 May 2004 07:00, Karthik N S wrote: Hi Lucene developers Is it possible to do Search and retrieve relevant information on the Indexed Document within in specific range settings which may be similar to an Query in SQL = select * from BOOKSHELF where book1 between 100

Re: Range Query Sombody HELP please

2004-05-27 Thread Ype Kingma
On Thursday 27 May 2004 09:37, Karthik N S wrote: Hi Lucene -Developer My main intention was Search for an word hit in a Unique Field between ranges say book100 - book 200 indexed numbers It's something like creating a SUBSEARCH with in the SEARCHINDEX. You don't need to

Re: Query for the existence of a Lucene field in a document?

2004-05-25 Thread Ype Kingma
David, On Tuesday 25 May 2004 03:05, you wrote: I have an application using Lucene 1.3 final. In this application, I am loading data where the main text for each document is stored into a body field, a couple of other internal fields, and basically some meta-data fields driven by the data

Re: How to handle range queries over large ranges and avoid Too Many Boolean clauses

2004-05-18 Thread Ype Kingma
On Tuesday 18 May 2004 19:38, Claude Devarenne wrote: Hi, I have over 60,000 documents in my index which is slightly over a 1 GB in size. The documents range from the late seventies up to now. I have indexed dates as a keyword field using a string because the dates are in MMDD format.

Re: Are lucene have a configuration feature for storage compression option?

2004-05-14 Thread Ype Kingma
Alex, Otis, On Friday 14 May 2004 13:58, Otis Gospodnetic wrote: Moving to lucene-user list. Hello, Didn't I already answer these questions? 1. No :( There is bit more to say, see below. ... --- Alex Aw Seat Kiong [EMAIL PROTECTED] wrote: Hi! Some question about lucene: 1. Are

Re: Memory Requirements

2004-05-13 Thread Ype Kingma
Paul, On Thursday 13 May 2004 22:03, Paul wrote: Stephane James Vaucher wrote: On Thu, 13 May 2004, Matt Quail wrote: do you know of any method to reduce the memory consumption of lucene when searching? Avoid prefix queries and wildcards, since they can be rewritten into large

Re: Mixing database and lucene searches

2004-05-11 Thread Ype Kingma
On Tuesday 11 May 2004 17:26, Gerard Sychay wrote: Eric Jain [EMAIL PROTECTED] 05/11/04 04:47AM Hits hits = searcher.search(new TermQuery(text, foo) Set hitPKs = new Set(); for each doc in hits: hitPKs.put(doc.getField(pk)) Retrieving even one custom field for every document

Re: Returning Separate Hits from Multisearcher

2004-05-10 Thread Ype Kingma
On Monday 10 May 2004 14:13, David Townsend wrote: We have a number of small indices and also an uber-index made up of all the smaller indices. We need to get do a search across a number of the sub-indices and get back a hit count from each. Currently we search each index, we've also tried

Re: Scoring documents by Click Count

2004-05-06 Thread Ype Kingma
On Thursday 06 May 2004 18:11, David Spencer wrote: Otis Gospodnetic wrote: Sure. On click, get document Id (not internal docId, but something you use as s surrogate primary key) of the clicked document. Retrieve the document. Pull out the value of 'clickCount' field. +1 it. Delete the

Re: Scoring documents by Click Count

2004-05-06 Thread Ype Kingma
On Thursday 06 May 2004 23:26, Boris Goldowsky wrote: On Thu, 2004-05-06 at 13:58, Ype Kingma wrote: Changing the click count this way is ok, but along with that you could change the (field) norm for the document to increase its score in subsequent queries. You can use Document.setBoost

Re: Count for a keyword occurance in a file

2004-04-29 Thread Ype Kingma
On Thursday 29 April 2004 08:14, Nader S. Henein wrote: Tricky, scoring has to do with the frequency of the occurrence of the word as opposed to the amount of words in the file in general (Somebody correct me if I'm wrong) , so short of an educated approximation, you could hack Lucene uses two

Re: Help with scoring, coordination factor?

2004-04-29 Thread Ype Kingma
On Thursday 29 April 2004 20:09, Matthew W. Bilotti wrote: I can't help you with your first question about coordination of disjunctions in conjunctions. Actually, I would like to have the possibility to provide all terms in an OR query with the same idf weight, e.g. some average of their IDFs,

Re: lucene applicability and performance

2004-04-28 Thread Ype Kingma
Greg, On Wednesday 28 April 2004 21:44, Greg Conway wrote: Hello. Apologies if this has come up before, I'm new to the list and didn't see anything in the archives that exactly matched my situation. It has, but each situation is different. Try this:

Re: lucene applicability and performance

2004-04-28 Thread Ype Kingma
Greg, Yes, see RemoteSearchable and MultiSearcher in org.apache.lucene.search. (See the javadoc on the website) I meant ParallelMultiSearcher. Good night, Ype - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional

Re: Result scoring question

2004-04-15 Thread Ype Kingma
On Wednesday 14 April 2004 20:55, Armbrust, Daniel C. wrote: I should have remembered that. Here are the 3 explanations for the top 3 documents returned (contents below) 3.3513687 = product of: 6.7027373 = weight(preferred_designation:renal calculus in 48270), product of: 0.8114604 =

Re: ValueListHandler pattern with Lucene

2004-04-09 Thread Ype Kingma
On Friday 09 April 2004 21:18, [EMAIL PROTECTED] wrote: Hi! I implemented a VLH pattern for Lucene's search hits but noticed that hits.doc() is quite slow (3000+ hits took about 500ms). So, I want to ask people here for a solution. I thought about something like a wrapper for the VO

Re: Similarity - position in Field[] effects scoring - how to change?

2004-03-23 Thread Ype Kingma
On Tuesday 23 March 2004 16:05, Joachim Schreiber wrote: Hallo, I run in following problem. Perhaps somebody can help me. I have a index with different ids in the same field something like s s45678565 s87854546 Situation: I have different documents with the entry s in

Re: Similarity - position in Field[] effects scoring - how to change?

2004-03-23 Thread Ype Kingma
Joachim, ... you think its possible to order by e.g. date field without retrieving all the values from the index?? Yes, the new sorting feature from CVS does that, see Doug's last note on the subject. (It might have been on lucene-dev, I didn't keep a copy). Have fun, Ype

Re: boosting StandardAnalyzer, stop words

2003-12-09 Thread Ype Kingma
Stefan, I didn't provide the patch, I just remembered the code from a recent reading. I took another look whether there are more such cases in the Term() method, but I couldn't find anything clear in the .jj file. The generated .java file didn't help much either. Could you provide a line number

Re: boosting StandardAnalyzer, stop words

2003-12-09 Thread Ype Kingma
On Tuesday 09 December 2003 17:58, Ype Kingma wrote: Stefan, I didn't provide the patch, I just remembered the code from a recent reading. I took another look whether there are more such cases in the Term() method, but I couldn't find anything clear in the .jj file. The generated .java

Re: boosting StandardAnalyzer, stop words

2003-12-08 Thread Ype Kingma
Stefan, It's a bug, and there is a fix for this in the latest CVS near the end of the QueryParser.jj file:

    // avoid boosting null queries, such as those caused by stop words
    if (q != null) {
      q.setBoost(f);
    }

Kind regards, Ype On Monday 08 December 2003 20:20, Stefan
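The guard quoted from QueryParser.jj can be tried in isolation. The following is only a sketch under stated assumptions: SketchQuery is a hypothetical stand-in for a parsed query, and a clause that consisted entirely of stop words is modelled as null, which is what the comment in the fix suggests.

```java
// Hypothetical stand-in for a parsed query; only the boost matters here.
class SketchQuery {
    private float boost = 1.0f;

    void setBoost(float b) { boost = b; }
    float getBoost() { return boost; }
}

class BoostGuardSketch {
    // Apply a boost only when the clause survived analysis.
    // A clause made up entirely of stop words is modelled as null.
    static SketchQuery applyBoost(SketchQuery q, float f) {
        if (q != null) {
            q.setBoost(f);
        }
        return q;
    }
}
```

Without the null check, boosting a clause that was reduced to nothing would throw a NullPointerException in this sketch.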

Re: Real Boolean Model in Lucene?

2003-12-01 Thread Ype Kingma
Ralph, On Monday 01 December 2003 04:11, [EMAIL PROTECTED] wrote: Hi, is it possible to use a real boolean model in lucene for searching. When one is using the Queryparser with a boolean query (i.e. dog AND horse) one does get a list of documents from the Hits object. However these

Re: How to change similarity measure...

2003-12-01 Thread Ype Kingma
On Monday 01 December 2003 05:38, Ralph wrote: Hi, does somebody have an example of how to use another similarity class implementation for searching? Assuming I have implemented MySimilarity class MySimilarity implements Similarity{ how do I have to plug it in to actually use it for a

Re: raw hit count

2003-11-30 Thread Ype Kingma
Kent, Erik, On Saturday 29 November 2003 17:20, Erik Hatcher wrote: I enjoy at least attempting to answer questions here, even if I'm half wrong, so by all means correct me if I misspeak Me too, :) On Saturday, November 29, 2003, at 06:37 PM, Kent Gibson wrote: All I would like to

Re: Dates and others

2003-11-24 Thread Ype Kingma
Erik, On Sunday 23 November 2003 12:51, Erik Hatcher wrote: On Saturday, November 22, 2003, at 06:33 PM, Dion Almaer wrote: 3. I have some fields suck as title, owner, etc as well as the content blob which I index and use as the default search field. Is there an easy way to extend the

Re: Dates and others

2003-11-23 Thread Ype Kingma
on your data, so you might experiment a bit. You might e.g. index all fields separately, and also index a default concatenated field. Kind regards, Ype Kingma

Re: score for MultipleSearcher

2003-10-08 Thread Ype Kingma
Hui, On Tuesday 07 October 2003 19:31, hui wrote: Hi, When I use the multiple index search on one large index and one small index, it looks like sometimes the documents from the small index get a higher score compared to the documents from the big index. But when I look at the score formula, this

Re: scoring algorithm

2003-09-23 Thread Ype Kingma
On Tuesday 23 September 2003 00:12, Chris Hennen wrote: Hi, what is the purpose of tf_q * idf_t / norm_q in Lucene's scoring algorithm: score_d = sum_t( tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t) I dont understand, why the score has to be higher, when the frequency of a term in the

Re: How to implement Similarity for custom sorting by field ( or by docID)?

2003-07-15 Thread Ype Kingma
fields anyway, this doesn't hurt performance. Kind regards, Ype Kingma

Re: Advice on updating an index?

2003-07-12 Thread Ype Kingma
Reece, On Friday 11 July 2003 16:05, Wilton, Reece wrote: Hi, I'm having a bit of trouble figuring out the logic for deleting documents from an index. Any advice is appreciated! snip 75% of the experiments 4) I created an index with an IndexWriter and then optimized it and closed it.

Re: Understanding how indexing works

2003-07-03 Thread Ype Kingma
Claes, On Thursday 03 July 2003 05:36, Claes Holmerson wrote: Hi, In my job, I have become the new maintainer of a search feature that uses Lucene. I am trying to understand how it works by examining the index it produces. When I list index fields by opening an IndexReader, looping over

Re: Using Lucene in an multiple index/large io scenario

2003-06-30 Thread Ype Kingma
. Lucene gives you the balance in your hands. Kind regards, Ype Kingma

Re: String similarity search vs. typcial IR application...

2003-06-06 Thread Ype Kingma
On Thursday 05 June 2003 14:12, Jim Hargrave wrote: Our application is a string similarity searcher where the query is an input string and we want to find all fuzzy variants of the input string in the DB. The Score is basically dice's coefficient: 2C/Q+D, where C is the number of terms

Re: Finding out which field caused the hit?

2003-05-27 Thread Ype Kingma
because you have to retrieve the stored field(s) for each document only once. However, it's not as flexible. Kind regards, Ype Kingma

Re: multiple collections indexing

2003-03-19 Thread Ype Kingma
Morus, On Wednesday 19 March 2003 00:44, Morus Walter wrote: Hi, we are currently evaluating lucene. The data we'd like to index consists of ~ 80 collections of documents (a few hundred up to 20 documents per collection, ~ 1.5 million documents total; medium document size is in the

Re: Quickest way to build a Document - (Keyword, Freq)* map

2003-02-14 Thread Ype Kingma
On Friday 14 February 2003 15:10, you wrote: Hi, I am using Lucene right now to index several semi-structured documents. I recently had to implement a method 'getFrequencyVector()' to simply return a mapping of keyword - frequency from the information already in the lucene index. I

Re: how to get an extra count

2003-02-04 Thread Ype Kingma
On Tuesday 04 February 2003 09:12, you wrote: Hi all, I'm trying to gather information about my non-searched (ie not used for the search) fields. Let's take an index with 2 fields: 'artist' (for the artist name) an 'type' (for his type of music). I need to perform a search on the 'artist'

Re: Score-Limited Hits?

2003-02-03 Thread Ype Kingma
On Monday 03 February 2003 22:35, you wrote: Is there an existing API that allows you to conduct a search such that only hits with a score greater than X are returned? Not directly, but it's straightforward to compose from Searcher.search(query, hitcollector) and a hitcollector that implements
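The composition suggested above, a search driven by a hit collector that drops low-scoring hits, can be sketched as follows. This is a self-contained illustration rather than Lucene code: SketchHitCollector is a hypothetical stand-in that only copies the collect(int doc, float score) callback shape, and MinScoreCollector and DocListCollector are names invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in with the same callback shape as a hit collector.
abstract class SketchHitCollector {
    abstract void collect(int doc, float score);
}

// Forwards only hits whose score is strictly greater than the threshold.
class MinScoreCollector extends SketchHitCollector {
    private final float minScore;
    private final SketchHitCollector delegate;

    MinScoreCollector(float minScore, SketchHitCollector delegate) {
        this.minScore = minScore;
        this.delegate = delegate;
    }

    void collect(int doc, float score) {
        if (score > minScore) {
            delegate.collect(doc, score);
        }
    }
}

// Simple delegate that remembers the accepted document numbers.
class DocListCollector extends SketchHitCollector {
    final List<Integer> docs = new ArrayList<>();

    void collect(int doc, float score) {
        docs.add(doc);
    }
}
```

In a real application the filtering collector would presumably extend Lucene's own collector class and be passed to Searcher.search(query, hitcollector), as the reply describes; the threshold has to be on whatever score scale the collector actually receives.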

Re: Book

2002-11-20 Thread Ype Kingma
William, On Wednesday 20 November 2002 21:14, you wrote: I would like to buy a book about Lucene. Who could write it ? : ) AFAIK there is no book, but some articles might help: http://citeseer.nj.nec.com/cs?q=doug+cuttingsubmit=Search+Documentscs=1 Optimizations for Dynamic Inverted Index

Re: Not getting any results from query

2002-11-15 Thread Ype Kingma
On Friday 15 November 2002 14:40, Rob Outar wrote: That is exactly what is happening, I was using the QueryParser class because I wanted to do stuff like this: field1 = value and field2 = value2 or field2 = value3 But from what you are telling me I cannot use the Query Parser class because

Re: Not getting any results from query

2002-11-14 Thread Ype Kingma
On Thursday 14 November 2002 19:36, you wrote: Hello all, I am storing the field in this fashion: doc.add(new Field(releaseability, releaseability, true, true, false)); so it is indexed and stored but not tokenized. The value is Test Releaseability;

Re: How to get all field names

2002-11-12 Thread Ype Kingma
On Tuesday 12 November 2002 18:58, Rob Outar wrote: Enumeration fields() Returns an Enumeration of all the fields in a document. Yes, but it seems there is no such enumerator for a complete index. Regards, Ype Thanks, Rob -Original Message- From: Christoph Kiehl

Re: Working with a Distributed System

2002-11-01 Thread Ype Kingma
On Friday 01 November 2002 15:05, Rob Outar wrote: All, I have what I think is an interesting problem. I am working on a distributed system where all repositories on each node have to be kept in sync. I am using Lucene on each node to index the data. Users are allowed to associate

Re: page searchin, jython

2002-10-31 Thread Ype Kingma
On Thursday 31 October 2002 17:21, Felipe Schnack wrote: What you mean with Jyton? Lucene isn't java? Lucene is written in java, and Jython is also written in java. Jython is an implementation of the python scripting language that allows very easy access to java and to Lucene. Jython ideal

Re: Query Boosting

2002-10-31 Thread Ype Kingma
On Thursday 31 October 2002 18:45, [EMAIL PROTECTED] wrote: Hi, My application requires a facility to have security build into the documents so that when i search for a given word depending on the security credentials stored in a field in the document the results are filtered . Now the

Re: How to count number of entries

2002-10-15 Thread Ype Kingma
On Tuesday 15 October 2002 08:50, you wrote: I want to write a function countIndexEntries(key) to find out how many entries are there in the index database for a key. I read the faq entry about counting number of hits, but somehow it doesnt work as expected, please help: I create entries

Re: indexing documents that arrive in pieces

2002-10-13 Thread Ype Kingma
On Sunday 13 October 2002 04:18, you wrote: What is the cleanest way in Lucene to add documents to an index, if the entire document is not readily available at one time? E.g., I want to index the text as well as the anchor-text of a stream of html pages, where the anchor-text terms get

Re: Phrase match with wildcards e.g. search for st*

2002-10-10 Thread Ype Kingma
Eoin, Get the cvs version and have a look at: org/apache/lucene/search/PhrasePrefixQuery.java It says: /** * PhrasePrefixQuery is a generalized version of PhraseQuery, with an added * method {@link #add(Term[])}. * To use this class, to search for the phrase Microsoft app* first use *

1.2 source jar incomplete?

2002-10-02 Thread Ype Kingma
Hello, I just downloaded the lucene-1.2-src jar but to my surprise it only contains the analysis and queryParser packages in org/apache/lucene. Is the source jar incomplete or am I looking in the wrong place? Regards, Ype

Re: 1.2 source jar incomplete?

2002-10-02 Thread Ype Kingma
-1.2-src.jar file is included? I looked for an explanation, but couldn't find one. Regards, Ype --- Ype Kingma [EMAIL PROTECTED] wrote: Hello, I just downloaded the lucene-1.2-src jar but to my surprise it only contains the analysis and queryParser packages in org/apache/lucene

Re: Sorting

2002-06-20 Thread Ype Kingma
, Ype Many Thanks, Fanny From: Ype Kingma [EMAIL PROTECTED] Reply-To: Lucene Users List [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Subject: Re: Sorting Date: Wed, 19 Jun 2002 19:40:59 +0100 Fanny, I want to implement search function using Lucene. As I need to sort the result

Re: Italian web sites

2002-04-24 Thread Ype Kingma
Laura, Hi all, I'm using Jobo for spidering web sites and Lucene for indexing. The problem is that I'd like to spider only Italian web sites. How can I discover the country of a web site? Do you know some method that you can suggest to me? The best method I know is using n-grams of

Re: Search question

2002-04-17 Thread Ype Kingma
Aruna, Hi, I am looking for ways to cancel a search in response to a cancel from a user interface. I don't see any thing like a timeout on the Searcher.search() method. Is there a way to terminate a search request? You can use the low level search api with a collector that checks for cancelling
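The idea of a collector that checks for cancellation can be sketched like this. The snippet above is cut off before it says how the search is actually stopped, so unwinding with an unchecked exception is an assumption of this sketch, not something the message confirms; CancellableCollector and SearchCancelledException are hypothetical names, and the collect(int doc, float score) shape only imitates a low-level collector callback.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Unchecked exception used to unwind out of the search loop
// (an assumption of this sketch).
class SearchCancelledException extends RuntimeException {
    SearchCancelledException() {
        super("search cancelled");
    }
}

class CancellableCollector {
    private final AtomicBoolean cancelled;
    private int collected = 0;

    CancellableCollector(AtomicBoolean cancelled) {
        this.cancelled = cancelled;
    }

    // Called once per matching document; checks the shared flag first.
    void collect(int doc, float score) {
        if (cancelled.get()) {
            throw new SearchCancelledException();
        }
        collected++;
    }

    int collected() {
        return collected;
    }
}
```

The user interface thread would set the shared flag, and the code that started the search would catch SearchCancelledException around the search call.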

Re: Question Deleting/Reindexing Files

2002-03-20 Thread Ype Kingma
Joe, Hi, I am using Lucene for indexing a relatively large article based system where articles change from time to time so I have to reindex them. Reindexing had the effect that a query would return the hit for a file multiple times (according to the number of updates). The only solution to

Re: Indexing and Duplication

2002-03-18 Thread Ype Kingma
of docs on the queue can be limited by eg. the total size of the docs. I assumed you need to delete old docs while adding new ones. In case you don't need to delete old docs, you might not need an index reader at all. Ype Regards, Kelvin - Original Message - From: Ype Kingma [EMAIL

Re: corrupted index

2002-03-17 Thread Ype Kingma
Otis, You can remove the .lock file and try re-indexing or continuing indexing where you left off. I am not sure about the corrupt index. I have never seen it happen, and I believe I recall reading some messages from Doug Cutting saying that index should never be left in an inconsistent

Re: Indexing and Duplication

2002-03-16 Thread Ype Kingma
Kelvin, I've got a little problem with indexing that I'd like to throw to everyone. My objects have a unique identifier. When indexing, before I create a new document, I'd like to check if a document has already been created with this identifier. If so, I'd like to retrieve the document

Re: Adding multiple paths to a document

2002-03-05 Thread Ype Kingma
Grim, I am looking at using lucene to index a large set of documents. In order to be able to search a subset of documents, I've added a path-field to each document (indexed, not stored, not tokenized). Using a prefix-query seems to work fine. My problem: Our documents can have several

Re: lucene web-app russian language

2002-03-01 Thread Ype Kingma
Philipp, Hi! I was trying the lucene web-app (lucene-1.2-rc5-dev.jar). I've created and indexed a simple html document with both english and russian words. it was ANSI encoded, if I check _3.fdt from created index, I can see my document indexed and both russian and english terms indexed (it

Re: Indexing and Searching happening together

2002-01-31 Thread Ype Kingma
Kelvin, In the case where indexing takes a non-trivial amount of time, what is the expected behaviour when a search is performed while indexing is still going on? Would it be a good solution to index in a temporary location, then copying the index files over to the final location when done?

Term ordering for IndexReader.termDocs()

2002-01-25 Thread Ype Kingma
terms. Thanks in advance, Ype Kingma

Re: Attribute Search

2001-11-21 Thread Ype Kingma
Paula, I came across a tutorial which had some details on the static factory Field methods. But none of the factory methods return a Field object with the following settings: Store = false Index = true Tokenize = false I'm beginning to think this is a bug - that this combination is handled