Re: modify existing non-indexed field

2006-07-11 Thread dan2000
Thanks for your advice, Doron. I've tried changing to one indexing thread (instead of 5) but still get the same problem. I can't figure out why this happens.

Query?

2006-07-11 Thread WATHELET Thomas
How do I parse this kind of query? COM(2006) 0001

combined filesystem and web search

2006-07-11 Thread Tomi NA
I plan to make Lucene (and Nutch) a key element in an intranet solution, but all I know about Lucene is what I've read in the last couple of days. Here's what I'd like opinions about. I would like to build a single point of access to data on intranet web pages and LAN shared documents. I've

Missing fields used for a sort

2006-07-11 Thread Rob Staveley (Tom)
If I want to sort on a field that doesn't exist in all documents in my index, can I have a default value for documents which lack that field (e.g. MAXINT or 0)?

Query using parenthesis

2006-07-11 Thread WATHELET Thomas
How do I parse this query COM(2005) 0123 in LukeAll? I get this result: docnumber: com

Re: Query using parenthesis

2006-07-11 Thread Erik Hatcher
On Jul 11, 2006, at 8:57 AM, WATHELET Thomas wrote: How to parse this query COM(2005) 0123 in LukeAll? I have this result docnumber: com Your question is not clear. But I'm always happy to lend a hand... Try the query: COM\(2005\) 0123 Parentheses are special characters with
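The escaping Erik suggests can be sketched without Lucene on the classpath. This is only an illustration, not Lucene's own code: `escapeQuerySyntax` is a hypothetical helper, and the list of special characters is an assumption taken from the documented query syntax. In real code, the static `QueryParser.escape` method is probably the better choice, if your Lucene version provides it.

```java
public class QueryEscape {
    // Characters assumed to be query-syntax operators (an assumption for this sketch).
    private static final String SPECIAL = "\\+-!(){}[]^\"~*?:&|";

    // Hypothetical helper: prefix every special character with a backslash.
    static String escapeQuerySyntax(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() * 2);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints the document number with its parentheses escaped.
        System.out.println(escapeQuerySyntax("COM(2005) 0123"));
    }
}
```

Escaping only stops the parentheses from being read as grouping. Whether the escaped text then matches anything still depends on how the field was analyzed at index time.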

RE: Query using parenthesis

2006-07-11 Thread WATHELET Thomas
I have an index with this field: stored/uncompressed,indexed,tokenized<docnumber:SEC(2006) 0350>. I'm using LukeAll to query my index, and when I try to search in the docnumber field with this query COM\(2005\) 0123, in the query detail panel I retrieve this: docnumber:sec () Do you know LukeAll?

Re: Query?

2006-07-11 Thread Erick Erickson
Could you provide a bit more information? What's important or not about this query? And how does that import relate to what you've indexed? In other words, what do you *want* it to mean? Best Erick

Re: Lucene WordExtractor

2006-07-11 Thread Suba Suresh
There is a separate user mailing list for poi. Use it. There are three jar files. Check the scratchpad jar. You have to pass a FileInputStream (not the filename) as an argument to the WordExtractor class. suba suresh. mcarcelen wrote: Hi all! I'm working with poi-bin-3.0-alpha2-20060616

Re: combined filesystem and web search

2006-07-11 Thread Erick Erickson
I can answer a few of these. If you haven't yet, you'd do yourself a favor to pick up the book Lucene in Action. It's written to the 1.4 code-base, the examples compile but give deprecated warnings for the 1.9 code base, and need a few more tweaks for the 2.0 code base. Also, download a copy of

RE: Lucene WordExtractor

2006-07-11 Thread mcarcelen
Thanks, Suba. Sorry. -Original Message- From: Suba Suresh [mailto:[EMAIL PROTECTED] Sent: Tuesday, 11 July 2006 15:51 To: java-user@lucene.apache.org Subject: Re: Lucene WordExtractor There is a separate user mailing list for poi. Use it. There are three jar files. Check the

Re: Query using parenthesis

2006-07-11 Thread Erik Hatcher
On Jul 11, 2006, at 9:28 AM, WATHELET Thomas wrote: I have an index with this field: stored/uncompressed,indexed,tokenized<docnumber:SEC(2006) 0350>. I'm using LukeAll to query myIndex and when I try to search in the docnumber field with this query COM\(2005\) 0123 in the query detail panel I

Re: Query?

2006-07-11 Thread Erick Erickson
What tokenizer did you use to index the document number? Just about all tokenizers split on spaces, so you'd have indexed this as at least two separate terms because of the space before the 0350. I'd really recommend downloading a copy of Luke so you can examine your index and see exactly what

RE: Query?

2006-07-11 Thread WATHELET Thomas
OK, I have now made this field UN_TOKENIZED, and in Luke I see the entire term (SEC(2006) 0123), whereas before I only saw SEC. And the wonderful thing is that it's now working. Thanks a lot to Erik and Erick. -Original Message- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: 11 July 2006
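The difference between a tokenized and an untokenized field in this thread can be illustrated with plain Java. This is a rough sketch under stated assumptions: the three helpers below are hypothetical stand-ins, not Lucene analyzers, and only mimic splitting on whitespace, keeping lowercase letter runs, and keeping the whole value as one term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class TokenizeSketch {
    // Split on runs of whitespace; punctuation stays inside the terms.
    static List<String> whitespaceTerms(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Keep only runs of letters, lowercased; digits and punctuation are dropped.
    static List<String> letterTerms(String text) {
        List<String> terms = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                terms.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            terms.add(current.toString());
        }
        return terms;
    }

    // Untokenized: the whole field value is a single term.
    static List<String> untokenizedTerm(String text) {
        return Collections.singletonList(text);
    }

    public static void main(String[] args) {
        String docNumber = "SEC(2006) 0350";
        System.out.println(whitespaceTerms(docNumber));
        System.out.println(letterTerms(docNumber));
        System.out.println(untokenizedTerm(docNumber));
    }
}
```

With a letters-only tokenizer, only "sec" survives, which is consistent with Luke showing just SEC before the field was switched to UN_TOKENIZED. An untokenized field has to be matched by the exact, complete value.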

Searching for a phrase which spans on 2 pages

2006-07-11 Thread Mile Rosu
Hello, I am working on an application similar to Google Books which allows searching on documents that each represent a scanned page. Of course, one might search for a phrase starting at the end of one page and ending at the beginning of the next one. In this case I do not know how I might treat

RE: Missing fields used for a sort

2006-07-11 Thread Rob Staveley (Tom)
Thanks for the info, both of you. Of course Lucene obeys Murphy's law that the missing ones appear first when you reverse sort, which is what Murphy's law says you want to do. Does Solr have a custom build of Lucene in it, or is the functionality required to get the missing ones to

Re: Missing fields used for a sort

2006-07-11 Thread Yonik Seeley
On 7/11/06, Rob Staveley (Tom) [EMAIL PROTECTED] wrote: Thanks for the info, both of you. Of course Lucene obeys Murphy's law that the missing ones appear first when you reverse sort, which is what Murphy's law says you want to do. Does Solr have a custom build of Lucene in it, or is the

RE: Missing fields used for a sort

2006-07-11 Thread Rob Staveley (Tom)
I can't thank you enough, Yonik :-) -Original Message- From: Yonik Seeley [mailto:[EMAIL PROTECTED] Sent: 11 July 2006 18:05 To: java-user@lucene.apache.org Subject: Re: Missing fields used for a sort On 7/11/06, Rob Staveley (Tom) [EMAIL PROTECTED] wrote: Thanks for the info both of

Re: Missing fields used for a sort

2006-07-11 Thread Yonik Seeley
Oh, and here is how Solr uses it to construct the correct lucene Sort objects: http://svn.apache.org/viewvc/incubator/solr/trunk/src/java/org/apache/solr/search/Sorting.java?view=markup -Yonik http://incubator.apache.org/solr Solr, the open-source Lucene search server
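The underlying sorting problem in this thread, where documents lacking the sort field change ends when the sort direction is reversed, can be shown with a plain Java comparator. This is a minimal sketch and not the Solr code linked above: `missingLast` is a hypothetical helper, and `null` stands in for a missing field value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MissingLastSort {
    // Reverse only the ordering of present values; nulls stay last either way.
    static Comparator<String> missingLast(boolean reverse) {
        Comparator<String> natural = Comparator.naturalOrder();
        return Comparator.nullsLast(reverse ? natural.reversed() : natural);
    }

    // Return a sorted copy so the caller's list is left untouched.
    static List<String> sorted(List<String> values, boolean reverse) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(missingLast(reverse));
        return copy;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("b", null, "a", "c");
        System.out.println(sorted(values, false)); // [a, b, c, null]
        System.out.println(sorted(values, true));  // [c, b, a, null]
    }
}
```

The design point is that the direction is applied to the inner comparator. Reversing the whole comparator instead would move the missing values to the front.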

SortComparatorSources and ScoreDocComparators

2006-07-11 Thread James Pine
Hey Everyone, I've had success in the past creating my own SortComparatorSources and ScoreDocComparators (basing my code on sec 6.1 from LIA); however, I'm starting to run into some performance issues with large indexes. When I started to probe deeper it seems that enumerating through the

Re: Missing fields used for a sort

2006-07-11 Thread Erick Erickson
On 7/11/06, Rob Staveley (Tom) [EMAIL PROTECTED] wrote: I can't thank you enough, Yonik :-) Send money <G>.

RangeQuery question?

2006-07-11 Thread Van Nguyen
Is there a RangeQuery equivalent that can query a date range on two different fields? Term startTerm = new Term("startDate", "20060710"); Term endTerm = new Term("endDate", "20060711"); RangeQuery q = new RangeQuery(startTerm, endTerm, true);
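One reading of this question is "startDate on or after one bound AND endDate on or before another". The sketch below shows only that logic in plain Java, under that assumption. `matches` is a hypothetical helper, and it relies on yyyyMMdd strings sorting lexicographically in date order. In Lucene terms this would presumably be two per-field range clauses combined in a BooleanQuery, but that is a suggestion, not something the thread confirms.

```java
public class TwoFieldDateRange {
    // Dates are yyyyMMdd strings, so string comparison follows date order.
    static boolean matches(String docStartDate, String docEndDate,
                           String from, String to) {
        boolean startsOnOrAfter = docStartDate.compareTo(from) >= 0; // check on startDate
        boolean endsOnOrBefore = docEndDate.compareTo(to) <= 0;      // check on endDate
        return startsOnOrAfter && endsOnOrBefore;                    // both must hold
    }

    public static void main(String[] args) {
        System.out.println(matches("20060710", "20060711", "20060710", "20060711"));
        System.out.println(matches("20060709", "20060711", "20060710", "20060711"));
    }
}
```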

Re: Searching for a phrase which spans on 2 pages

2006-07-11 Thread Erick Erickson
I can think of several approaches, but the experts will no doubt show me up <G>. 1. Index the entire book as a single document. Also, index the beginning and ending offset of each page in separate documents. Assuming you can find the offset in the big doc of each matching phrase, you can also find
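The first approach above depends on mapping a match offset in the whole-book text back to a page. A minimal sketch of that lookup, assuming the page start offsets are available as a sorted array: `pageOf` and `pagesSpanned` are hypothetical names, pages are zero-based, and how the match offsets are obtained from the index is left open.

```java
import java.util.Arrays;

public class PageLookup {
    // pageStarts[i] is the offset of the first character of page i,
    // sorted ascending with pageStarts[0] == 0.
    static int pageOf(int[] pageStarts, int offset) {
        if (offset < 0) {
            throw new IllegalArgumentException("offset must be non-negative");
        }
        int pos = Arrays.binarySearch(pageStarts, offset);
        // Exact hit: the offset is the first character of that page.
        // Otherwise binarySearch returns -(insertionPoint) - 1, and the
        // page is the one just before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    // First and last page touched by a match covering [matchStart, matchEnd).
    static int[] pagesSpanned(int[] pageStarts, int matchStart, int matchEnd) {
        return new int[] {
            pageOf(pageStarts, matchStart),
            pageOf(pageStarts, matchEnd - 1)
        };
    }

    public static void main(String[] args) {
        int[] pageStarts = {0, 1000, 2500};
        // A phrase starting near the end of page 0 and ending on page 1.
        System.out.println(Arrays.toString(pagesSpanned(pageStarts, 990, 1010)));
    }
}
```

A phrase that crosses a page boundary then simply reports two different page numbers, so both pages can be shown to the user.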

Re: modify existing non-indexed field

2006-07-11 Thread Doron Cohen
I've tried changing to one indexing thread (instead of 5) but still get the same problem. can't figure out why this happens. The program as listed seems to access an existing index, since 'create' is always false for both 'FSDirectory.getDirectory(,)' and 'new IndexWriter(,,)'. Perhaps an old

Re: combined filesystem and web search

2006-07-11 Thread Tomi NA
On 7/11/06, Erick Erickson [EMAIL PROTECTED] wrote: I can answer a few of these. If you haven't yet, you'd do yourself a favor to pick up the book Lucene in Action. It's written to the 1.4 code-base, the examples compile but give deprecated warnings for the 1.9 code base, and need a few more

Re: combined filesystem and web search

2006-07-11 Thread Tomi NA
On 7/12/06, Steven Rowe [EMAIL PROTECTED] wrote: Tomi NA wrote: I wish people would start selling .pdf books online... :( Your wish is granted: http://www.manning.com/hatcher2/ Wow, that was fast! Thanks for the link. Then there's IndexMergeTool which I haven't used, but looks

Storing Part of Speech information in Lucene Indices

2006-07-11 Thread Amit Kumar
Hi, a new project that I am investigating Lucene for needs part-of-speech information for the tokens. I can get that information using NLP techniques (GATE etc.) by preprocessing the documents, but I would like to store that information in the indices. Something along the lines of

Adding synonym-index to another index

2006-07-11 Thread Ramesh Salla
Hi, can we add the WordNet synonym index to another index? I think this is a somewhat painful process. For now, I retrieve the synonyms of the words in the search query and rewrite the search query accordingly. Will addIndexes(indexes) do this for us? Does the merged index give meaningful
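The query rewriting described here (look up synonyms for each query word, then rebuild the query) might look roughly like the sketch below. It is an illustration under assumptions: the in-memory map is a hypothetical stand-in for a WordNet synonym lookup, `expand` is an invented helper, and the output uses OR groups in parentheses as in the query syntax. It does not address whether merging the indexes would give the same effect.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SynonymExpansion {
    // Replace each word that has synonyms with a group "(word OR syn1 OR syn2)".
    static String expand(String query, Map<String, List<String>> synonyms) {
        List<String> clauses = new ArrayList<>();
        for (String word : query.trim().split("\\s+")) {
            List<String> alternatives = new ArrayList<>();
            alternatives.add(word);
            alternatives.addAll(
                synonyms.getOrDefault(word.toLowerCase(), Collections.emptyList()));
            clauses.add(alternatives.size() == 1
                ? word
                : "(" + String.join(" OR ", alternatives) + ")");
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        // Hypothetical synonym table standing in for a WordNet lookup.
        Map<String, List<String>> synonyms = new HashMap<>();
        synonyms.put("car", Arrays.asList("automobile", "auto"));
        System.out.println(expand("fast car", synonyms));
    }
}
```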