Re: Searching with words that contain % , / and the like

2005-01-27 Thread Robinson Raju
Hi Jason, yes, the documentation does mention escaping, but that's only for special characters used in queries, right? But I've tried escaping too. To answer your question, I'm sure it is not the HTTP request that is eating it up. Query query = MultiFieldQueryParser.parse(test/s,

Re: Searching with words that contain % , / and the like

2005-01-27 Thread Chris Lamprecht
Without looking at the source, my guess is that StandardAnalyzer (and StandardTokenizer) is the culprit. The StandardAnalyzer grammar (in StandardTokenizer.jj) is probably defined so x/y parses into two tokens, x and y. s is a default stopword (see StopAnalyzer.ENGLISH_STOP_WORDS), so it gets
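Chris's explanation can be illustrated with a toy two-stage analyzer: split on non-alphanumeric characters, then drop stop words. This is a sketch under stated assumptions, not Lucene's StandardTokenizer grammar. The class name `ToyStopAnalyzer` and its small stop list are hypothetical; the list only borrows a few entries of the kind found in StopAnalyzer.ENGLISH_STOP_WORDS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy tokenizer + stop filter; NOT Lucene's StandardTokenizer grammar, just the same two-stage idea. */
class ToyStopAnalyzer {
    // Hypothetical subset of an English stop list, chosen for this demo.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "s", "t", "the"));

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Stage 1: treat every non-letter/non-digit character ("/", "%", "-", ...) as a separator.
        for (String raw : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (raw.isEmpty()) {
                continue; // a leading separator yields an empty first piece
            }
            // Stage 2: lowercase, then drop stop words.
            String token = raw.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

Under this toy model `analyze("test/s")` yields just `[test]`, so the query loses the `s` whether or not the `/` is escaped. If exact strings like `test/s` must be searchable, one option worth trying is to index and query that field with an analyzer that does not split on punctuation, such as a whitespace-based one.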

Re: text highlighting

2005-01-27 Thread Youngho Cho
Hello, when I use the code with CJKAnalyzer to search English text (the text is mixed Korean and English), sometimes the returned String is empty. Other searches work well. Is the code analyzer-dependent? Thanks. Youngho --- Test Code (just a copy of the book's code) -

Re: text highlighting

2005-01-27 Thread Youngho Cho
More test results: if the text contains ... Family ..., then the query string family works OK. But if the query string is Family, then the highlighter returns nothing. Thanks. Youngho - Original Message - From: Youngho Cho [EMAIL PROTECTED] To: Lucene Users List lucene-user@jakarta.apache.org

Re: text highlighting

2005-01-27 Thread mark harwood
sometimes the return Stirng is none. Is the code analyzer dependancy ? When highlighter.getBestFragments returns nothing, this is because no match was found for the query terms in the supplied TokenStream. This is nearly always because of Analyzer issues. Check the post-analysis tokens
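Mark's advice to check the post-analysis tokens can be made concrete with a small check: analyze the text, then look for each query term among the resulting tokens. The analyzer below is a hypothetical stand-in (split on non-alphanumerics, lowercase), not CJKAnalyzer, and `HighlightMatchCheck` with its methods are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Checks whether query terms survive into the analyzed text, which is what the highlighter needs. */
class HighlightMatchCheck {
    // Hypothetical stand-in for the index-time analyzer: split on non-letters/digits, lowercase.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!raw.isEmpty()) {
                tokens.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** Returns the query terms that never occur among the analyzed text tokens. */
    static List<String> unmatchedTerms(Collection<String> queryTerms, String text) {
        Set<String> textTokens = new HashSet<>(analyze(text));
        List<String> missing = new ArrayList<>();
        for (String term : queryTerms) {
            if (!textTokens.contains(term)) {
                missing.add(term);
            }
        }
        return missing;
    }
}
```

A raw, un-analyzed term such as `Family` (as a hand-built TermQuery would carry it) comes back unmatched, while the analyzed form `family` matches. That is consistent with Youngho's later report in this thread that switching from TermQuery to QueryParser fixed the problem.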

LuceneRAR nearing first release

2005-01-27 Thread Joseph Ottinger
https://lucenerar.dev.java.net LuceneRAR is now working on two containers, verified: the J2EE 1.4 RI and Orion. WebSphere testing is underway, with JBoss to follow. LuceneRAR is a resource adapter for Lucene, allowing J2EE components to look up an entry in a JNDI tree, using that reference to

Different Documents (with fields) in one index?

2005-01-27 Thread Karl Koch
Hello all, perhaps not such a sophisticated question: I would like to have a very diverse set of documents in one index. Depending on the content of the text documents, I would like to put parts of the text in different fields. This means that in searches, when searching a particular field, some of

Re: Different Documents (with fields) in one index?

2005-01-27 Thread Otis Gospodnetic
Karl, This is completely fine. You can have documents with different fields in the same index. Otis --- Karl Koch [EMAIL PROTECTED] wrote: Hello all, perhaps not such a sophisticated question: I would like to have a very diverse set of documents in one index. Depending on the inside

Re: Different Documents (with fields) in one index?

2005-01-27 Thread Aad Nales
Nope, it is very possible. We have an index that holds the search info for documents, messages in discussion threads, filled in forms etc. etc. each having their own structure. cheers, Aad Karl Koch wrote: Hello all, perhaps not such a sophisticated question: I would like to have a very
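To illustrate Otis's and Aad's point that documents with different fields can live in one index, here is a toy in-memory model in which each document is just a map of field names to text. `MixedFieldIndex` is hypothetical and does no real inverted indexing; it only shows that a search on one field simply skips documents that lack that field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy model: one "index" holding documents that each have their own set of fields. */
class MixedFieldIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    /** Adds a document (field name -> text) and returns its id. */
    int add(Map<String, String> fields) {
        docs.add(new HashMap<>(fields));
        return docs.size() - 1;
    }

    /** Ids of documents whose `field` contains `word`; documents without that field never match. */
    List<Integer> search(String field, String word) {
        List<Integer> hits = new ArrayList<>();
        String needle = word.toLowerCase(Locale.ROOT);
        for (int id = 0; id < docs.size(); id++) {
            String value = docs.get(id).get(field);
            if (value == null) {
                continue; // this kind of document has no such field
            }
            for (String token : value.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (token.equals(needle)) {
                    hits.add(id);
                    break;
                }
            }
        }
        return hits;
    }
}
```

A discussion message with `subject`/`body` fields and a form with an `applicant` field can be added side by side; searching `applicant` never touches the message, and searching `subject` never touches the form.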

Index Layout Question

2005-01-27 Thread Jerry Jalenak
I am in the process of indexing about 1.5 million documents, and have started down the path of indexing these by month. Each month has between 100,000 and 200,000 documents. From a performance standpoint, is this the right approach? This allows me to use MultiSearcher (or

Reloading an index

2005-01-27 Thread Greg Gershman
I have an index that is frequently updated. When indexing is completed, an event triggers a new Searcher to be opened. When the new Searcher is opened, incoming searches are redirected to the new Searcher, the old Searcher is closed and nulled, but I still see about twice the amount of memory in

Re: Index Layout Question

2005-01-27 Thread Ian Soboroff
Jerry Jalenak [EMAIL PROTECTED] writes: I am in the process of indexing about 1.5 million documents, and have started down the path of indexing these by month. Each month has between 100,000 and 200,000 documents. From a performance standpoint, is this the right approach? This allows me to
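The per-month layout Jerry describes relies on MultiSearcher to combine several indexes. The toy below sketches only the merging step: gather hits from each monthly sub-index and keep the best `n` by score. `MonthlyMerge` and `Hit` are hypothetical names; the real MultiSearcher also maps document numbers across sub-indexes, and scores computed separately per index may not be perfectly comparable if their term statistics differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model of searching several per-month indexes and merging their hits into one ranked list. */
class MonthlyMerge {
    static final class Hit {
        final String month;
        final int doc;
        final float score;

        Hit(String month, int doc, float score) {
            this.month = month;
            this.doc = doc;
            this.score = score;
        }
    }

    /** Collects every sub-index hit into a max-heap by score and pops the best n. */
    static List<Hit> topN(List<List<Hit>> perMonth, int n) {
        PriorityQueue<Hit> heap = new PriorityQueue<>((a, b) -> Float.compare(b.score, a.score));
        for (List<Hit> hits : perMonth) {
            heap.addAll(hits);
        }
        List<Hit> merged = new ArrayList<>();
        while (!heap.isEmpty() && merged.size() < n) {
            merged.add(heap.poll());
        }
        return merged;
    }
}
```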

RE: Reloading an index

2005-01-27 Thread Cocula Remi
Make sure that the old searcher is not referenced anywhere else; if it is, the garbage collector cannot reclaim it. Just remember that the garbage collector runs when memory is needed, not immediately after a reference is set to null. -Original Message- From: Greg Gershman

Boosting Questions

2005-01-27 Thread Luke Shannon
Hi All; I just want to make sure I have the right idea about boosting. So if I boost a document (Document A) after I index it (let's say a boost of 2.0), Lucene will now consider this document relatively more important than other documents in the index with a boost factor less than 2.0. This boost

Re: Boosting Questions

2005-01-27 Thread Otis Gospodnetic
Luke, Boosting is only one of the factors involved in Document/Query scoring. Assuming that applying your boosts to Document A, or to a single field of Document A, increases the total score enough, then yes, Document A may have the highest score. But just because you boost a single Document and
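Otis's point that a boost is only one factor can be seen in a simplified score of `boost * sqrt(tf) * idf`. The `sqrt(tf)` term follows DefaultSimilarity's tf(); length normalization, coord and query normalization are deliberately omitted, so treat `BoostDemo` as a hypothetical illustration rather than Lucene's full formula.

```java
/** Simplified scoring model to show that a boost is only one multiplier among several. */
class BoostDemo {
    /**
     * boost * sqrt(termFreq) * idf. The sqrt(tf) mirrors DefaultSimilarity's tf();
     * lengthNorm, coord and queryNorm are deliberately left out.
     */
    static double score(double boost, int termFreq, double idf) {
        return boost * Math.sqrt(termFreq) * idf;
    }
}
```

With the same idf of 1.5, Document A boosted to 2.0 with one occurrence of the term scores 3.0, while an unboosted document with nine occurrences scores 4.5 and still ranks first.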

Re: Boosting Questions

2005-01-27 Thread Luke Shannon
Thanks Otis. - Original Message - From: Otis Gospodnetic [EMAIL PROTECTED] To: Lucene Users List lucene-user@jakarta.apache.org Sent: Thursday, January 27, 2005 12:11 PM Subject: Re: Boosting Questions Luke, Boosting is only one of the factors involved in Document/Query scoring.

XML index

2005-01-27 Thread Karl Koch
Hi, I want to use kXML with Lucene to index XML files. I think it is possible to dynamically assign Node names as Document fields and Node texts as Text (after using an Analyzer). I have seen some XML indexing in the Sandbox. Is there anybody here who has done something with a thin pull parser

RE: Index Layout Question

2005-01-27 Thread Jerry Jalenak
That's good to know. I'm indexing on 11 fields (9 keyword, 2 text). The documents themselves are between 1K and 2K in size. Is there a point at which IndexSearcher performance begins to fall off (in terms of the number of index records)? Jerry Jalenak Senior Programmer / Analyst, Web Publishing LabOne,

Re: XML index

2005-01-27 Thread Otis Gospodnetic
Hello Karl, Grab the source code for Lucene in Action, it's got code that parses and indexes XML with DOM and SAX. You can see the coverage of that stuff here: http://lucenebook.com/search?query=indexing+XML+section%3A7* I haven't used kXML, but I imagine the LIA code should get you going
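For the approach Karl outlines (element names as field names, element text as field values), a pull-parser loop is short. This sketch uses the JDK's StAX API (`javax.xml.stream`) as a stand-in, since it is a pull parser that needs no extra jars; it is not kXML's API. `PullXmlToFields` is a hypothetical name, and turning each map entry into a Lucene Field is left out.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Pull-parses XML and maps element names to their text, ready to become Document fields. */
class PullXmlToFields {
    static Map<String, String> toFields(String xml) {
        Map<String, String> fields = new LinkedHashMap<>();
        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    switch (reader.next()) {
                        case XMLStreamConstants.START_ELEMENT:
                            text.setLength(0); // only keep text that follows the most recent tag
                            break;
                        case XMLStreamConstants.CHARACTERS:
                        case XMLStreamConstants.CDATA:
                            text.append(reader.getText());
                            break;
                        case XMLStreamConstants.END_ELEMENT:
                            String value = text.toString().trim();
                            if (!value.isEmpty()) {
                                // Repeated elements (e.g. several <author>s) are joined into one value.
                                fields.merge(reader.getLocalName(), value, (a, b) -> a + " " + b);
                            }
                            text.setLength(0);
                            break;
                        default:
                            break;
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return fields;
    }
}
```

Design choice: the text buffer is reset at every start tag, so only leaf text is kept; for `<p>Hello <b>world</b></p>` the result is just `b=world`. Repeated elements are concatenated into one value.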

Re: Opening up one large index takes 940M of memory?

2005-01-27 Thread Doug Cutting
Kevin A. Burton wrote: Is there any way to reduce this footprint? The index is fully optimized... I'm willing to take a performance hit if necessary. Is this documented anywhere? You can increase TermInfosWriter.indexInterval. You'll need to re-write the .tii file for this to take effect.
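Doug's suggestion works because the term index (`.tii`) keeps only every `indexInterval`-th term in memory; a lookup binary-searches those samples and then scans forward through the full term dictionary. The toy model below shows the trade-off: a larger interval means fewer in-memory entries but longer scans. `SparseTermIndex` is hypothetical, and treating 128 as the default interval is an assumption about Lucene 1.4, not something stated in this thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy sparse term index: every interval-th term stays in memory (like the .tii), the rest are scanned. */
class SparseTermIndex {
    private final List<String> dictionary;                 // stands in for the sorted on-disk term dictionary
    private final List<String> sample = new ArrayList<>(); // in-memory terms at positions 0, interval, 2*interval, ...
    private final int interval;

    SparseTermIndex(List<String> sortedTerms, int interval) {
        this.dictionary = sortedTerms;
        this.interval = interval;
        for (int i = 0; i < sortedTerms.size(); i += interval) {
            sample.add(sortedTerms.get(i));
        }
    }

    int inMemoryEntries() {
        return sample.size();
    }

    /** Position of term in the dictionary, or -1: binary search the sample, then scan at most `interval` terms. */
    int lookup(String term) {
        int slot = Collections.binarySearch(sample, term);
        if (slot < 0) {
            slot = -slot - 2; // last sampled term that sorts before `term`
        }
        if (slot < 0) {
            return -1; // term sorts before every term in the dictionary
        }
        int start = slot * interval;
        int end = Math.min(start + interval, dictionary.size());
        for (int i = start; i < end; i++) {
            int cmp = dictionary.get(i).compareTo(term);
            if (cmp == 0) {
                return i;
            }
            if (cmp > 0) {
                break; // passed the spot where the term would be
            }
        }
        return -1;
    }
}
```

For 1,000 terms, an interval of 128 keeps 8 entries in memory and an interval of 512 keeps 2, while lookups return the same positions.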

Re: Sort Performance Problems across large dataset

2005-01-27 Thread Doug Cutting
Peter Hollas wrote: Currently we can issue a simple search query and expect a response back in about 0.2 seconds (~3,000 results) with the Lucene index that we have built. Lucene gives a much more predictable and faster average query time than using standard fulltext indexing with mySQL. This

query term frequency

2005-01-27 Thread Jonathan Lasko
What do I call to get the term frequencies for terms in the Query? I can't seem to find it in the Javadoc... Thanks. Jonathan

Re: query term frequency

2005-01-27 Thread David Spencer
Jonathan Lasko wrote: What do I call to get the term frequencies for terms in the Query? I can't seem to find it in the Javadoc... Do you mean the # of docs that have a term? http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#docFreq(org.apache.lucene.index.Term)
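David's pointer is to `IndexReader.docFreq(Term)`, which counts the documents containing a term rather than its total occurrences. A toy inverted index makes the distinction visible. `ToyInvertedIndex` is a hypothetical class; in Lucene itself, per-document occurrence counts come from elsewhere, e.g. the freq() of a TermDocs enumeration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy inverted index: term -> set of document ids containing it. */
class ToyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private int nextDoc = 0;

    int addDocument(String text) {
        int doc = nextDoc++;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                // A set per term, so repeated occurrences in one document count once.
                postings.computeIfAbsent(token, k -> new HashSet<>()).add(doc);
            }
        }
        return doc;
    }

    /** Number of documents containing the term, the quantity docFreq reports. */
    int docFreq(String term) {
        Set<Integer> docs = postings.get(term);
        return docs == null ? 0 : docs.size();
    }
}
```

A document containing `lucene` three times still adds only 1 to `docFreq("lucene")`.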

Disk space used by optimize

2005-01-27 Thread Kauler, Leto S
Just a quick question: after writing an index and then calling optimize(), is it normal for the index to expand to about three times the size before finally compressing? In our case the optimise grinds the disk, expanding the index into many files of about 145MB total, before compressing down

LuceneReader.delete (term t) Failure ?

2005-01-27 Thread akedar
Hi, I am trying to delete a document from a Lucene index using: Term aTerm = new Term( uid, path ); aReader.delete( aTerm ); aReader.close(); If the variable path=xxx/foo.txt, then I am able to delete the document. However, if the path variable has a - in the string, the delete method

google mini? who needs it when Lucene is there

2005-01-27 Thread jian chen
Hi, I was searching using Google and just found that there was a new feature called Google Mini. Initially I thought it was another free service for small companies. Then I realized that it costs quite some money ($4,995) for the hardware and software. (I guess the proprietary software costs a

Re: Disk space used by optimize

2005-01-27 Thread Otis Gospodnetic
Hello, Yes, that is how optimize works: it copies all existing index segments into one unified index segment, thus optimizing it. See hit #1: http://www.lucenebook.com/search?query=optimize+disk+space However, three times the space sounds a bit too much, or I made a mistake in the book. :) You

Re: google mini? who needs it when Lucene is there

2005-01-27 Thread Xiaohong Yang \(Sharon\)
Hi, I agree that the Google Mini is quite expensive. It might be similar to the desktop version in quality. Does anyone know Google's ratio of index to text? Is it true that Lucene's index is about 500 times the original text size (not including image size)? I don't have one installed, so I

RE: Disk space used by optimize

2005-01-27 Thread Kauler, Leto S
Our copy of LIA is in the mail ;) Yes the final three files are: the .cfs (46.8MB), deletable (4 bytes), and segments (29 bytes). --Leto -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Hello, Yes, that is how optimize works - copies all existing index

rackmount lucene/nutch - Re: google mini? who needs it when Lucene is there

2005-01-27 Thread David Spencer
This reminds me, has anyone ever discussed something similar: - rackmount server (or, for coolness factor, that Mac mini) - web i/f for config/control - of course the server would have the following s/w: -- web server -- lucene / nutch Part of the work here I think is having a decent web i/f to

Re: google mini? who needs it when Lucene is there

2005-01-27 Thread John Wang
I think the Google Mini also includes crawling and a server wrapper, so it is not entirely a 1-to-1 comparison. Of course, extending Lucene to have those features is not at all difficult anyway. -John On Thu, 27 Jan 2005 16:04:54 -0800 (PST), Xiaohong Yang (Sharon) [EMAIL PROTECTED] wrote: Hi,

Re: LuceneReader.delete (term t) Failure ?

2005-01-27 Thread Erik Hatcher
How did you index the uid field? Field.Keyword? If not, that may be the problem in that the field was analyzed. For a key field like this, it needs to be unanalyzed/untokenized. Erik On Jan 27, 2005, at 6:21 PM, [EMAIL PROTECTED] wrote: Hi, I am trying to delete a document from
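Erik's diagnosis rests on the fact that a delete by Term matches only the exact term that was indexed. A sketch with a hypothetical analyzer (lowercase, split on non-alphanumerics) shows the difference between an untokenized, Field.Keyword-style key and a tokenized one. `KeyFieldDemo` and the sample path are made up for illustration, and a real analyzer's splitting rules may differ.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Why deleting by a Term needs the exact indexed term: tokenized vs. untokenized key fields. */
class KeyFieldDemo {
    /** Terms a field value contributes to the index. */
    static Set<String> indexedTerms(String value, boolean tokenized) {
        if (!tokenized) {
            return Collections.singleton(value); // Keyword-style: the whole value is one term
        }
        // Hypothetical analyzer: lowercase and split on anything that is not a letter or digit.
        Set<String> terms = new HashSet<>();
        for (String raw : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!raw.isEmpty()) {
                terms.add(raw);
            }
        }
        return terms;
    }

    /** A delete by Term(uid, value) can only hit documents where exactly that term was indexed. */
    static boolean deleteWouldMatch(String value, boolean tokenized) {
        return indexedTerms(value, tokenized).contains(value);
    }
}
```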

Re: text highlighting

2005-01-27 Thread Youngho Cho
Thanks for your reply. I now use QueryParser instead of TermQuery, and everything works well! Thanks. Youngho - Original Message - From: mark harwood [EMAIL PROTECTED] To: lucene-user@jakarta.apache.org Sent: Thursday, January 27, 2005 7:05 PM Subject: Re: text highlighting sometimes the

Re: rackmount lucene/nutch - Re: google mini? who needs it when Lucene is there

2005-01-27 Thread Erik Hatcher
I've often said that there is a business to be had in packaging up Lucene (and now Nutch) into a cute little box with user friendly management software to search your intranet. SearchBlox is already there (except they don't include the box). I really hope that an application like

Re: Reloading an index

2005-01-27 Thread Chris Lamprecht
I just ran into a similar issue. When you close an IndexSearcher, it doesn't necessarily close the underlying IndexReader. It depends on which constructor you used to create the IndexSearcher. See the constructors' javadocs or source for the details. In my case, we were updating and optimizing the
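Remi's and Chris's points combine into one rule: keep the old searcher open until in-flight searches finish, then close it exactly once (and, if you built the IndexSearcher from your own IndexReader, close that reader too). The reference-counting sketch below works with any `java.io.Closeable`; `RefCountedSearcher` and `SearcherSwapper` are hypothetical names, not Lucene classes.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/** One searcher plus a count of its users; closed exactly once, when the count reaches zero. */
final class RefCountedSearcher<S extends Closeable> {
    final S searcher;
    private final AtomicInteger refs = new AtomicInteger(1); // starts at 1: the swapper's own reference

    RefCountedSearcher(S searcher) {
        this.searcher = searcher;
    }

    /** Registers one more user unless the searcher has already been closed. */
    boolean tryIncRef() {
        while (true) {
            int n = refs.get();
            if (n <= 0) {
                return false;
            }
            if (refs.compareAndSet(n, n + 1)) {
                return true;
            }
        }
    }

    void decRef() {
        if (refs.decrementAndGet() == 0) {
            try {
                searcher.close(); // with Lucene, also close any IndexReader you opened yourself
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}

/** Hands out the current searcher and retires the old one only after its last in-flight search. */
final class SearcherSwapper<S extends Closeable> {
    private final AtomicReference<RefCountedSearcher<S>> current;

    SearcherSwapper(S initial) {
        current = new AtomicReference<>(new RefCountedSearcher<>(initial));
    }

    RefCountedSearcher<S> acquire() {
        while (true) {
            RefCountedSearcher<S> candidate = current.get();
            if (candidate.tryIncRef()) {
                return candidate;
            }
            // Lost a race with swap(): the candidate was just closed, so re-read the new one.
        }
    }

    void release(RefCountedSearcher<S> handle) {
        handle.decRef();
    }

    void swap(S fresh) {
        RefCountedSearcher<S> old = current.getAndSet(new RefCountedSearcher<>(fresh));
        old.decRef(); // drop the swapper's reference; closes now if no search is using it
    }
}
```

Callers do `acquire()`, search, then `release()` in a finally block; the indexing code calls `swap()` with the newly opened searcher.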

Re: rackmount lucene/nutch - Re: google mini? who needs it when Lucene is there

2005-01-27 Thread Otis Gospodnetic
I discuss this with myself a lot inside my head... :) Seriously, I agree with Erik. I think this is a business opportunity. How many people are hating me now and going shh? Raise your hands! Otis --- David Spencer [EMAIL PROTECTED] wrote: This reminds me, has anyone every discussed

RE: Disk space used by optimize

2005-01-27 Thread Otis Gospodnetic
Have you tried using the multifile index format? Now I wonder if there is actually a difference in disk space consumed by optimize() when you use the multifile and compound index formats... Otis --- Kauler, Leto S [EMAIL PROTECTED] wrote: Our copy of LIA is in the mail ;) Yes the final three
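A rough model suggests why Leto saw about three times the final size. Assume (1) the old segments stay on disk until the merge commits, (2) the merged segment is about as large as its inputs (no deletions), and (3) with the compound format, the merged segment's separate files briefly coexist with the `.cfs` built from them. The peak is then about 2x the index size for the multifile format and about 3x for the compound format. If memory serves, the javadoc for IndexWriter.optimize() in later Lucene releases gives the same 2x/3x rule of thumb. `OptimizeDiskEstimate` is a hypothetical helper.

```java
/** Back-of-the-envelope peak disk use during optimize(), under the assumptions stated above. */
class OptimizeDiskEstimate {
    static double peakSize(double indexSize, boolean compoundFormat) {
        double oldSegments = indexSize;                       // originals kept until the merge commits
        double mergedFiles = indexSize;                       // new merged segment, assumed same size
        double compoundCopy = compoundFormat ? indexSize : 0; // .cfs packed from the merged files
        return oldSegments + mergedFiles + compoundCopy;
    }
}
```

With Leto's 46.8 MB `.cfs`, the compound estimate is about 140 MB, in line with the roughly 145 MB observed; the multifile estimate is about 94 MB.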

Re: google mini? who needs it when Lucene is there

2005-01-27 Thread Otis Gospodnetic
500 times the original data? Not true! :) Otis --- Xiaohong Yang (Sharon) [EMAIL PROTECTED] wrote: Hi, I agree that Google mini is quite expensive. It might be similar to the desktop version in quality. Anyone knows google's ratio of index to text? Is it true that Lucene's index is

Re: Reloading an index

2005-01-27 Thread Chris Hostetter
: processes ended. If you're under linux, try running the 'lsof' : command to see if there are any handles to files marked (deleted). : Searcher, the old Searcher is closed and nulled, but I : still see about twice the amount of memory in use well : after the original searcher has been

Re: rackmount lucene/nutch - Re: google mini? who needs it when Lucene is there

2005-01-27 Thread Chris Lamprecht
As they say, nothing lasts forever ;) I like the idea. If a project like this gets going, I think I'd be interested in helping. The Google mini looks very well done (they have two demos on the web page). For $5000, it's probably a very good solution for many businesses. If the demos are

Re: rackmount lucene/nutch - Re: google mini? who needs it when Lucene is there

2005-01-27 Thread Jason Polites
I think everyone agrees that this would be a very neat application of open source technology like Lucene... however (opens drawer, pulls out devil's advocate hat, places on head)... there are several complexities here not addressed by Lucene (et al.). Not because Lucene isn't damn fantastic,

Re: Re: LuceneReader.delete (term t) Failure ?

2005-01-27 Thread akedar
Erik, I am using the keyword field doc.add(Field.Keyword(uid, pathRelToArea)); anything else I can check on ? thanks atul PS we worked together for Darden project From: Erik Hatcher [EMAIL PROTECTED] Date: 2005/01/27 Thu PM 07:46:40 EST To: Lucene Users List

Re: LuceneReader.delete (term t) Failure ?

2005-01-27 Thread Erik Hatcher
Could you work up a self-contained RAMDirectory-using example that demonstrates this issue? Erik On Jan 27, 2005, at 9:10 PM, [EMAIL PROTECTED] wrote: Erik, I am using the keyword field doc.add(Field.Keyword(uid, pathRelToArea)); anything else I can check on ? thanks atul PS we

Re: google mini? who needs it when Lucene is there

2005-01-27 Thread jian chen
Overall, even if the Google Mini gives a lot of cool features compared to a bare-bones Lucene project, what good is it with the 50,000 document limit? It is useless with that limit. That is just their way of trying to turn it into another cash cow. Jian On Thu, 27 Jan 2005 17:45:03 -0800 (PST), Otis

Re: rackmount lucene/nutch - Re: google mini? who needs it when Lucene is there

2005-01-27 Thread David Spencer
Jason Polites wrote: I think everyone agrees that this would be a very neat application of opensource technology like Lucene... however (opens drawer, pulls out devil's advocate hat, places on head)... there are several complexities here not addressed by Lucene (et. al). Not because Lucene

Search results excerpt similar to Google

2005-01-27 Thread Ben
Hi, Is it hard to implement a function that displays search result excerpts similar to Google's? Is it just string manipulation, or is there some logic behind it? I like their excerpts. Thanks

Re: google mini? who needs it when Lucene is there

2005-01-27 Thread David Spencer
Xiaohong Yang (Sharon) wrote: Hi, I agree that Google mini is quite expensive. It might be similar to the desktop version in quality. Anyone knows google's ratio of index to text? Is it true that Lucene's index is about 500 times the original text size (not including image size)? I don't

Re: Search results excerpt similar to Google

2005-01-27 Thread Jason Polites
I think they do a proximity result based on keyword matches. So... If you search for lucene and the document returned has this word at the very start and the very end of the document, then you will see the two sentences (sequences of words) surrounding the two keyword matches, one from the
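Jason's description (show the words surrounding each keyword match) can be sketched directly: collect a window of words around each hit, merge overlapping windows, and join the pieces with ellipses. `ExcerptSketch` and its `radius`/`maxWindows` parameters are hypothetical; the Lucene sandbox Highlighter discussed in the text highlighting thread above is a fuller, token-based version of the same idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.StringJoiner;

/** Google-style excerpt: a few words around each keyword hit, overlapping windows merged, joined by " ... ". */
class ExcerptSketch {
    static String excerpt(String text, Set<String> keywords, int radius, int maxWindows) {
        String[] words = text.trim().split("\\s+");
        List<int[]> windows = new ArrayList<>(); // each window is [start, end) in word positions
        for (int i = 0; i < words.length; i++) {
            // Strip punctuation and lowercase before comparing (ASCII-only normalization).
            String normalized = words[i].replaceAll("\\W", "").toLowerCase(Locale.ROOT);
            if (!keywords.contains(normalized)) {
                continue;
            }
            int start = Math.max(0, i - radius);
            int end = Math.min(words.length, i + radius + 1);
            int[] last = windows.isEmpty() ? null : windows.get(windows.size() - 1);
            if (last != null && start <= last[1]) {
                last[1] = end; // touches or overlaps the previous window: extend it
            } else if (windows.size() < maxWindows) {
                windows.add(new int[] {start, end});
            } else {
                break; // enough fragments collected
            }
        }
        StringJoiner out = new StringJoiner(" ... ");
        for (int[] w : windows) {
            out.add(String.join(" ", Arrays.copyOfRange(words, w[0], w[1])));
        }
        return out.toString();
    }
}
```

Keywords are expected in lowercase. Merging adjacent windows keeps two nearby hits in one readable fragment instead of repeating overlapping text.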