Re: new to lucene- some questions regarding internals

2015-08-11 Thread Erick Erickson
Questions 1-3 are really answered by the same explanation: when you open a searcher, Lucene "knows" what all the closed segments are (i.e., the last commit point). And you can't commit when only part of a document has been written to the current segment. You can think of commits as atomic at the document level…
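Here is a toy model of the commit-point idea. It is plain JDK Java, not Lucene's API, and ToyIndex and ToySearcher are hypothetical names. A whole document is buffered at once, a commit publishes the buffer as a closed segment, and a searcher captures the list of committed segments at the moment it is opened. It deliberately ignores near-real-time readers opened from the writer, which can see uncommitted changes.

```java
import java.util.ArrayList;
import java.util.List;

public class CommitSnapshotSketch {
    // Toy model only: real Lucene segments are files on disk, not lists of strings.
    static final class ToyIndex {
        private final List<List<String>> committedSegments = new ArrayList<>();
        private final List<String> pending = new ArrayList<>();

        // A whole document is buffered at once; there is no way to add "half" a document.
        void addDocument(String doc) {
            pending.add(doc);
        }

        // Commit publishes everything buffered so far as a new closed segment.
        void commit() {
            if (!pending.isEmpty()) {
                committedSegments.add(List.copyOf(pending));
                pending.clear();
            }
        }

        // A searcher sees only the segments that existed at the last commit point.
        ToySearcher openSearcher() {
            return new ToySearcher(List.copyOf(committedSegments));
        }
    }

    // Immutable snapshot of the committed segments.
    record ToySearcher(List<List<String>> segments) {
        int numDocs() {
            return segments.stream().mapToInt(List::size).sum();
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.addDocument("doc1");
        index.commit();
        ToySearcher before = index.openSearcher();

        index.addDocument("doc2"); // buffered, not yet committed
        System.out.println(before.numDocs());               // 1
        System.out.println(index.openSearcher().numDocs()); // 1: doc2 is not committed yet

        index.commit();
        System.out.println(before.numDocs());               // 1: the old snapshot is unchanged
        System.out.println(index.openSearcher().numDocs()); // 2: a new searcher sees the new commit
    }
}
```

The same picture explains why a long-lived reader keeps returning stale results: it holds its original snapshot until a new one is opened.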

Re: new to Lucene

2015-08-07 Thread Erick Erickson
2. Is the "Index" saved as a file or loaded into memory? Adding to Modassar's comments: almost all "real" implementations save the index to disk and read selected portions back into memory as needed; otherwise the data isn't permanent. In the Lucene world, I'd start with NRTCachingDirectory.

Re: new to Lucene

2015-08-07 Thread Modassar Ather
Please see my comments inline. 1. For the indexing of these chapters, how many fields need to be declared? Can I just declare only one field for the contents? This depends on what you need to search on. E.g., if only the plain content (chapters) is to be searched, then one indexed field is required…

Re: NEW TO LUCENE

2012-03-05 Thread Saurabh Gokhale
Hi Rahul, the first thing you should do is get a copy of the "Lucene in Action, Second Edition" book and read it from head to toe. It is a fabulous book on Lucene and gives you complete insight into the framework. Also, if you want to use Lucene to develop some kind of search interface, then…

Re: NEW TO LUCENE

2012-03-05 Thread Shashi Kant
This book is your best buddy: http://www.manning.com/hatcher3/ On Fri, Mar 2, 2012 at 2:01 PM, rahul reddy wrote: > Hi, I'm new to Lucene. Can anyone tell me how I can start learning about it with the code base? I have knowledge of the Endeca search engine and have worked on it. So, if…

Re: new to lucene, non standard index

2011-05-06 Thread Michael Sokolov
I believe creating a large number of fields is not a good match with the underlying architecture, and you'd be better off with a large number of documents and a small number of fields, where the same field occurs in every document. There is some discussion here: http://markmail.org/message/hcmt5syca7zdeac

Re: new to lucene, non standard index

2011-05-05 Thread Chris Schilling
Hey Mike, my only concern is that I am replacing a large number of fields inside a Document with a very large number (~50e6) of Documents. Won't I run into the same memory issues? Or do I create only one Document object and reuse it? With so many doc/token pairs, won't searching the index t…

Re: new to lucene, non standard index

2011-05-05 Thread Mike Sokolov
I think the solution I gave you will work. The only problem is if a token appears twice in the same doc, e.g. doc1 has foo with two different sets of weights and frequencies... but I think you're saying that doesn't happen. On 05/05/2011 06:09 PM, Chris Schilling wrote: Hey Mike, let me clarify: …

Re: new to lucene, non standard index

2011-05-05 Thread Chris Schilling
Oh, yes, they are unique within a document. I was also thinking about something like this, but I would be replacing a large number of fields within a document with a large number of documents. Let me see if I can work that out. On May 5, 2011, at 3:01 PM, Mike Sokolov wrote: > Are the tokens…

Re: new to lucene, non standard index

2011-05-05 Thread Chris Schilling
Hey Mike, let me clarify: the tokens are not unique. Let's say doc1 contains the token foo and has the properties weight1 = 0.75, weight2 = 0.90, frequency = 10. Now, let's say doc2 also contains the token foo with the properties weight1 = 0.8, weight2 = 0.75, frequency = 5. Now, I want to search…

Re: new to lucene, non standard index

2011-05-05 Thread Mike Sokolov
Are the tokens unique within a document? If so, why not store a document for every doc/token pair with the fields id (doc#/token#), doc-id (doc#), token, weight1, weight2, and frequency? Then search for the token and sort by weight1, weight2, or frequency. If the token matches are unique within a document, you will…
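The one-document-per-doc/token-pair layout can be sketched as an in-memory model. This is plain JDK Java, not Lucene: TokenDoc and searchByWeight1 are hypothetical names, the id values are made up, and the weights reuse the foo example from this thread. It shows the query shape: match on the token, then sort by one of the numeric fields.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class TokenPairIndex {
    // One "document" per doc/token pair, mirroring the suggested fields.
    record TokenDoc(String id, String docId, String token,
                    double weight1, double weight2, int frequency) {}

    // Return all pairs for a token, highest weight1 first.
    static List<TokenDoc> searchByWeight1(List<TokenDoc> index, String token) {
        return index.stream()
                .filter(d -> d.token().equals(token))
                .sorted(Comparator.comparingDouble(TokenDoc::weight1).reversed())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<TokenDoc> index = List.of(
                new TokenDoc("doc1/foo", "doc1", "foo", 0.75, 0.90, 10),
                new TokenDoc("doc2/foo", "doc2", "foo", 0.8, 0.75, 5));
        for (TokenDoc d : searchByWeight1(index, "foo")) {
            // Prints doc2 (0.8) before doc1 (0.75).
            System.out.println(d.docId() + " weight1=" + d.weight1());
        }
    }
}
```

In Lucene itself this would roughly correspond to a term query on the token field plus a sort on a numeric field; which numeric field type to use depends on the Lucene version, so it is left out here.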

Re: New to Lucene - some questions about demo

2009-07-28 Thread ohaya
Matthew, OK, thanks for the clarifications. When I have some quiet time, I'll redo the tests I did earlier and post back if I have any questions. Thanks again, Jim. Matthew Hall wrote: > Oh... no. If you specifically include a fieldname:blah in your clause, you don't need a Mult…

Re: New to Lucene - some questions about demo

2009-07-28 Thread Matthew Hall
Oh... no. If you specifically include a fieldname:blah in your clause, you don't need a MultiFieldQueryParser. The purpose of the MFQP is to automatically turn a query like "blah" into "field1:blah AND field2:blah AND field3:blah" (or OR, if you set it up properly). When you…
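The rewrite that a multi-field parser performs on a single bare term can be illustrated with a few lines of plain Java. This is a hypothetical helper, not MultiFieldQueryParser's implementation. It handles only one term, and it takes the operator as a parameter because the Lucene javadoc for the constructor-based parser describes the default expansion as optional (OR-style) clauses per field.

```java
import java.util.List;
import java.util.stream.Collectors;

public class MultiFieldExpansion {
    // Hypothetical helper: rewrites a bare term into one clause per field,
    // joined with the given operator ("AND" or "OR").
    static String expand(String term, List<String> fields, String operator) {
        return fields.stream()
                .map(f -> f + ":" + term)
                .collect(Collectors.joining(" " + operator + " ", "(", ")"));
    }

    // A term that already names a field is passed through unchanged,
    // which is why an explicit "fieldname:blah" doesn't need a multi-field parser.
    static String rewrite(String term, List<String> fields, String operator) {
        return term.contains(":") ? term : expand(term, fields, operator);
    }

    public static void main(String[] args) {
        List<String> fields = List.of("field1", "field2", "field3");
        System.out.println(rewrite("blah", fields, "OR"));         // (field1:blah OR field2:blah OR field3:blah)
        System.out.println(rewrite("summary:blah", fields, "OR")); // summary:blah
    }
}
```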

Re: New to Lucene - some questions about demo

2009-07-28 Thread ohaya
Matthew, I'll keep your comments in mind, but I'm still confused about something. I haven't changed much in the demo, other than adding that doc.add for "summary". With JUST that doc.add, having done my reading, I kind of expected NOT to be able to search on "summary" at all, but…

Re: New to Lucene - some questions about demo

2009-07-28 Thread Matthew Hall
You can choose to do either. Having items in multiple fields allows you to apply field-specific boosts, making matches in certain fields more important than others. But if that's not something you care about, the second technique is useful in that it vastly simplifies your index str…
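Why separate fields make per-field boosts possible can be shown with a toy scorer. This is plain Java and deliberately not Lucene's scoring: each field whose words contain the term adds that field's boost to the score. The field names reuse "title" and "summary" for illustration, and the boost values are assumptions.

```java
import java.util.Arrays;
import java.util.Map;

public class FieldBoostSketch {
    // Toy model, not Lucene's similarity: each field whose words contain the
    // term contributes that field's boost (default 1.0) to the score.
    static double score(Map<String, String> doc, String term, Map<String, Double> boosts) {
        double total = 0.0;
        for (Map.Entry<String, String> field : doc.entrySet()) {
            boolean hit = Arrays.stream(field.getValue().toLowerCase().split("\\s+"))
                    .anyMatch(w -> w.equals(term.toLowerCase()));
            if (hit) {
                total += boosts.getOrDefault(field.getKey(), 1.0);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Double> boosts = Map.of("title", 2.0, "summary", 1.0);
        Map<String, String> a = Map.of("title", "lucene basics", "summary", "an intro");
        Map<String, String> b = Map.of("title", "search intro", "summary", "about lucene");
        System.out.println(score(a, "lucene", boosts)); // 2.0: match in the boosted title
        System.out.println(score(b, "lucene", boosts)); // 1.0: match only in the summary
    }
}
```

If all the text were instead copied into a single catch-all field, both documents would match that one field and could no longer be distinguished by where the term appeared; that is the simplicity you gain at the cost of boosting.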

Re: New to Lucene - some questions about demo

2009-07-28 Thread ohaya
Hi Matthew and Ian, thanks, I'll try that, but in the meantime I've been doing some reading (Lucene in Action), and on pg. 159, section 5.3 discusses "Querying on multiple fields". I was just about to try what's described in that section, i.e., using MultiFieldQueryParser.parse(), o…

Re: New to Lucene - some questions about demo

2009-07-28 Thread Matthew Hall
Yeah, Ian has hit the nail on the head here. Can't believe I missed it in the initial writeup. Matt. Ian Lea wrote: > Jim, glancing at SearchFiles.java I can see: Analyzer analyzer = new StandardAnalyzer(); ... QueryParser parser = new QueryParser(field, analyzer); ... Query query = parser.parse(li…

Re: New to Lucene - some questions about demo

2009-07-28 Thread Ian Lea
Jim, glancing at SearchFiles.java I can see: Analyzer analyzer = new StandardAnalyzer(); ... QueryParser parser = new QueryParser(field, analyzer); ... Query query = parser.parse(line); so any query term you enter will be run through StandardAnalyzer, which will, amongst other things, convert it t…

Re: New to Lucene - some questions about demo

2009-07-28 Thread ohaya
Ian and Matthew, I've tried "foofoo", "summary:foofoo", "FooFoo", and "summary:FooFoo". No results were returned for any of those :(. Also, Matthew, I bounced Tomcat after running IndexFiles, so I don't think that's the problem either :(... I looked at the SearchFiles.java code, and it looks like…

Re: New to Lucene - some questions about demo

2009-07-28 Thread Ian Lea
Hi, Field.Index.NOT_ANALYZED means the value will be indexed as is, i.e. "FooFoo" in your example, and if you search for "foofoo" it won't match. A search for "FooFoo" would, assuming that your search terms are not being lowercased. -- Ian. On Tue, Jul 28, 2009 at 1:56 PM, Ohaya wrote: > Hi, I'm…
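The mismatch behind Jim's four failed searches can be reproduced with a toy model in plain Java (not Lucene; the class and method names are hypothetical). The NOT_ANALYZED field contributes the single exact term "FooFoo", while the query side lowercases terms the way StandardAnalyzer does, so both "foofoo" and "FooFoo" end up looking for "foofoo".

```java
import java.util.Locale;
import java.util.Set;

public class AnalyzerMismatch {
    // Toy stand-in for a NOT_ANALYZED field: the value becomes one term, exactly as written.
    static Set<String> indexNotAnalyzed(String value) {
        return Set.of(value);
    }

    // Rough stand-in for query-side analysis: StandardAnalyzer lowercases terms
    // (among other things it does that this sketch ignores).
    static String analyzeQuery(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    static boolean matches(Set<String> indexedTerms, String userQuery, boolean lowercaseQuery) {
        String term = lowercaseQuery ? analyzeQuery(userQuery) : userQuery;
        return indexedTerms.contains(term);
    }

    public static void main(String[] args) {
        Set<String> summary = indexNotAnalyzed("FooFoo");
        System.out.println(matches(summary, "foofoo", true));  // false
        System.out.println(matches(summary, "FooFoo", true));  // false: query lowercased to "foofoo"
        System.out.println(matches(summary, "FooFoo", false)); // true: exact term, no lowercasing
    }
}
```

The general fix is to make index-time and query-time processing agree for that field, for example by analyzing the field when indexing, or by querying it without lowercasing (Lucene's PerFieldAnalyzerWrapper with a KeywordAnalyzer for that field is one option to look into).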

Re: New to Lucene - some questions about demo

2009-07-28 Thread Matthew Hall
Oh, also check which Analyzer the demo webapp/indexer is using. It's entirely possible that the chosen analyzer isn't lowercasing input, which could also cause you issues. I'd be willing to bet your issue lies in one of the two problems I've mentioned. ^^ Matt. Matthew Hall…

Re: New to Lucene - some questions about demo

2009-07-28 Thread Matthew Hall
Restart Tomcat. When the indexes are read in at initialization time, they are a snapshot of what the indexes contained at that moment. Unless the demo specifically either closes its IndexReader and creates a new one, or calls IndexReader.reopen periodically (which I don't remember it doing), y…