1-3 are really answered by the same explanation:
When you open a searcher, Lucene "knows" what all the closed segments
are (i.e., the last commit point), and you can't commit when only part
of a document has been written to the current segment. You can think
of commits as atomic at the document level.
2. Is the "Index" saved as a file or loaded into the memory?
Adding to Modassar's comments:
Almost all "real" implementations save the index to disk and
read selected portions back into memory as needed; otherwise
the data isn't permanent. In the Lucene world, I'd start with
NRTCachingDirectory.
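As a minimal sketch of that setup, assuming a Lucene 3.x-era NRTCachingDirectory (the path and cache sizes here are made-up illustrations, not recommendations):

```java
import java.io.File;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.NRTCachingDirectory;

public class NrtDirExample {
    public static void main(String[] args) throws Exception {
        // The on-disk index keeps the data permanent.
        Directory disk = FSDirectory.open(new File("/tmp/myindex")); // hypothetical path
        // Small, freshly flushed segments are held in RAM until they
        // exceed the thresholds (max 5 MB per merged segment cached,
        // max 60 MB cached in total), which speeds up near-real-time reopens.
        Directory dir = new NRTCachingDirectory(disk, 5.0, 60.0);
        // Pass 'dir' to your IndexWriter and searchers as usual.
        System.out.println(dir);
    }
}
```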
Please see my comments in-line.
1. For the indexing of these chapters, how many fields that need to be
declared? Can I just declare only one field for the contents?
This depends on what you need to search with. E.g., if only the plain content
(chapters) is to be searched, then one indexed field is required.
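A minimal sketch of that one-field approach, assuming Lucene 3.x field APIs (the field name "contents" is just an illustration):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class ChapterDoc {
    // One analyzed field holds the whole chapter text. Add further
    // fields (title, chapter number, ...) only if you need to search
    // or display them separately.
    public static Document build(String chapterText) {
        Document doc = new Document();
        doc.add(new Field("contents", chapterText,
                Field.Store.NO, Field.Index.ANALYZED));
        return doc;
    }
}
```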
Hi Rahul,
The first thing you should do is get a copy of the "Lucene in Action, Second
Edition" book and start reading it from head to toe. It is a fabulous book on
Lucene and gives you complete insight into the framework.
Also, if you want to use Lucene to develop some kind of search interface,
then this book is your best buddy: http://www.manning.com/hatcher3/
On Fri, Mar 2, 2012 at 2:01 PM, rahul reddy wrote:
> Hi ,
>
>
> I'm new to Lucene. Can anyone tell me how I can start learning about it with
> the code base.
> I have knowledge of endeca search engine and have worked on it.
> So, if
I believe creating a large number of fields is not a good match with the
underlying architecture, and you'd be better off with a large number of
documents/small number of fields, where the same field occurs in every
document. There is some discussion here:
http://markmail.org/message/hcmt5syca7zdeac
Hey Mike,
My only concern is that I am replacing a large number of fields inside of a
Document with a (very large ~50e6) number of Documents. Will I not run into
the same memory issues? Or do I create only one doc object and reuse it? With
so many Doc/Token pairs, won't searching the index t
I think the solution I gave you will work. The only problem is if a
token appears twice in the same doc:
doc1 has foo with two different sets of weights and frequencies...
but I think you're saying that doesn't happen
On 05/05/2011 06:09 PM, Chris Schilling wrote:
Hey Mike,
Let me clarify:
Oh, yes, they are unique within a document. I was also thinking about
something like this. But I would be replacing a large number of fields within
a document by a large number of documents. Let me see if I can work that out.
On May 5, 2011, at 3:01 PM, Mike Sokolov wrote:
> Are the tokens
Hey Mike,
Let me clarify:
The tokens are not unique. Let's say doc1 contains the token
foo and has the properties weight1 = 0.75, weight2 = 0.90, frequency = 10
Now, let's say doc2 also contains the token
foo with properties: weight1 = 0.8, weight2 = 0.75, frequency = 5
Now, I want to search
Are the tokens unique within a document? If so, why not store a document
for every doc/token pair with fields:
id (doc#/token#)
doc-id (doc#)
token
weight1
weight2
frequency
Then search for token, sort by weight1, weight2 or frequency.
If the token matches are unique within a document you will
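The doc-per-(document, token) scheme above could be sketched like this, assuming Lucene 3.x APIs; the field names follow the list in the message, and NumericField is used here (my assumption, not stated in the thread) so the weights and frequency can be sorted on efficiently:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;

public class TokenPairDoc {
    // One Lucene Document per (source doc, token) pair.
    public static Document build(int docId, String token,
                                 float weight1, float weight2, int frequency) {
        Document d = new Document();
        d.add(new Field("id", docId + "/" + token,
                Field.Store.YES, Field.Index.NOT_ANALYZED));
        d.add(new Field("doc-id", String.valueOf(docId),
                Field.Store.YES, Field.Index.NOT_ANALYZED));
        d.add(new Field("token", token,
                Field.Store.YES, Field.Index.NOT_ANALYZED));
        // Numeric fields: indexed (not stored) so they sort efficiently.
        d.add(new NumericField("weight1").setFloatValue(weight1));
        d.add(new NumericField("weight2").setFloatValue(weight2));
        d.add(new NumericField("frequency").setIntValue(frequency));
        return d;
    }

    // Sort token matches by weight1, descending.
    public static Sort byWeight1() {
        return new Sort(new SortField("weight1", SortField.FLOAT, true));
    }
}
```

Searching for a token is then an exact TermQuery on the "token" field, with one of these Sorts passed to the searcher.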
Matthew,
Ok, thanks for the clarifications.
When I have some quiet time, I'll try to re-do the tests I did earlier and post
back if any questions.
Thanks again,
Jim
Matthew Hall wrote:
Oh.. no.
If you specifically include a fieldname: blah in your clause, you don't
need a MultiFieldQueryParser.
The purpose of the MFQP is to turn a query like "blah"
automatically into "field1:blah" OR "field2:blah" OR "field3:blah"
(or AND, if you set it up that way)
When you
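That expansion can be seen directly, in a sketch assuming Lucene 3.x APIs (the field names and Version constant are illustrative):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class MfqpDemo {
    // Expand a bare term across several fields, the way MFQP does it.
    public static Query expand(String userInput) throws Exception {
        String[] fields = {"field1", "field2", "field3"};
        MultiFieldQueryParser parser = new MultiFieldQueryParser(
                Version.LUCENE_36, fields,
                new StandardAnalyzer(Version.LUCENE_36));
        // By default each per-field clause is a SHOULD (OR) clause.
        return parser.parse(userInput);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(expand("blah"));
        // field1:blah field2:blah field3:blah
    }
}
```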
Matthew,
I'll keep your comments in mind, but I'm still confused about something.
I currently haven't changed much in the demo, other than adding that doc.add
for "summary".
With JUST that doc.add, having done my reading, I kind of expected NOT to be
able to search on the "summary" at all, but
You can choose to do either,
Having items in multiple fields allows you to apply field-specific
boosts, thus making matches in certain fields more important than others.
But if that's not something you care about, the second technique is
useful in that it vastly simplifies your index structure.
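One way to apply such field-specific boosts at query time is the MultiFieldQueryParser constructor that takes a boost map, assuming Lucene 3.x APIs (the field names and boost values here are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.MultiFieldQueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class BoostDemo {
    // Title matches count three times as much as body matches.
    public static Query parse(String userInput) throws Exception {
        Map<String, Float> boosts = new HashMap<String, Float>();
        boosts.put("title", 3.0f);
        boosts.put("body", 1.0f);
        MultiFieldQueryParser parser = new MultiFieldQueryParser(
                Version.LUCENE_36, new String[] {"title", "body"},
                new StandardAnalyzer(Version.LUCENE_36), boosts);
        return parser.parse(userInput);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("lucene"));
    }
}
```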
Hi Matthew and Ian,
Thanks, I'll try that, but, in the meantime, I've been doing some reading
(Lucene in Action), and on pg. 159, section 5.3, it discusses "Querying on
multiple fields".
I was just about to try to what's described in that section, i.e., using
MultiFieldQueryParser.parse(), o
Yeah, Ian has it nailed on the head here.
Can't believe I missed it in the initial writeup.
Matt
Ian Lea wrote:
Jim
Glancing at SearchFiles.java I can see
Analyzer analyzer = new StandardAnalyzer();
...
QueryParser parser = new QueryParser(field, analyzer);
...
Query query = parser.parse(line);
so any query term you enter will be run through StandardAnalyzer, which
will, amongst other things, convert it to lowercase.
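That lowercasing can be demonstrated in isolation, in a sketch assuming Lucene 3.x APIs (the field name "contents" and Version constant are illustrative):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class LowercaseDemo {
    public static Query parse(String userInput) throws Exception {
        QueryParser parser = new QueryParser(Version.LUCENE_36, "contents",
                new StandardAnalyzer(Version.LUCENE_36));
        // StandardAnalyzer lowercases terms on the way in, so the query
        // term ends up lowercase no matter how the user typed it.
        return parser.parse(userInput);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("FooFoo"));  // contents:foofoo
    }
}
```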
Ian and Matthew,
I've tried "foofoo", "summary:foofoo", "FooFoo", and "summary:FooFoo". No
results returned for any of those :(.
Also, Matthew, I bounced Tomcat after running IndexFiles, so I don't think
that's the problem either :(...
I looked at the SearchFiles.java code, and it looks like
Hi
Field.Index.NOT_ANALYZED means the value will be indexed as is, i.e. "FooFoo"
in your example, and if you search for "foofoo" it won't match. A
search for "FooFoo" would, assuming that your search terms are not
being lowercased.
--
Ian.
On Tue, Jul 28, 2009 at 1:56 PM, Ohaya wrote:
> Hi,
>
> I'm
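One way around the mismatch Ian describes is to skip the analyzing query parser entirely and build the term query by hand, so the case is preserved. A minimal sketch, assuming Lucene 3.x APIs:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class ExactTermDemo {
    // Query a NOT_ANALYZED field: a hand-built TermQuery is not run
    // through any analyzer, so "FooFoo" stays "FooFoo".
    public static Query exact(String field, String value) {
        return new TermQuery(new Term(field, value));
    }

    public static void main(String[] args) {
        System.out.println(exact("summary", "FooFoo"));  // summary:FooFoo
    }
}
```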
Oh, also check to see which Analyzer the demo webapp/indexer is using.
It's entirely possible the analyzer that has been chosen isn't
lowercasing input, which could also cause you issues.
I'd be willing to bet your issue lies in one of these two problems I've
mentioned ^^
Matt
Matthew Hall
Restart tomcat.
When the indexes are read in at initialization time they are a snapshot
of what the indexes contained at that moment.
Unless the demo specifically either closes its IndexReader and creates a
new one, or calls IndexReader.reopen periodically (Which I don't
remember it doing) y
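A sketch of that periodic refresh, assuming the Lucene 3.x IndexReader.reopen() API (later versions replaced it with IndexReader.openIfChanged):

```java
import org.apache.lucene.index.IndexReader;

public class ReaderRefresh {
    // Pick up index changes without restarting the webapp.
    // reopen() returns a new reader if the index changed, or the
    // same instance if nothing changed.
    public static IndexReader refresh(IndexReader current) throws Exception {
        IndexReader newer = current.reopen();
        if (newer != current) {
            current.close();  // release the stale snapshot
        }
        return newer;
    }
}
```

Without something like this (or a Tomcat restart), searchers keep serving the snapshot they opened at initialization time.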